Implementation:Triton inference server Server L0 Vertex Ai Test

From Leeroopedia


L0 Vertex AI Test

Source File: qa/L0_vertex_ai/test.sh
Language: Bash (722 lines)
Domains: Testing, Cloud_Integration

Purpose

This QA test shell script validates Triton Inference Server's compatibility with Google Cloud Vertex AI endpoint conventions. It tests the Vertex AI HTTP service lifecycle, AIP environment variable configuration, health and predict endpoint routing, default model selection for multi-model repositories, the X-Vertex-Ai-Triton-Redirect header for accessing Triton-native endpoints, and interaction with AIP_STORAGE_URI and AIP_HTTP_PORT.

Signature

#!/bin/bash
# Primary entry point: test.sh [REPO_VERSION]
# Environment variables:
#   NVIDIA_TRITON_SERVER_VERSION - Repository version
#   CUDA_VISIBLE_DEVICES - GPU device selection (set to 0)
#   AIP_MODE, AIP_HTTP_PORT, AIP_HEALTH_ROUTE, AIP_PREDICT_ROUTE, AIP_STORAGE_URI
#
# Key functions:
#   vertex_ai_wait_for_server_ready(pid, timeout) - Poll health endpoint
#   unset_vertex_variables()                       - Clear all AIP_ env vars
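
The unset_vertex_variables() helper listed above is described only as clearing the AIP_ variables. A minimal sketch of what such a helper might look like follows; the body is an assumption based on the variable names in the signature, not a copy of the code in test.sh.

```bash
# Hypothetical sketch; the real helper in test.sh may differ.
unset_vertex_variables() {
    # Clear the AIP_ variables named in the signature so that one test
    # stage cannot leak its configuration into the next.
    unset AIP_MODE AIP_HTTP_PORT AIP_HEALTH_ROUTE AIP_PREDICT_ROUTE AIP_STORAGE_URI
}
```

Calling a helper like this between stages keeps each server launch dependent only on the variables that stage exports.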

Key Components

Default allow-vertex-ai Behavior

Tests the --allow-vertex-ai flag behavior:

  • Default (false): the Vertex AI service is not started and no related log messages appear
  • Explicit enable: the Vertex AI HTTP service is started
  • With AIP_MODE=PREDICTION: Vertex AI is enabled automatically, the standard HTTP endpoint is disabled, and gRPC can be enabled separately
  • Explicit disable: only the standard HTTP service runs

# Default false - no Vertex AI
SERVER_ARGS=${BASE_SERVER_ARGS}
# Explicit enable
SERVER_ARGS="${BASE_SERVER_ARGS} --allow-vertex-ai=true"
# AIP_MODE triggers auto-enable
export AIP_MODE=PREDICTION
# Explicit disable overrides AIP_MODE
SERVER_ARGS="${BASE_SERVER_ARGS} --allow-vertex-ai=false --allow-http=true"
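
The precedence between the flag and AIP_MODE described above can be modeled in a few lines. This is only an illustration of the documented behavior, not Triton's implementation; the function name effective_allow_vertex_ai is hypothetical.

```bash
# Illustrative model of the documented precedence, not Triton's actual code.
# $1: value passed to --allow-vertex-ai ("true", "false", or empty if absent)
# $2: value of AIP_MODE (may be empty)
effective_allow_vertex_ai() {
    local flag="$1" aip_mode="$2"
    if [ -n "$flag" ]; then
        echo "$flag"      # an explicit flag wins over AIP_MODE
    elif [ "$aip_mode" = "PREDICTION" ]; then
        echo "true"       # AIP_MODE=PREDICTION enables the service automatically
    else
        echo "false"      # default: the Vertex AI service stays off
    fi
}

effective_allow_vertex_ai "" ""                  # prints: false
effective_allow_vertex_ai "" "PREDICTION"        # prints: true
effective_allow_vertex_ai "false" "PREDICTION"   # prints: false
```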

Missing Route Validation

Tests that the server fails to start with clear error messages when required AIP route variables are not set:

  • Missing AIP_PREDICT_ROUTE: "API_PREDICT_ROUTE is not defined for Vertex AI endpoint"
  • Missing AIP_HEALTH_ROUTE: "AIP_HEALTH_ROUTE is not defined for Vertex AI endpoint"
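
A check of this kind usually comes down to searching the server log for the expected message after the server has failed to start. The helper below is a hedged sketch of that pattern; expect_log_message is a hypothetical name, and the log file path is whatever the caller supplies.

```bash
# Hypothetical helper: succeed only if the log file contains the message.
# $1: path to the server log, $2: exact message to look for
expect_log_message() {
    local log_file="$1" message="$2"
    # -F treats the message as a fixed string rather than a regex.
    if grep -qF -- "$message" "$log_file"; then
        return 0
    fi
    echo "*** FAILED: expected '$message' in $log_file" >&2
    return 1
}

# Example use inside a test stage (SERVER_LOG and RET are assumed names):
#   expect_log_message "$SERVER_LOG" \
#       "AIP_HEALTH_ROUTE is not defined for Vertex AI endpoint" || RET=1
```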

Endpoint Functionality

Tests the health and predict endpoints using a single-model repository. The health check uses vertex_ai_wait_for_server_ready(), which polls the configured AIP_HEALTH_ROUTE. Prediction is exercised by the Python test script vertex_ai_test.py, which contains 8 unit tests.
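
The polling approach can be sketched as follows. This is not the body of vertex_ai_wait_for_server_ready(); it is a minimal illustration that assumes the health route answers with HTTP 200 once the server is ready, that the port falls back to 8080 when AIP_HTTP_PORT is unset, and that one poll per second is acceptable. The function name is hypothetical.

```bash
# Hypothetical sketch of a readiness poll; the real function may differ.
# $1: server PID, $2: timeout in seconds
# Assumes AIP_HEALTH_ROUTE is set by the caller.
wait_for_vertex_ready_sketch() {
    local pid="$1" timeout="$2"
    local url="localhost:${AIP_HTTP_PORT:-8080}${AIP_HEALTH_ROUTE}"
    local code
    while [ "$timeout" -gt 0 ]; do
        # Give up early if the server process has already exited.
        if ! kill -0 "$pid" 2>/dev/null; then
            echo "server (pid $pid) is not running" >&2
            return 1
        fi
        # Ask curl for the status code only and discard the body.
        code=$(curl -s -o /dev/null -w '%{http_code}' "$url" 2>/dev/null)
        if [ "$code" = "200" ]; then
            return 0
        fi
        sleep 1
        timeout=$((timeout - 1))
    done
    echo "timed out waiting for $url" >&2
    return 1
}
```

Checking the PID on every iteration lets a startup failure surface immediately instead of after the full timeout.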

AIP_STORAGE_URI and AIP_HTTP_PORT

Tests that AIP_STORAGE_URI can specify the model repository path and AIP_HTTP_PORT can override the default port (8080). Both are verified through inference tests on a custom port.

export AIP_STORAGE_URI=single_model
export AIP_HTTP_PORT=5234
SERVER_ARGS="--allow-vertex-ai=true"

Default Model Configuration

Tests several default model scenarios:

  • Error when the specified default model is not found in the repository
  • Error when a multi-model repository has no default model specified
  • AIP_STORAGE_URI is ignored when --model-repository is explicitly provided
  • Correct operation with a default model specified for a multi-model repository

# Error: default model not found
SERVER_ARGS="--vertex-ai-default-model=subadd"
# Error: multi-model with no default
export AIP_STORAGE_URI=multi_models
SERVER_ARGS=""
# Success: default model specified
SERVER_ARGS="--vertex-ai-default-model=addsub"
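
The selection rules exercised by these scenarios can be modeled roughly as shown below. This is an illustrative sketch, not Triton's logic: the function name and error strings are hypothetical, and the single-model fallback is inferred from the fact that the single-model tests run without --vertex-ai-default-model.

```bash
# Illustrative model of the documented rules, not Triton's actual code.
# $1: requested default model (may be empty); remaining args: models in the repo
select_default_model() {
    local requested="$1"
    shift
    local model
    if [ -n "$requested" ]; then
        for model in "$@"; do
            if [ "$model" = "$requested" ]; then
                echo "$model"
                return 0
            fi
        done
        echo "default model '$requested' not found in repository" >&2
        return 1
    fi
    if [ "$#" -eq 1 ]; then
        echo "$1"   # single-model repository: the choice is unambiguous
        return 0
    fi
    echo "no default model specified for multi-model repository" >&2
    return 1
}

select_default_model addsub addsub subadd   # prints: addsub
```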

X-Vertex-Ai-Triton-Redirect

Tests the redirect header mechanism that allows accessing Triton-native endpoints through the Vertex AI predict route. Validates the following redirections:

Redirect Target                     Description
metrics                             Prometheus metrics (checks for nv_inference_request_success)
v2/models/stats                     All model statistics
v2/models/subadd/stats              Single model statistics
v2/health/live                      Server liveness check
v2/models/addsub/ready              Model readiness check
v2                                  Server metadata (checks for extensions)
v2/models/addsub                    Model metadata (checks for platform)
v2/models/addsub/config             Model configuration (checks for version_policy)
v2/systemsharedmemory/status        System shared memory status
v2/cudasharedmemory/status          CUDA shared memory status
v2/repository/index                 Repository index (both models listed)
v2/repository/models/subadd/unload  Model control (expects error: "explicit model load / unload is not allowed")
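
A request using the redirect header might look like the sketch below. The helper name vertex_redirect, the DRY_RUN switch, and the default HTTP method are assumptions made for illustration; the exact curl options used in test.sh may differ. The sketch assumes AIP_PREDICT_ROUTE is set and that the Vertex AI service listens on AIP_HTTP_PORT, falling back to 8080.

```bash
# Hypothetical helper for sending a redirected request.
# $1: Triton-native target, for example "v2/health/live"
# $2: HTTP method (assumed default: GET)
# Set DRY_RUN=1 to print the command instead of executing it.
vertex_redirect() {
    local target="$1" method="${2:-GET}"
    local cmd=(curl -s -X "$method"
               -H "X-Vertex-Ai-Triton-Redirect: ${target}"
               "localhost:${AIP_HTTP_PORT:-8080}${AIP_PREDICT_ROUTE}")
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

The request always targets the predict route; only the header value changes to select which Triton-native endpoint handles it.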

Test Flow

  1. Test default allow-vertex-ai flag behavior
  2. Test missing route variable error handling
  3. Test health and predict endpoints with single model
  4. Test AIP_STORAGE_URI and AIP_HTTP_PORT overrides
  5. Test default model configuration scenarios
  6. Test X-Vertex-Ai-Triton-Redirect header across all endpoints

Dependencies

  • vertex_ai_test.py - Python test script (8 unit tests)
  • ../common/util.sh - Common test utility functions
  • ONNX models from qa_model_repository
