Implementation:Triton inference server Server L0 Vertex Ai Test
L0 Vertex AI Test
Source File: qa/L0_vertex_ai/test.sh
Language: Bash (722 lines)
Domains: Testing, Cloud_Integration
Purpose
This QA test shell script validates Triton Inference Server's compatibility with Google Cloud Vertex AI endpoint conventions. It tests the Vertex AI HTTP service lifecycle, AIP environment variable configuration, health and predict endpoint routing, default model selection for multi-model repositories, the X-Vertex-Ai-Triton-Redirect header for accessing Triton-native endpoints, and interaction with AIP_STORAGE_URI and AIP_HTTP_PORT.
Signature
#!/bin/bash
# Primary entry point: test.sh [REPO_VERSION]
# Environment variables:
# NVIDIA_TRITON_SERVER_VERSION - Repository version
# CUDA_VISIBLE_DEVICES - GPU device selection (set to 0)
# AIP_MODE, AIP_HTTP_PORT, AIP_HEALTH_ROUTE, AIP_PREDICT_ROUTE, AIP_STORAGE_URI
#
# Key functions:
# vertex_ai_wait_for_server_ready(pid, timeout) - Poll health endpoint
# unset_vertex_variables() - Clear all AIP_ env vars
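The two helpers named above might look roughly like the following. This is a minimal sketch rather than the code in test.sh: the `probe_health` function, the one-second polling interval, the `localhost` host, and the use of the exit status to report readiness are all assumptions made for illustration.

```shell
#!/bin/bash
# Illustrative sketch only; the real helpers in test.sh may differ.

# Clear the AIP_ variables listed above so each test case starts clean.
unset_vertex_variables() {
    unset AIP_MODE AIP_HTTP_PORT AIP_HEALTH_ROUTE AIP_PREDICT_ROUTE AIP_STORAGE_URI
}

# Hypothetical probe: succeeds when the health route answers HTTP 200.
probe_health() {
    local code
    code=$(curl -s -o /dev/null -w '%{http_code}' \
        "localhost:${AIP_HTTP_PORT:-8080}${AIP_HEALTH_ROUTE}" 2>/dev/null)
    [ "$code" = "200" ]
}

# Poll until the server is healthy, the process exits, or the timeout expires.
# Returns 0 when ready and 1 otherwise.
vertex_ai_wait_for_server_ready() {
    local pid="$1" timeout_secs="$2" elapsed=0
    while [ "$elapsed" -lt "$timeout_secs" ]; do
        kill -0 "$pid" 2>/dev/null || return 1   # server exited early
        probe_health && return 0                 # health route is up
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1                                     # timed out
}
```

Checking that the process is still alive on every iteration lets a test fail fast when the server aborts during startup instead of waiting for the full timeout.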
Key Components
Default allow-vertex-ai Behavior
Tests the --allow-vertex-ai flag behavior:
- Default (false): Vertex AI service is not started, no related log messages
- Explicit enable: Vertex AI HTTP service is started
- With AIP_MODE=PREDICTION: Vertex AI is auto-enabled, HTTP endpoint is disabled, GRPC can be separately enabled
- Explicit disable: Only standard HTTP service runs
# Default false - no Vertex AI
SERVER_ARGS=${BASE_SERVER_ARGS}
# Explicit enable
SERVER_ARGS="${BASE_SERVER_ARGS} --allow-vertex-ai=true"
# AIP_MODE triggers auto-enable
export AIP_MODE=PREDICTION
# Explicit disable overrides AIP_MODE
SERVER_ARGS="${BASE_SERVER_ARGS} --allow-vertex-ai=false --allow-http=true"
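The precedence between the flag and AIP_MODE described in this section can be modeled in a few lines. The function below is a hypothetical illustration of that rule, not Triton's implementation; the name `vertex_ai_enabled` and its calling convention are assumptions.

```shell
#!/bin/bash
# Hypothetical model of the precedence described above; not Triton source code.
# $1 is the explicit --allow-vertex-ai value ("true", "false", or empty when
# the flag is not passed). AIP_MODE is read from the environment.
vertex_ai_enabled() {
    local flag="${1:-}"
    if [ -n "$flag" ]; then
        echo "$flag"        # an explicit flag always wins
    elif [ "${AIP_MODE:-}" = "PREDICTION" ]; then
        echo "true"         # auto-enabled by the environment
    else
        echo "false"        # default when nothing is specified
    fi
}
```

For example, with `AIP_MODE=PREDICTION` exported, `vertex_ai_enabled false` still prints `false`, which mirrors the "explicit disable overrides AIP_MODE" case.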
Missing Route Validation
Tests that the server fails to start with clear error messages when required AIP route variables are not set:
- Missing AIP_PREDICT_ROUTE: "API_PREDICT_ROUTE is not defined for Vertex AI endpoint"
- Missing AIP_HEALTH_ROUTE: "AIP_HEALTH_ROUTE is not defined for Vertex AI endpoint"
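A check of this kind usually comes down to searching the server log for the expected message. The sketch below shows one way to do that; the helper name `log_contains` and the temporary log file are hypothetical, and the log content is written by the example itself rather than taken from a real server run.

```shell
#!/bin/bash
# Illustrative sketch; helper name and log file are hypothetical.

# Succeeds only if the log contains the given fixed string at least once.
log_contains() {
    local log_file="$1" message="$2"
    grep -qF -- "$message" "$log_file"
}

# Example usage against a made-up log file.
SERVER_LOG=$(mktemp)
echo "AIP_HEALTH_ROUTE is not defined for Vertex AI endpoint" > "$SERVER_LOG"

if log_contains "$SERVER_LOG" "AIP_HEALTH_ROUTE is not defined for Vertex AI endpoint"; then
    RESULT="expected error found"
else
    RESULT="expected error missing"
fi
echo "$RESULT"
rm -f "$SERVER_LOG"
```

Using `grep -F` treats the message as a literal string, so punctuation in an error message is never interpreted as a regular expression.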
Endpoint Functionality
Tests the health and predict endpoints using a single-model repository. The health check uses vertex_ai_wait_for_server_ready(), which polls the configured AIP_HEALTH_ROUTE. Prediction is exercised by the Python test script vertex_ai_test.py, which contains 8 unit tests.
AIP_STORAGE_URI and AIP_HTTP_PORT
Tests that AIP_STORAGE_URI can specify the model repository path and AIP_HTTP_PORT can override the default port (8080). Both are verified through inference tests on a custom port.
export AIP_STORAGE_URI=single_model
export AIP_HTTP_PORT=5234
SERVER_ARGS="--allow-vertex-ai=true"
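A client-side helper that follows the same port rule could be sketched as below. The function name `vertex_ai_address` and the `localhost` host are assumptions for a locally started server; only the 8080 default and the AIP_HTTP_PORT override come from the description above.

```shell
#!/bin/bash
# Illustrative sketch; not part of test.sh.

# Print the address the Vertex AI service is expected to listen on:
# AIP_HTTP_PORT when set, otherwise the default port 8080.
vertex_ai_address() {
    echo "localhost:${AIP_HTTP_PORT:-8080}"
}
```

A test could then request `"$(vertex_ai_address)${AIP_HEALTH_ROUTE}"` without hard-coding the port in each case.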
Default Model Configuration
Tests several default model scenarios:
- Error when specified default model is not found in the repository
- Error when multi-model repository has no default model specified
- AIP_STORAGE_URI is ignored when --model-repository is explicitly provided
- Correct operation with default model specified for multi-model repository
# Error: default model not found
SERVER_ARGS="--vertex-ai-default-model=subadd"
# Error: multi-model with no default
export AIP_STORAGE_URI=multi_models
SERVER_ARGS=""
# Success: default model specified
SERVER_ARGS="--vertex-ai-default-model=addsub"
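The scenarios in this section can be summarized as a small decision procedure. The function below is a hypothetical model of those rules, not Triton's implementation. It assumes each model is a subdirectory of the repository, that a single-model repository needs no explicit default, and that the error texts are placeholders rather than real server messages.

```shell
#!/bin/bash
# Hypothetical model of the rules above; not Triton source code.
# $1: explicit --model-repository path (may be empty)
# $2: --vertex-ai-default-model value (may be empty)
# Prints the model that would serve predict requests, or fails with a message.
resolve_default_model() {
    # An explicit repository takes precedence over AIP_STORAGE_URI.
    local repo="${1:-${AIP_STORAGE_URI:-}}" default_model="${2:-}"
    local count=0 only="" entry

    if [ -n "$default_model" ]; then
        if [ -d "$repo/$default_model" ]; then
            echo "$default_model"
            return 0
        fi
        echo "default model '$default_model' not found" >&2
        return 1
    fi

    # No default given: acceptable only when exactly one model exists.
    for entry in "$repo"/*/; do
        [ -d "$entry" ] || continue
        count=$((count + 1))
        only=$(basename "$entry")
    done
    if [ "$count" -eq 1 ]; then
        echo "$only"
        return 0
    fi
    echo "no default model specified for $count models" >&2
    return 1
}
```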
X-Vertex-Ai-Triton-Redirect
Tests the redirect header mechanism that allows accessing Triton-native endpoints through the Vertex AI predict route. Validates the following redirections:
| Redirect Target | Description |
|---|---|
| metrics | Prometheus metrics (checks for nv_inference_request_success) |
| v2/models/stats | All model statistics |
| v2/models/subadd/stats | Single model statistics |
| v2/health/live | Server liveness check |
| v2/models/addsub/ready | Model readiness check |
| v2 | Server metadata (checks for extensions) |
| v2/models/addsub | Model metadata (checks for platform) |
| v2/models/addsub/config | Model configuration (checks for version_policy) |
| v2/systemsharedmemory/status | System shared memory status |
| v2/cudasharedmemory/status | CUDA shared memory status |
| v2/repository/index | Repository index (both models listed) |
| v2/repository/models/subadd/unload | Model control (expects error: "explicit model load / unload is not allowed") |
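A redirected request of this kind can be sketched with curl as follows. The helper names are hypothetical, and the POST method and the `localhost` host are assumptions; only the header name, the target paths, and the use of the predict route come from the description above.

```shell
#!/bin/bash
# Illustrative sketch; helper names, POST method, and host are assumptions.

# Build the redirect header for a Triton-native target such as v2/health/live.
redirect_header() {
    printf 'X-Vertex-Ai-Triton-Redirect: %s' "$1"
}

# Send a request to the Vertex AI predict route, asking Triton to serve the
# given native endpoint instead of running inference.
redirect_request() {
    local target="$1"
    curl -s -X POST -H "$(redirect_header "$target")" \
        "localhost:${AIP_HTTP_PORT:-8080}${AIP_PREDICT_ROUTE}"
}
```

With a server running, something like `redirect_request metrics | grep nv_inference_request_success` would correspond to the first row of the table.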
Test Flow
- Test default allow-vertex-ai flag behavior
- Test missing route variable error handling
- Test health and predict endpoints with single model
- Test AIP_STORAGE_URI and AIP_HTTP_PORT overrides
- Test default model configuration scenarios
- Test X-Vertex-Ai-Triton-Redirect header across all endpoints
Dependencies
- vertex_ai_test.py - Python test script (8 unit tests)
- ../common/util.sh - Common test utility functions
- ONNX models from qa_model_repository