Implementation:Triton inference server Server L0 Vertex Ai Test

L0 Vertex AI Test

Source File: qa/L0_vertex_ai/test.sh
Language: Bash (722 lines)
Domains: Testing, Cloud_Integration

Purpose

This QA test shell script validates Triton Inference Server's compatibility with Google Cloud Vertex AI endpoint conventions. It tests the Vertex AI HTTP service lifecycle, AIP environment variable configuration, health and predict endpoint routing, default model selection for multi-model repositories, the X-Vertex-Ai-Triton-Redirect header for accessing Triton-native endpoints, and interaction with AIP_STORAGE_URI and AIP_HTTP_PORT.

Signature

#!/bin/bash
# Primary entry point: test.sh [REPO_VERSION]
# Environment variables:
#   NVIDIA_TRITON_SERVER_VERSION - Repository version
#   CUDA_VISIBLE_DEVICES - GPU device selection (set to 0)
#   AIP_MODE, AIP_HTTP_PORT, AIP_HEALTH_ROUTE, AIP_PREDICT_ROUTE, AIP_STORAGE_URI
#
# Key functions:
#   vertex_ai_wait_for_server_ready(pid, timeout) - Poll health endpoint
#   unset_vertex_variables()                       - Clear all AIP_ env vars
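
For orientation, a function like unset_vertex_variables() might look like the sketch below. It is an assumption built from the AIP_ variables listed above, not a copy of the function in test.sh, which may clear a different set.

```shell
# Hypothetical sketch: clear the AIP_* variables named on this page so that
# one test case cannot leak Vertex AI settings into the next one.
unset_vertex_variables() {
    unset AIP_MODE
    unset AIP_HTTP_PORT
    unset AIP_HEALTH_ROUTE
    unset AIP_PREDICT_ROUTE
    unset AIP_STORAGE_URI
}
```

Calling it between test cases keeps each server launch dependent only on the variables that case exports.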

Key Components

Default allow-vertex-ai Behavior

Tests the --allow-vertex-ai flag behavior:

Default (false): the Vertex AI service is not started, and no related log messages appear
Explicit enable: the Vertex AI HTTP service is started
With AIP_MODE=PREDICTION: Vertex AI is auto-enabled and the standard HTTP endpoint is disabled; gRPC can be enabled separately
Explicit disable: only the standard HTTP service runs

# Default false - no Vertex AI
SERVER_ARGS=${BASE_SERVER_ARGS}
# Explicit enable
SERVER_ARGS="${BASE_SERVER_ARGS} --allow-vertex-ai=true"
# AIP_MODE triggers auto-enable
export AIP_MODE=PREDICTION
# Explicit disable overrides AIP_MODE
SERVER_ARGS="${BASE_SERVER_ARGS} --allow-vertex-ai=false --allow-http=true"

Missing Route Validation

Tests that the server fails to start with clear error messages when required AIP route variables are not set:

Missing AIP_PREDICT_ROUTE: "API_PREDICT_ROUTE is not defined for Vertex AI endpoint"
Missing AIP_HEALTH_ROUTE: "AIP_HEALTH_ROUTE is not defined for Vertex AI endpoint"
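
A check of this kind typically reduces to searching the server log for the expected message. The sketch below assumes a hypothetical helper name (expect_in_log) and log file name; only the quoted messages come from the test itself.

```shell
# Hypothetical helper: succeed only if the log contains the exact message.
expect_in_log() {
    local log_file="$1" expected="$2"
    # -F treats the message as a fixed string; -q suppresses grep's output.
    if grep -qF -- "$expected" "$log_file"; then
        return 0
    fi
    echo "*** FAILED: \"$expected\" not found in $log_file" >&2
    return 1
}

# Usage (server.log is an assumed file name):
#   expect_in_log server.log "AIP_HEALTH_ROUTE is not defined for Vertex AI endpoint"
```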

Endpoint Functionality

Tests the health and predict endpoints using a single-model repository. Server readiness is checked with vertex_ai_wait_for_server_ready(), which polls the configured AIP_HEALTH_ROUTE. Prediction is exercised by the Python test script vertex_ai_test.py, which contains 8 unit tests.
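
The polling approach can be sketched as follows. This is a hypothetical re-implementation, not the function from test.sh: the name wait_for_vertex_ready_sketch, the one-second interval, and the assumption that AIP_HEALTH_ROUTE starts with a slash are all illustrative.

```shell
# Hypothetical sketch: poll the health route once per second until the server
# answers successfully, the server process exits, or the timeout expires.
wait_for_vertex_ready_sketch() {
    local pid="$1" timeout_secs="$2"
    local port="${AIP_HTTP_PORT:-8080}"
    # Assumes AIP_HEALTH_ROUTE begins with "/", e.g. /health.
    local url="localhost:${port}${AIP_HEALTH_ROUTE:-}"
    local i
    for ((i = 0; i < timeout_secs; i++)); do
        # Give up early if the server process is no longer running.
        if ! kill -0 "$pid" 2>/dev/null; then
            return 1
        fi
        # -f makes curl exit non-zero on HTTP status codes >= 400.
        if curl -sf -o /dev/null "$url"; then
            return 0
        fi
        sleep 1
    done
    return 1
}
```

Checking the process before each request avoids waiting out the full timeout when the server has already failed to start.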

AIP_STORAGE_URI and AIP_HTTP_PORT

Tests that AIP_STORAGE_URI can specify the model repository path and AIP_HTTP_PORT can override the default port (8080). Both are verified through inference tests on a custom port.

export AIP_STORAGE_URI=single_model
export AIP_HTTP_PORT=5234
SERVER_ARGS="--allow-vertex-ai=true"

Default Model Configuration

Tests several default model scenarios:

Error when specified default model is not found in the repository
Error when multi-model repository has no default model specified
AIP_STORAGE_URI is ignored when --model-repository is explicitly provided
Correct operation with default model specified for multi-model repository

# Error: default model not found
SERVER_ARGS="--vertex-ai-default-model=subadd"
# Error: multi-model with no default
export AIP_STORAGE_URI=multi_models
SERVER_ARGS=""
# Success: default model specified
SERVER_ARGS="--vertex-ai-default-model=addsub"
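
The expected outcomes can be modeled with a small decision function. The sketch below is an illustration of the scenarios listed above, not Triton's implementation: the function name pick_default_model and the rule that every subdirectory counts as one model are assumptions, as is the single-model fallback (inferred from the single-model tests running without a default).

```shell
# Illustrative model of the expected outcomes; not Triton's implementation.
# Prints the model that would serve the predict route, or fails with an error.
pick_default_model() {
    local repo="$1" requested="${2:-}"
    local models=() dir name
    # Treat every subdirectory of the repository as one model.
    for dir in "$repo"/*/; do
        if [ -d "$dir" ]; then
            models+=("$(basename "$dir")")
        fi
    done
    if [ -n "$requested" ]; then
        for name in "${models[@]}"; do
            if [ "$name" = "$requested" ]; then
                echo "$name"
                return 0
            fi
        done
        echo "default model '$requested' not found in $repo" >&2
        return 1
    fi
    # Without an explicit default, only a single-model repository is unambiguous.
    if [ "${#models[@]}" -eq 1 ]; then
        echo "${models[0]}"
        return 0
    fi
    echo "no default model specified for $repo" >&2
    return 1
}
```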

X-Vertex-Ai-Triton-Redirect

Tests the redirect header mechanism that allows accessing Triton-native endpoints through the Vertex AI predict route. Validates the following redirections:

Redirect Target	Description
`metrics`	Prometheus metrics (checks for `nv_inference_request_success`)
`v2/models/stats`	All model statistics
`v2/models/subadd/stats`	Single model statistics
`v2/health/live`	Server liveness check
`v2/models/addsub/ready`	Model readiness check
`v2`	Server metadata (checks for `extensions`)
`v2/models/addsub`	Model metadata (checks for `platform`)
`v2/models/addsub/config`	Model configuration (checks for `version_policy`)
`v2/systemsharedmemory/status`	System shared memory status
`v2/cudasharedmemory/status`	CUDA shared memory status
`v2/repository/index`	Repository index (both models listed)
`v2/repository/models/subadd/unload`	Model control (expects error: "explicit model load / unload is not allowed")
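
A redirected request is an ordinary request to the predict route with the extra header attached. The sketch below only builds the curl command so it can be inspected before running; the helper name build_redirect_cmd, the REDIRECT_CMD array, and the choice of curl flags are assumptions, and some targets may need a different HTTP method or a request body.

```shell
# Hypothetical helper: build (but do not run) the curl command for a request
# that is redirected to a Triton-native endpoint via the Vertex AI predict route.
build_redirect_cmd() {
    local target="$1"
    local port="${AIP_HTTP_PORT:-8080}"
    # Assumes AIP_PREDICT_ROUTE begins with "/", e.g. /predict.
    REDIRECT_CMD=(curl -s
        -H "X-Vertex-Ai-Triton-Redirect: ${target}"
        "localhost:${port}${AIP_PREDICT_ROUTE:-}")
}

# Usage against a running server:
#   build_redirect_cmd metrics
#   "${REDIRECT_CMD[@]}" | grep -q nv_inference_request_success
```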

Test Flow

Test default allow-vertex-ai flag behavior
Test missing route variable error handling
Test health and predict endpoints with single model
Test AIP_STORAGE_URI and AIP_HTTP_PORT overrides
Test default model configuration scenarios
Test X-Vertex-Ai-Triton-Redirect header across all endpoints

Dependencies

vertex_ai_test.py - Python test script (8 unit tests)
../common/util.sh - Common test utility functions
ONNX models from qa_model_repository
