Implementation:Triton inference server Server L0 Trace Test
L0 Trace Test
Source File: qa/L0_trace/test.sh
Language: Bash (1349 lines)
Domains: Testing, Tracing
Purpose
This comprehensive QA test shell script validates Triton Inference Server's tracing functionality across both file-based (Triton native) and OpenTelemetry tracing modes. It exercises the trace settings API for global and per-model configuration, trace rate limiting, trace count management, log frequency rotation, custom backend tracing, BLS/ensemble trace propagation, BatchSpanProcessor parameter tuning, Python backend trace context exposure, and long-running stress tests for trace stability.
Signature
#!/bin/bash
# Primary entry point: test.sh [REPO_VERSION]
# Environment variables:
# NVIDIA_TRITON_SERVER_VERSION - Repository version
# CUDA_VISIBLE_DEVICES - GPU device selection (set to 0)
#
# Key functions:
# assert_curl_success() - Assert HTTP 200 response
# assert_curl_failure() - Assert non-200 response
# get_global_trace_setting() - GET /v2/trace/setting
# get_trace_setting(model) - GET /v2/models/{model}/trace/setting
# update_global_trace_setting(json) - POST /v2/trace/setting
# update_trace_setting(model, json) - POST /v2/models/{model}/trace/setting
# send_inference_requests(log, n) - Send n HTTP+gRPC inference request pairs
# check_pbe_trace_context(model, b) - Check Python backend trace context
# run_stress_client(client, log) - Run trace stress client for 120 seconds
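The helper signatures above suggest thin curl wrappers around the trace settings endpoints. The following is a minimal sketch of that pattern, not the script's actual code. It assumes the server's HTTP endpoint is at localhost:8000 and that the response body is saved to ./curl.out. The names TRITON_HTTP and trace_setting_url are hypothetical, and get_trace_setting here folds the global and per-model getters into one function for brevity.

```shell
#!/bin/bash
# Hypothetical base address (assumption: HTTP endpoint on port 8000).
TRITON_HTTP="localhost:8000"

# Build the trace-setting URL: global when no model is given, per-model otherwise.
trace_setting_url() {
    local model="${1:-}"
    if [ -z "$model" ]; then
        echo "${TRITON_HTTP}/v2/trace/setting"
    else
        echo "${TRITON_HTTP}/v2/models/${model}/trace/setting"
    fi
}

# GET the setting; the body goes to ./curl.out and the HTTP status to $code.
get_trace_setting() {
    code=$(curl -s -w '%{http_code}' -o ./curl.out "$(trace_setting_url "${1:-}")")
}

# POST a JSON update for one model: $1 = model name, $2 = JSON payload.
update_trace_setting() {
    code=$(curl -s -w '%{http_code}' -o ./curl.out -X POST \
        "$(trace_setting_url "$1")" -d "$2")
}

# Fail with a message unless the last recorded status was 200.
assert_curl_success() {
    if [ "$code" != "200" ]; then
        echo "*** FAILED: $1 (HTTP $code)"
        return 1
    fi
}
```

Keeping the status in a shared variable lets each assertion follow its request on the next line, at the cost of relying on global state.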
Key Components
Model Setup
The script prepares a diverse set of models including simple, global_simple, ensemble_add_sub_int32_int32_int32, bls_simple, repeat_int32, custom_identity_int32, identity_fp32, dynamic_batch, input_all_required, and trace_context. These cover basic inference, ensemble pipelines, BLS chains, custom backends, and decoupled models.
Trace OFF to TIMESTAMPS
Tests dynamic trace activation: the server starts with tracing disabled (level=OFF), and the trace API then raises the level to TIMESTAMPS. Validates that only requests sent after the API call are traced.
SERVER_ARGS="--trace-config triton,file=trace_off_to_min.log --trace-config level=OFF --trace-config rate=1 ..."
run_server
update_global_trace_setting '{"trace_level":["TIMESTAMPS"]}'
send_inference_requests "client_min.log" 10
# Expect 20 traced requests (10 HTTP + 10 gRPC)
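The expected total of 20 follows from 10 iterations that each send one HTTP and one gRPC request. A count like this could be checked as sketched below, assuming the trace summary contains exactly one line with a given marker per traced request. The marker string, the summary file name, and both function names are assumptions for illustration.

```shell
#!/bin/bash
# Count lines in a summary file that contain a per-request marker.
count_traced() {
    local summary="$1" marker="$2"
    # grep -c exits non-zero when nothing matches but still prints 0.
    grep -c -- "$marker" "$summary" || true
}

# Fail unless the summary holds exactly the expected number of traces.
expect_trace_count() {
    local summary="$1" marker="$2" expected="$3" actual
    if [ ! -f "$summary" ]; then
        echo "*** FAILED: summary file $summary not found"
        return 1
    fi
    actual=$(count_traced "$summary" "$marker")
    if [ "$actual" -ne "$expected" ]; then
        echo "*** FAILED: expected $expected traces, found $actual"
        return 1
    fi
}

# Hypothetical usage after summarizing the trace log:
# expect_trace_count summary_off_to_min.log "COMPUTE_INPUT_END" 20
```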
Per-Model Trace Settings
Demonstrates per-model trace configuration through the API. Tests that model-specific settings override global settings, log frequency rotation creates indexed trace files, and clearing model settings reverts to global defaults.
update_trace_setting "simple" '{"log_frequency":"2"}'
update_trace_setting "simple" '{"trace_level":["OFF"]}'
update_trace_setting "simple" '{"trace_level":null}' # Clear to revert to global
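Log frequency rotation can be verified by counting the indexed files that appear next to the configured trace file. The sketch below assumes rotated files are named by appending a numeric index to the trace file name (for example trace.log.0, trace.log.1). The function name is hypothetical.

```shell
#!/bin/bash
# Count files named "<base>.<non-negative integer>".
count_indexed_trace_files() {
    local base="$1" n=0 f
    for f in "$base".*; do
        # Skip the unexpanded glob when nothing matches.
        [ -e "$f" ] || continue
        # Keep only suffixes made up entirely of digits.
        case "${f##*.}" in
            ''|*[!0-9]*) ;;
            *) n=$((n + 1)) ;;
        esac
    done
    echo "$n"
}

# Hypothetical usage: with log_frequency set to 2 and 4 traced requests,
# two indexed files would be expected.
# [ "$(count_indexed_trace_files trace.log)" -eq 2 ]
```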
Trace Count Management
Verifies the trace_count mechanism, which limits the total number of traced requests. Once the count is exhausted, trace_count reads 0 and an indexed log file is generated. Also validates error handling for out-of-range values.
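One way to confirm that the count has run out is to read trace_count back from a saved settings response. The sketch below assumes the response is a flat JSON object whose setting values are strings, such as "trace_count":"0". The sample body in the usage comment is illustrative, not captured server output, and json_string_field is a hypothetical helper.

```shell
#!/bin/bash
# Extract the value of a string-valued field from a flat JSON object.
# Sufficient for simple settings; this is not a general JSON parser and
# the key must not contain regex metacharacters.
json_string_field() {
    local file="$1" key="$2"
    sed -n "s/.*\"${key}\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" "$file"
}

# Hypothetical usage on a saved response body:
# echo '{"trace_rate":"1","trace_count":"0","log_frequency":"0"}' > curl.out
# [ "$(json_string_field curl.out trace_count)" = "0" ]
```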
BLS and Ensemble Tracing
Tests trace propagation through BLS (Business Logic Scripting) and ensemble model chains. Validates that parent-child trace relationships are correctly maintained with proper parent_id fields across the execution pipeline.
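A rough check for propagation is to count how many trace entries declare a parent. The sketch below assumes the trace file is JSON in which child entries carry a numeric "parent_id" field, as described above. The function name and the sample data in the usage comment are illustrative.

```shell
#!/bin/bash
# Count trace entries that declare a parent (children of an ensemble or BLS request).
count_child_traces() {
    # grep -o prints each match on its own line, so several entries on one
    # physical line are still counted separately.
    { grep -o '"parent_id"[[:space:]]*:[[:space:]]*[0-9][0-9]*' "$1" || true; } \
        | wc -l | tr -d ' '
}

# Hypothetical usage:
# echo '[{"id":1},{"id":2,"parent_id":1},{"id":3,"parent_id":1}]' > trace.json
# [ "$(count_child_traces trace.json)" -eq 2 ]
```

Counting alone does not prove that each parent_id points at the right request. A stricter check would also match every parent_id against an existing id.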
Custom Backend Tracing
Validates that custom backend trace activities (CUSTOM_SINGLE_ACTIVITY, CUSTOM_ACTIVITY_START, CUSTOM_ACTIVITY_END) are properly recorded when the identity backend has enable_custom_tracing enabled.
Non-Existent Model Handling
Tests that the trace API returns errors for non-existent models and unloaded models, and correctly handles model load/unload/reload cycles.
OpenTelemetry Integration
Installs the OpenTelemetry Collector (v0.91.0) and runs the opentelemetry_unittest.py test suite with the server configured in mode=opentelemetry. Tests include resource attribute propagation, SageMaker endpoint compatibility, and Python backend trace context.
SERVER_ARGS="... --trace-config=mode=opentelemetry \
--trace-config=opentelemetry,resource=test.key=test.value \
--trace-config=opentelemetry,resource=service.name=test_triton \
--trace-config=opentelemetry,url=localhost:$OTLP_PORT/v1/traces ..."
OTel Workaround with Rate 0
Tests that with rate=0, no traces are collected for normal requests, but requests carrying an OTel context header (traceparent) are still traced.
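A traceparent value follows the W3C Trace Context layout: a two-digit version, a 32-hex-digit trace id, a 16-hex-digit parent span id, and two-digit flags. The sketch below assembles such a value. The function name is hypothetical, and the model name, payload file, and port in the usage comment are placeholders.

```shell
#!/bin/bash
# Assemble a W3C "traceparent" value: version 00, trace id, span id, flags 01 (sampled).
# Only length and lowercase-hex format are checked; the all-zero ids that the
# specification also forbids are not rejected here.
make_traceparent() {
    local trace_id="$1" span_id="$2"
    if ! [[ "$trace_id" =~ ^[0-9a-f]{32}$ && "$span_id" =~ ^[0-9a-f]{16}$ ]]; then
        echo "invalid trace or span id" >&2
        return 1
    fi
    echo "00-${trace_id}-${span_id}-01"
}

# Hypothetical usage against a running server:
# curl -s -H "traceparent: $(make_traceparent "$TRACE_ID" "$SPAN_ID")" \
#      -H "Content-Type: application/json" \
#      -d @request.json localhost:8000/v2/models/simple/infer
```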
OTel Count Expiry
Validates that after trace count expires, only requests with explicit OTel context headers continue to be traced, while regular requests are not.
BatchSpanProcessor Configuration
Tests three BSP parameters:
- bsp_max_queue_size=1 - Verifies spans are dropped with a warning
- bsp_schedule_delay=0 - Verifies multiple batches are exported
- bsp_max_export_batch_size=1 - Verifies exactly 6 scopeSpans entries
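Each BSP parameter is presumably passed to the server as another --trace-config option. The sketch below builds such an argument, assuming BSP options take the same opentelemetry, prefix as the resource and url options shown earlier. The helper name bsp_trace_arg is hypothetical.

```shell
#!/bin/bash
# Build one --trace-config argument for a BatchSpanProcessor parameter.
# Only the three parameters exercised by this test are accepted.
bsp_trace_arg() {
    local param="$1" value="$2"
    case "$param" in
        bsp_max_queue_size|bsp_schedule_delay|bsp_max_export_batch_size) ;;
        *) echo "unknown BSP parameter: $param" >&2; return 1 ;;
    esac
    echo "--trace-config=opentelemetry,${param}=${value}"
}

# Hypothetical usage when composing server arguments:
# SERVER_ARGS="... --trace-config=mode=opentelemetry $(bsp_trace_arg bsp_max_queue_size 1)"
```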
Python Backend Trace Context
Tests that the Python backend receives None trace context in Triton trace mode and when tracing is OFF, confirming proper context isolation.
Long-Running Stress Tests
Runs 120-second stress tests for both Triton and OpenTelemetry trace modes to verify server stability under continuous traced inference load.
Test Flow
- Set up model repository with diverse model types
- Test dynamic trace activation (OFF -> TIMESTAMPS)
- Test per-model trace settings and overrides
- Test trace count limits and file rotation
- Test BLS/ensemble trace propagation
- Test non-existent and unloaded model trace API handling
- Test custom backend trace activities
- Install OTel Collector and run OTel unit tests
- Test OTel rate=0 workaround and count expiry
- Test BatchSpanProcessor parameter propagation
- Test Python backend trace context isolation
- Run long-running stress tests for both trace modes
Dependencies
- ../common/util.sh - Common test utilities
- ../common/trace_summary.py - Trace log summarization
- ../clients/simple_http_infer_client - HTTP inference client
- ../clients/simple_grpc_infer_client - gRPC inference client
- opentelemetry_unittest.py - OpenTelemetry test suite (19 tests)
- trace_endpoint_test.py - Trace endpoint test suite (6 tests)
- trace_stress_grpc_client.py - Stress test gRPC client
- OpenTelemetry Collector v0.91.0