Implementation:Triton inference server Server L0 Trace Test

From Leeroopedia


L0 Trace Test

Source File: qa/L0_trace/test.sh
Language: Bash (1349 lines)
Domains: Testing, Tracing

Purpose

This QA shell script validates Triton Inference Server's tracing in both file-based (Triton native) and OpenTelemetry modes. It exercises the trace settings API for global and per-model configuration, trace rate limiting, trace count management, log frequency rotation, custom backend tracing, BLS/ensemble trace propagation, BatchSpanProcessor parameter tuning, and Python backend trace context exposure, and it finishes with long-running stress tests for trace stability.

Signature

#!/bin/bash
# Primary entry point: test.sh [REPO_VERSION]
# Environment variables:
#   NVIDIA_TRITON_SERVER_VERSION - Repository version
#   CUDA_VISIBLE_DEVICES - GPU device selection (set to 0)
#
# Key functions:
#   assert_curl_success()             - Assert HTTP 200 response
#   assert_curl_failure()             - Assert non-200 response
#   get_global_trace_setting()        - GET /v2/trace/setting
#   get_trace_setting(model)          - GET /v2/models/{model}/trace/setting
#   update_global_trace_setting(json) - POST /v2/trace/setting
#   update_trace_setting(model, json) - POST /v2/models/{model}/trace/setting
#   send_inference_requests(log, n)   - Send n HTTP+gRPC inference request pairs
#   check_pbe_trace_context(model, b) - Check Python backend trace context
#   run_stress_client(client, log)    - Run trace stress client for 120 seconds
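
The two assertion helpers listed above can be pictured roughly as follows. This is a minimal sketch, not the script's actual code: the names assert_http_success and assert_http_failure, the explicit status-code argument, and the RET failure flag are assumptions chosen so the example runs without a server.

```bash
# Failure flag checked at the end of a test run (hypothetical name).
RET=0

# Record a failure unless the HTTP status code is exactly 200.
assert_http_success() {
    local code="$1" message="$2"
    if [ "$code" != "200" ]; then
        echo "*** ${message} (HTTP ${code})"
        RET=1
    fi
}

# Record a failure if the HTTP status code is 200.
assert_http_failure() {
    local code="$1" message="$2"
    if [ "$code" = "200" ]; then
        echo "*** ${message} (HTTP ${code})"
        RET=1
    fi
}
```

A status code can be captured for these helpers with curl's -w '%{http_code}' option while the response body is written to a file with -o.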

Key Components

Model Setup

The script prepares a diverse set of models including simple, global_simple, ensemble_add_sub_int32_int32_int32, bls_simple, repeat_int32, custom_identity_int32, identity_fp32, dynamic_batch, input_all_required, and trace_context. These cover basic inference, ensemble pipelines, BLS chains, custom backends, and decoupled models.

Trace OFF to TIMESTAMPS

Tests dynamic trace activation by starting the server with tracing disabled (level=OFF), then raising the level to TIMESTAMPS through the trace API. Validates that only requests sent after the API call are traced.

SERVER_ARGS="--trace-config triton,file=trace_off_to_min.log --trace-config level=OFF --trace-config rate=1 ..."
run_server
update_global_trace_setting '{"trace_level":["TIMESTAMPS"]}'
send_inference_requests "client_min.log" 10
# Expect 20 traced requests (10 HTTP + 10 gRPC)
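
The expected count above follows from each iteration sending one HTTP and one gRPC request with rate=1. The sketch below shows one way such a check could be written. The helper names count_occurrences and expected_traces are hypothetical, and the marker string to count depends on the trace log or summary format, which is not shown here.

```bash
# Count how many times a fixed string occurs in a file (prints 0 if none).
count_occurrences() {
    local needle="$1" file="$2"
    # grep exits non-zero when nothing matches; "|| true" keeps the pipeline clean.
    { grep -oF -- "$needle" "$file" || true; } | wc -l | tr -d ' '
}

# Expected number of traced requests when each of n iterations sends
# one HTTP and one gRPC request and every request is traced (rate=1).
expected_traces() {
    local n="$1"
    echo $(( 2 * n ))
}
```

For example, comparing "$(count_occurrences MARKER trace.log)" with "$(expected_traces 10)" would check for 20 traced requests, where MARKER is whatever string appears exactly once per traced request in the file being inspected.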

Per-Model Trace Settings

Demonstrates per-model trace configuration through the API. Tests that model-specific settings override global settings, log frequency rotation creates indexed trace files, and clearing model settings reverts to global defaults.

update_trace_setting "simple" '{"log_frequency":"2"}'
update_trace_setting "simple" '{"trace_level":["OFF"]}'
update_trace_setting "simple" '{"trace_level":null}'  # Clear to revert to global
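
The global and per-model calls differ only in the URL path. A minimal sketch of how a helper might select the endpoint is shown below. The names trace_setting_url, post_trace_setting, and TRITON_HTTP are hypothetical, and localhost:8000 is assumed as the HTTP address; the real script may structure this differently.

```bash
# Assumed HTTP endpoint; override by exporting TRITON_HTTP.
TRITON_HTTP="${TRITON_HTTP:-localhost:8000}"

# Print the trace-setting URL: global when no model is given,
# per-model otherwise.
trace_setting_url() {
    local model="${1:-}"
    if [ -z "$model" ]; then
        echo "${TRITON_HTTP}/v2/trace/setting"
    else
        echo "${TRITON_HTTP}/v2/models/${model}/trace/setting"
    fi
}

# POST a JSON settings payload and print the HTTP status code.
# The response body is written to ./curl.out. Requires a running server.
post_trace_setting() {
    local model="$1" payload="$2"
    curl -s -o ./curl.out -w '%{http_code}' \
        -X POST -d "$payload" "$(trace_setting_url "$model")"
}
```

With these helpers, post_trace_setting "simple" '{"trace_level":null}' would clear the model-level override, and post_trace_setting "" '{"trace_level":["TIMESTAMPS"]}' would target the global setting.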

Trace Count Management

Verifies the trace_count mechanism, which limits the total number of traced requests. Once the count is exhausted, the trace_count setting reads 0 and an indexed log file is generated. Also validates error handling for out-of-range values.
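
Checking that the count has reached 0 amounts to reading one field from the settings response. The sketch below assumes the response is a flat JSON object whose scalar settings are encoded as strings (for example "trace_count":"0"); that shape and the helper name json_string_field are assumptions, and the regex approach does not handle escaped quotes or nested objects.

```bash
# Extract the value of a string-valued field from a flat JSON object.
# Prints nothing when the field is absent.
json_string_field() {
    local field="$1" json="$2"
    printf '%s' "$json" | sed -n "s/.*\"${field}\":\"\([^\"]*\)\".*/\1/p"
}
```

A check could then compare "$(json_string_field trace_count "$(cat ./curl.out)")" against "0" after the last permitted request has been traced.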

BLS and Ensemble Tracing

Tests trace propagation through BLS (Business Logic Scripting) and ensemble model chains. Validates that parent-child trace relationships are correctly maintained with proper parent_id fields across the execution pipeline.
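
One way to picture the parent-child check is to count the trace entries that name a given parent. The sketch below assumes child entries carry a numeric "parent_id":N field in compact JSON with no whitespace after the colon; the helper name count_children and that formatting are assumptions, not the script's actual logic.

```bash
# Count trace entries whose parent_id equals the given numeric id.
# The trailing [^0-9] guard stops id 1 from also matching 11, 12, ...
count_children() {
    local parent="$1" file="$2"
    { grep -oE "\"parent_id\":${parent}([^0-9]|\$)" "$file" || true; } \
        | wc -l | tr -d ' '
}
```

For an ensemble with two composing models, one would expect the top-level request's id to have two children under this counting scheme.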

Custom Backend Tracing

Validates that custom backend trace activities (CUSTOM_SINGLE_ACTIVITY, CUSTOM_ACTIVITY_START, CUSTOM_ACTIVITY_END) are properly recorded when the identity backend has enable_custom_tracing enabled.

Non-Existent Model Handling

Tests that the trace API returns errors for non-existent models and unloaded models, and correctly handles model load/unload/reload cycles.

OpenTelemetry Integration

Installs the OpenTelemetry Collector (v0.91.0) and runs the opentelemetry_unittest.py test suite with the server configured in mode=opentelemetry. Tests include resource attribute propagation, SageMaker endpoint compatibility, and Python backend trace context.

SERVER_ARGS="... --trace-config=mode=opentelemetry \
    --trace-config=opentelemetry,resource=test.key=test.value \
    --trace-config=opentelemetry,resource=service.name=test_triton \
    --trace-config=opentelemetry,url=localhost:$OTLP_PORT/v1/traces ..."

OTel Workaround with Rate 0

Tests that with rate=0, no traces are collected for normal requests, but requests carrying an OTel context header (traceparent) are still traced.
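
The traceparent header follows the W3C Trace Context layout: version, 16-byte trace id, 8-byte parent span id, and flags, all hex-encoded and joined with hyphens. The sketch below generates such a value for experimentation. The helper names random_hex and make_traceparent are hypothetical, and the test script may supply its header differently.

```bash
# Print nbytes of random data as lowercase hex, without a trailing newline.
random_hex() {
    local nbytes="$1"
    od -An -tx1 -N "$nbytes" /dev/urandom | tr -d ' \n'
}

# Build a W3C traceparent value: version-traceid-parentid-flags.
# Version "00" is the current format; flags "01" mark the request as sampled.
make_traceparent() {
    echo "00-$(random_hex 16)-$(random_hex 8)-01"
}
```

The value can be attached to an HTTP request with curl -H "traceparent: $(make_traceparent)".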

OTel Count Expiry

Validates that after the trace count expires, only requests with explicit OTel context headers continue to be traced, while regular requests are not.

BatchSpanProcessor Configuration

Tests three BSP parameters:

  • bsp_max_queue_size=1 - Verifies spans are dropped with a warning
  • bsp_schedule_delay=0 - Verifies multiple batches are exported
  • bsp_max_export_batch_size=1 - Verifies exactly 6 scopeSpans entries
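
Each BSP parameter is passed as an OpenTelemetry trace setting on the server command line. The sketch below builds the flags using the same --trace-config=opentelemetry,key=value form that appears in the OpenTelemetry Integration example. The helper name bsp_trace_arg is hypothetical, and the remaining server arguments are omitted.

```bash
# Build the --trace-config flag for one BatchSpanProcessor parameter.
bsp_trace_arg() {
    local name="$1" value="$2"
    echo "--trace-config=opentelemetry,${name}=${value}"
}

# Print the flags for the three cases listed above, one per line.
print_bsp_cases() {
    bsp_trace_arg bsp_max_queue_size 1
    bsp_trace_arg bsp_schedule_delay 0
    bsp_trace_arg bsp_max_export_batch_size 1
}
```

Each printed flag would be appended to SERVER_ARGS for a separate server run, so the effect of one parameter can be observed in isolation.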

Python Backend Trace Context

Tests that the Python backend receives a trace context of None both in Triton trace mode and when tracing is OFF, confirming proper context isolation.

Long-Running Stress Tests

Runs 120-second stress tests for both Triton and OpenTelemetry trace modes to verify server stability under continuous traced inference load.

Test Flow

  1. Set up model repository with diverse model types
  2. Test dynamic trace activation (OFF -> TIMESTAMPS)
  3. Test per-model trace settings and overrides
  4. Test trace count limits and file rotation
  5. Test BLS/ensemble trace propagation
  6. Test non-existent and unloaded model trace API handling
  7. Test custom backend trace activities
  8. Install OTel Collector and run OTel unit tests
  9. Test OTel rate=0 workaround and count expiry
  10. Test BatchSpanProcessor parameter propagation
  11. Test Python backend trace context isolation
  12. Run long-running stress tests for both trace modes

Dependencies

  • ../common/util.sh - Common test utilities
  • ../common/trace_summary.py - Trace log summarization
  • ../clients/simple_http_infer_client - HTTP inference client
  • ../clients/simple_grpc_infer_client - gRPC inference client
  • opentelemetry_unittest.py - OpenTelemetry test suite (19 tests)
  • trace_endpoint_test.py - Trace endpoint test suite (6 tests)
  • trace_stress_grpc_client.py - Stress test gRPC client
  • OpenTelemetry Collector v0.91.0
