
L0 TRT Dynamic Shape Test

Source File: qa/L0_trt_dynamic_shape/test.sh
Language: Bash (404 lines)
Domains: Testing, TensorRT

Purpose

This QA shell script validates TensorRT dynamic-shape inference in Triton Inference Server. It covers the following areas:

Enforcement of shape dimension bounds
Optimization profile selection across multiple profiles, with both dynamic and static shapes
Rejection of an invalid profile index
Dynamic batching with one optimization profile per batch size

Signature

#!/bin/bash
# Primary entry point: test.sh [REPO_VERSION]
# Environment variables:
#   NVIDIA_TRITON_SERVER_VERSION - Repository version
#   CUDA_VISIBLE_DEVICES - GPU device selection (set to 0)
#
# External tools:
#   perf_analyzer           - Performance analysis client
#   trt_dynamic_shape_test.py - Python test cases (TrtDynamicShapeTest class)
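The entry point and environment variables above suggest the usual Triton QA preamble. The following is a minimal sketch that assumes the optional first argument takes precedence over NVIDIA_TRITON_SERVER_VERSION. The function name resolve_repo_version is hypothetical and does not come from test.sh.

```bash
# Hypothetical sketch: the optional first argument overrides
# NVIDIA_TRITON_SERVER_VERSION; fail if neither provides a version.
resolve_repo_version() {
    local version="${NVIDIA_TRITON_SERVER_VERSION:-}"
    if [ "$#" -ge 1 ]; then
        version="$1"
    fi
    if [ -z "$version" ]; then
        echo "Repository version must be specified" >&2
        return 1
    fi
    echo "$version"
}

# Pin the test to the first GPU, as the script documents.
export CUDA_VISIBLE_DEVICES=0
```

Inside the script, the call would look like REPO_VERSION=$(resolve_repo_version "$@") || exit 1.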

Key Components

Shape Boundary Enforcement

Verifies that inference requests are rejected when their shapes fall outside the optimization profile bounds. The model plan_float32_float32_float32-4-32 accepts a dynamic dimension between 4 and 32. perf_analyzer sends two requests: one with shape 33 (above the maximum) and one with shape 3 (below the minimum). Each run is expected to fail with a specific error message.

$PERF_CLIENT -v -i grpc -u localhost:8001 -m plan_float32_float32_float32-4-32 \
    --shape INPUT0:33 --shape INPUT1:33 -t 1 -p2000 -b 1
EXPECTED_MESSAGE="model expected the shape of dimension 1 to be between 4 and 32 but received"
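Each boundary check must confirm two things: the perf_analyzer run fails, and its output contains EXPECTED_MESSAGE. Below is a minimal sketch of that pattern. The helper expect_failure_with_message and the log file name are hypothetical, and the real script may structure the check differently.

```bash
# Hypothetical helper: run a command that is expected to fail, save its
# output to a log file, and succeed only if the command failed AND the
# log contains the expected message.
expect_failure_with_message() {
    local log="$1" expected="$2"
    shift 2
    if "$@" > "$log" 2>&1; then
        echo "*** Command unexpectedly succeeded: $*" >&2
        return 1
    fi
    if ! grep -qF -- "$expected" "$log"; then
        echo "*** Expected message not found in $log" >&2
        cat "$log" >&2
        return 1
    fi
    return 0
}

# Example (requires a running server and perf_analyzer):
# expect_failure_with_message client_max.log "$EXPECTED_MESSAGE" \
#     $PERF_CLIENT -v -i grpc -u localhost:8001 \
#     -m plan_float32_float32_float32-4-32 \
#     --shape INPUT0:33 --shape INPUT1:33 -t 1 -p2000 -b 1 || RET=1
```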

Multiple Optimization Profiles

The multi-profile test model plan_float32_float32_float32 defines optimization profiles with varying min/opt/max shapes. Each shape is a [batch, dim] pair. Profiles 0-9 are used by the profile selection and static shape tests:

# Profile configurations (min, opt, max, index):
# [1, 1], [1, 16], [8, 33], 0
# [1, 1], [2, 16], [7, 32], 1
# [1, 1], [3, 16], [6, 32], 2
# [1, 1], [4, 16], [5, 32], 3
# [5, 1], [6, 16], [8, 32], 4
# [6, 1], [6, 16], [8, 32], 5
# [1, 1], [1, 16], [8, 32], 6
# [1, 33], [1, 33], [1, 33], 7 (static)
# [3, 33], [3, 33], [3, 33], 8 (static)
# [5, 33], [5, 33], [5, 33], 9 (static)
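The two test_select_optimization_profile cases can be reasoned about with a simple two-step rule: filter by eligibility, then pick the closest fit. The sketch below uses a hypothetical heuristic, not Triton's actual selection code. A profile is eligible when the request lies within its min/max bounds. Among eligible profiles, the sketch picks the one whose opt shape has the smallest L1 distance to the request. Only the dynamic profiles 0-6 are included.

```bash
# Hypothetical best-fit heuristic (Triton's real logic may differ).
# Index i holds profile i as "min_b min_d opt_b opt_d max_b max_d".
PROFILES=(
    "1 1 1 16 8 33"
    "1 1 2 16 7 32"
    "1 1 3 16 6 32"
    "1 1 4 16 5 32"
    "5 1 6 16 8 32"
    "6 1 6 16 8 32"
    "1 1 1 16 8 32"
)

# select_profile BATCH DIM INDEX... prints the chosen index, or fails.
select_profile() {
    local batch="$1" dim="$2"
    shift 2
    local best="" best_dist="" idx
    for idx in "$@"; do
        local min_b min_d opt_b opt_d max_b max_d
        read -r min_b min_d opt_b opt_d max_b max_d <<< "${PROFILES[$idx]}"
        # Skip profiles whose bounds exclude the request shape.
        if (( batch < min_b || batch > max_b || dim < min_d || dim > max_d )); then
            continue
        fi
        # L1 distance between the request and the profile's opt shape.
        local db=$(( batch - opt_b )) dd=$(( dim - opt_d ))
        local dist=$(( (db < 0 ? -db : db) + (dd < 0 ? -dd : dd) ))
        if [ -z "$best" ] || (( dist < best_dist )); then
            best="$idx"
            best_dist="$dist"
        fi
    done
    [ -n "$best" ] || return 1
    echo "$best"
}
```

This rule reproduces both expected outcomes. select_profile 4 16 0 1 2 3 prints 3 because profile 3's opt shape is exactly [4, 16]. select_profile 4 16 0 5 prints 0 because profile 5's minimum batch of 6 excludes it.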

Test Cases

Test Name	Description	Configuration
test_load_specific_optimization_profile	Loads only profile 5 and validates inference	`profile: ["5"]`
test_load_default_optimization_profile	Uses the default profile (the first one, index 0) when none is listed	Profile field cleared
test_select_optimization_profile (best fit)	Loads profiles 0-3, sends shape [4,16], expects profile 3 (whose opt shape is exactly [4,16])	`profile: ["0","1","2","3"]`
test_select_optimization_profile (allowed)	Loads profiles 0 and 5, sends shape [4,16], expects profile 0 (profile 5 requires a minimum batch of 6)	`profile: ["0","5"]`
test_load_wrong_optimization_profile	Specifies nonexistent profile 100; the model is expected to fail to load	`profile: ["100"]`
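The Configuration column amounts to editing the profile field of the model's config.pbtxt between runs. Below is a minimal sketch using GNU sed -i. The helper set_profiles is hypothetical, and the sketch assumes the profile: entry ends its line in the config file.

```bash
# Hypothetical helper: rewrite the "profile:" field of a config.pbtxt in
# place (GNU sed). With no indices, the field text is removed so the
# server falls back to its default profile.
set_profiles() {
    local config="$1"
    shift
    if [ "$#" -eq 0 ]; then
        sed -i 's/profile:.*//' "$config"
        return
    fi
    local list="" idx
    for idx in "$@"; do
        # Quote each index and join with commas: "0","1",...
        list+="${list:+,}\"$idx\""
    done
    sed -i "s/profile:.*/profile: [$list]/" "$config"
}
```

For example, set_profiles config.pbtxt 0 1 2 3 produces the best-fit configuration, and set_profiles config.pbtxt with no indices clears the field.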

Static Shape Profiles

Verifies that the static shape profiles 7, 8 and 9 work with model-configuration autocomplete (--strict-model-config=false). These profiles fix the batch size at 1, 3 and 5 respectively, each with dimension 33. The test checks three requests:

Batch size 5, the largest static batch among these profiles, succeeds.
Batch size 6 fails.
Batch size 2 with shape 33 fails, because no loaded profile has a batch dimension of 2.

(cd ${DATADIR}/plan_float32_float32_float32/ && \
    echo "instance_group { profile : [\"7\", \"8\", \"9\" ] }" >> config.pbtxt)
SERVER_ARGS="--model-repository=$DATADIR --strict-model-config=false"
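Because profiles 7, 8 and 9 are static, a request can only be served when its [batch, dim] shape equals one of them exactly. The hypothetical static_request_ok below checks only that shape match. It does not model the separate max_batch_size limit derived by autocomplete, which is presumably what rejects batch size 6 in the real test.

```bash
# Hypothetical check: [batch, dim] shapes of static profiles 7, 8 and 9.
STATIC_SHAPES=("1 33" "3 33" "5 33")

# static_request_ok BATCH DIM succeeds only on an exact shape match.
static_request_ok() {
    local batch="$1" dim="$2" shape
    for shape in "${STATIC_SHAPES[@]}"; do
        if [ "$shape" = "$batch $dim" ]; then
            return 0
        fi
    done
    return 1
}
```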

Dynamic Batching with Profiles

Tests profiles 10-17. Each profile fixes the batch dimension to one size from 1 to 8 and keeps the remaining dimension dynamic. The test verifies that dynamic_batching {} works correctly with these per-batch-size optimization profiles under 16 concurrent threads.
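The dynamic-batching configuration could be produced with the same echo-append pattern used for the static profiles. The sketch below shows one way to do it. The helper enable_batch_profiles is hypothetical. It assumes config.pbtxt does not already contain an instance_group, because appending a second one would define two instance groups.

```bash
# Hypothetical helper: append an instance group that loads profiles
# FIRST..LAST and enable the dynamic batcher in a config.pbtxt.
enable_batch_profiles() {
    local config="$1" first="$2" last="$3" list="" idx
    for (( idx = first; idx <= last; idx++ )); do
        # Quote each index and join with commas: "10","11",...
        list+="${list:+,}\"$idx\""
    done
    {
        echo "instance_group { profile : [$list] }"
        echo "dynamic_batching {}"
    } >> "$config"
}
```

For this test, the call would be enable_batch_profiles config.pbtxt 10 17.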

Test Flow

Load single-profile model and test shape boundary enforcement
Set up multi-profile model with dynamic shapes
Test specific profile loading and validation
Test default profile selection
Test best-fit profile selection with verbose server logging
Test profile selection respecting min dimension constraints
Test error handling for invalid profile indices
Test static shape profiles with autocomplete
Test dynamic batching with per-batch-size profiles
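The steps above run in sequence, and a failure in one step should not hide the results of later ones. Below is a minimal sketch of that bookkeeping, modeled on the RET-style pass/fail reporting that Triton QA scripts typically use. The function run_phases and the phase function names are hypothetical.

```bash
# Hypothetical driver: run every phase, record any failure, and report
# one overall result through the exit status.
run_phases() {
    local ret=0 phase
    for phase in "$@"; do
        if ! "$phase"; then
            echo "*** Phase failed: $phase" >&2
            ret=1
        fi
    done
    if [ "$ret" -eq 0 ]; then
        echo -e "\n***\n*** Test Passed\n***"
    else
        echo -e "\n***\n*** Test FAILED\n***"
    fi
    return "$ret"
}

# Example with hypothetical phase functions, each of which would start
# the server, run its client checks, and stop the server:
# run_phases boundary_phase specific_profile_phase default_profile_phase
```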

Dependencies

perf_analyzer - NVIDIA performance analysis tool
trt_dynamic_shape_test.py - Python unittest-based test cases
../common/util.sh - Common test utility functions
TensorRT plan models from qa_variable_model_repository
