Implementation:Triton inference server Server L0 Trt Dynamic Shape Test
L0 TRT Dynamic Shape Test
Source File: qa/L0_trt_dynamic_shape/test.sh
Language: Bash (404 lines)
Domains: Testing, TensorRT
Purpose
This QA test shell script validates TensorRT dynamic shape inference in Triton Inference Server. It tests optimization profile selection, enforcement of shape dimension bounds, handling of multiple optimization profiles (both dynamic and static shapes), wrong profile specification, and dynamic batching with profile-per-batch-size configurations.
Signature
#!/bin/bash
# Primary entry point: test.sh [REPO_VERSION]
# Environment variables:
# NVIDIA_TRITON_SERVER_VERSION - Repository version
# CUDA_VISIBLE_DEVICES - GPU device selection (set to 0)
#
# External tools:
# perf_analyzer - Performance analysis client
# trt_dynamic_shape_test.py - Python test cases (TrtDynamicShapeTest class)
Key Components
Shape Boundary Enforcement
Tests that TensorRT correctly rejects inference requests with shapes outside the optimization profile bounds. For a model with shape range [4, 32], shapes of 33 (above max) and 3 (below min) are both tested and expected to return specific error messages.
$PERF_CLIENT -v -i grpc -u localhost:8001 -m plan_float32_float32_float32-4-32 \
--shape INPUT0:33 --shape INPUT1:33 -t 1 -p2000 -b 1
EXPECTED_MESSAGE="model expected the shape of dimension 1 to be between 4 and 32 but received"
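The pass condition for this kind of negative test is that the client output contains the expected error text. A minimal sketch of such a check follows; the helper name `check_expected_error` and the stand-in log file are assumptions made for illustration and are not taken from test.sh.

```bash
#!/bin/bash
# Hypothetical helper (not from test.sh): succeeds only if the log file
# contains the expected error text.
check_expected_error() {
    local log_file="$1"
    local expected="$2"
    # -F matches the message as a fixed string rather than a regex.
    grep -qF -- "$expected" "$log_file"
}

# Stand-in log; a real run would capture the perf_analyzer output instead.
CLIENT_LOG="$(mktemp)"
echo "error: model expected the shape of dimension 1 to be between 4 and 32 but received 33" > "$CLIENT_LOG"
EXPECTED_MESSAGE="model expected the shape of dimension 1 to be between 4 and 32 but received"

if check_expected_error "$CLIENT_LOG" "$EXPECTED_MESSAGE"; then
    echo "PASS: out-of-range shape was rejected"
else
    echo "FAIL: expected error not found"
fi
```

Matching on a fixed string keeps the check independent of the exact received value that follows "but received".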
Multiple Optimization Profiles
The test model plan_float32_float32_float32 contains 10 optimization profiles (indices 0-9) with varying min/opt/max shape configurations:
# Profile configurations (min, opt, max, index):
# [1, 1], [1, 16], [8, 33], 0
# [1, 1], [2, 16], [7, 32], 1
# [1, 1], [3, 16], [6, 32], 2
# [1, 1], [4, 16], [5, 32], 3
# [5, 1], [6, 16], [8, 32], 4
# [6, 1], [6, 16], [8, 32], 5
# [1, 1], [1, 16], [8, 32], 6
# [1, 33], [1, 33], [1, 33], 7 (static)
# [3, 33], [3, 33], [3, 33], 8 (static)
# [5, 33], [5, 33], [5, 33], 9 (static)
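To make the listing concrete, the sketch below checks whether a request shape lies inside a profile's min/max bounds. It assumes each pair in the listing reads as [batch, dimension]; the function name `shape_fits_profile` is hypothetical and this is not how Triton or TensorRT implements the check.

```bash
#!/bin/bash
# Hypothetical helper: shape_fits_profile BATCH DIM MIN_B MIN_D MAX_B MAX_D
# Returns success when both values lie inside the profile's [min, max] range.
shape_fits_profile() {
    local batch=$1 dim=$2 min_b=$3 min_d=$4 max_b=$5 max_d=$6
    (( batch >= min_b && batch <= max_b && dim >= min_d && dim <= max_d ))
}

# Profile 5 from the listing: min [6, 1], max [8, 32]
if shape_fits_profile 4 16 6 1 8 32; then echo "profile 5: fits"; else echo "profile 5: out of range"; fi
# Profile 0 from the listing: min [1, 1], max [8, 33]
if shape_fits_profile 4 16 1 1 8 33; then echo "profile 0: fits"; else echo "profile 0: out of range"; fi
```

Under that assumption, a [4, 16] request falls outside profile 5 (minimum batch 6) but inside profile 0.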
Test Cases
| Test Name | Description | Configuration |
|---|---|---|
| test_load_specific_optimization_profile | Loads only profile 5 and validates inference | profile: ["5"] |
| test_load_default_optimization_profile | Uses default profile (first available) | Profile field cleared |
| test_select_optimization_profile (best fit) | Loads profiles 0-3, sends shape [4,16], expects profile 3 | profile: ["0","1","2","3"] |
| test_select_optimization_profile (allowed) | Loads profiles 0,5, sends shape [4,16], expects profile 0 (profile 5 requires min batch 6) | profile: ["0","5"] |
| test_load_wrong_optimization_profile | Attempts to load non-existent profile 100 | profile: ["100"] |
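The two selection cases in the table can be reproduced with a toy model: discard profiles whose bounds exclude the request, then pick the remaining profile whose opt shape is closest to it. This is only an illustrative sketch that happens to agree with the expected results above. It is not Triton's actual selection logic, and the names `PROFILES` and `select_profile` are hypothetical. It requires Bash 4 or later for associative arrays.

```bash
#!/bin/bash
# Profile table: index -> "min_b min_d opt_b opt_d max_b max_d"
# (values copied from the profile listing; only the profiles used here).
declare -A PROFILES=(
    [0]="1 1 1 16 8 33"
    [1]="1 1 2 16 7 32"
    [2]="1 1 3 16 6 32"
    [3]="1 1 4 16 5 32"
    [5]="6 1 6 16 8 32"
)

# select_profile BATCH DIM IDX...  prints the chosen index, fails if none fit.
select_profile() {
    local batch=$1 dim=$2; shift 2
    local best="" best_dist=-1 idx
    local min_b min_d opt_b opt_d max_b max_d db dd dist
    for idx in "$@"; do
        # Ignore indices that are not in the table.
        [[ -n "${PROFILES[$idx]:-}" ]] || continue
        read -r min_b min_d opt_b opt_d max_b max_d <<< "${PROFILES[$idx]}"
        # Skip profiles whose bounds exclude the request.
        if (( batch < min_b || batch > max_b || dim < min_d || dim > max_d )); then
            continue
        fi
        # Squared distance between the request and the profile's opt shape.
        db=$(( batch - opt_b )); dd=$(( dim - opt_d ))
        dist=$(( db * db + dd * dd ))
        if (( best_dist < 0 || dist < best_dist )); then
            best=$idx; best_dist=$dist
        fi
    done
    if [[ -n "$best" ]]; then echo "$best"; else return 1; fi
}

select_profile 4 16 0 1 2 3   # prints 3
select_profile 4 16 0 5       # prints 0
```

The distance metric is an arbitrary choice for the sketch; any measure that ranks an exact opt match first would give the same two answers.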
Static Shape Profiles
Tests that the static shape profiles (7, 8, 9) work correctly with the autocomplete feature (--strict-model-config=false). The test validates that batch size 5 succeeds (5 is the largest batch size across the three profiles), that batch size 6 fails, and that batch size 2 with shape 33 fails because no loaded profile supports batch dimension 2 with that shape.
(cd ${DATADIR}/plan_float32_float32_float32/ && \
echo "instance_group { profile : [\"7\", \"8\", \"9\" ] }" >> config.pbtxt)
SERVER_ARGS="--model-repository=$DATADIR --strict-model-config=false"
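The expected outcomes for the static profiles can be summarized with a small model: a request is accepted only when its batch size and shape exactly match one of the loaded static profiles. The sketch below encodes that expectation; `STATIC_BATCHES`, `STATIC_DIM`, and `static_profile_accepts` are hypothetical names, and the logic models the test's expectations rather than Triton's internals.

```bash
#!/bin/bash
# Batch sizes of static profiles 7, 8, 9 (from the profile listing),
# assuming the first element of each pair is the batch dimension.
STATIC_BATCHES=(1 3 5)
STATIC_DIM=33

# Hypothetical helper: succeeds only on an exact match with a static profile.
static_profile_accepts() {
    local batch=$1 dim=$2 b
    for b in "${STATIC_BATCHES[@]}"; do
        if (( batch == b && dim == STATIC_DIM )); then
            return 0
        fi
    done
    return 1
}

for b in 5 6 2; do
    if static_profile_accepts "$b" "$STATIC_DIM"; then
        echo "batch $b: accepted"
    else
        echo "batch $b: rejected"
    fi
done
```

With these values the loop reports batch 5 as accepted and batches 6 and 2 as rejected, matching the three cases described above.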
Dynamic Batching with Profiles
Tests profiles 10-17, each supporting a different fixed batch size (1-8) with dynamic shapes. Validates that dynamic_batching {} works correctly when combined with per-batch-size optimization profiles using 16 concurrent threads.
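A configuration for this case could be assembled the same way as the static-profile example, by appending to config.pbtxt. The sketch below is an assumption about how that might look, not a copy of test.sh: `MODEL_DIR` is a temporary stand-in for the model directory and `profile_list` is a hypothetical helper.

```bash
#!/bin/bash
# Stand-in for the model directory; a real run would use the model's
# directory under the model repository.
MODEL_DIR="$(mktemp -d)"
touch "$MODEL_DIR/config.pbtxt"

# Hypothetical helper: builds a quoted, comma-separated list of profile
# indices, e.g. "10", "11", ..., "17".
profile_list() {
    local first=$1 last=$2 out="" i
    for (( i = first; i <= last; i++ )); do
        out+="${out:+, }\"$i\""
    done
    echo "$out"
}

# Append the per-batch-size profiles and enable dynamic batching.
(cd "$MODEL_DIR" && \
    echo "instance_group { profile : [ $(profile_list 10 17) ] }" >> config.pbtxt && \
    echo "dynamic_batching {}" >> config.pbtxt)

cat "$MODEL_DIR/config.pbtxt"
```

Generating the list in a loop avoids hand-typing eight quoted indices and makes the profile range easy to change.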
Test Flow
- Load single-profile model and test shape boundary enforcement
- Set up multi-profile model with dynamic shapes
- Test specific profile loading and validation
- Test default profile selection
- Test best-fit profile selection with verbose server logging
- Test profile selection respecting min dimension constraints
- Test error handling for invalid profile indices
- Test static shape profiles with autocomplete
- Test dynamic batching with per-batch-size profiles
Dependencies
- perf_analyzer - NVIDIA performance analysis tool
- trt_dynamic_shape_test.py - Python unittest-based test cases
- ../common/util.sh - Common test utility functions
- TensorRT plan models from qa_variable_model_repository