Implementation:Triton inference server Server L0 Warmup Test

From Leeroopedia


L0 Warmup Test

Source File: qa/L0_warmup/test.sh
Language: Bash (492 lines)
Domains: Testing, Model_Lifecycle

Purpose

This QA test shell script validates Triton Inference Server's model warmup functionality. Model warmup runs sample inference requests during model loading to pre-initialize compute resources (e.g., GPU memory allocation, kernel compilation) before the model begins serving real traffic. The script tests warmup with fixed-size and variable-size data types, stateless and stateful (sequence) models, multiple warmup iterations, warmup failure handling, decoupled models, and output memory type preservation after warmup.

Signature

#!/bin/bash
# Primary entry point: test.sh [REPO_VERSION]
# Environment variables:
#   NVIDIA_TRITON_SERVER_VERSION - Repository version
#   CUDA_VISIBLE_DEVICES - GPU device selection (set to 0)
#   BACKENDS - Space-separated list of backends (default: "onnx libtorch plan")
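The defaults listed above could be resolved along the following lines. This is a minimal sketch, not the script's actual code: the function names resolve_repo_version and resolve_backends are hypothetical, and the fallback order (positional argument first, then the environment variable) is an assumption. Only the variable names and the default backend list come from the comments above.

```shell
#!/bin/bash
# Hypothetical helper: prefer the first positional argument, otherwise fall
# back to NVIDIA_TRITON_SERVER_VERSION (assumed fallback order).
resolve_repo_version() {
    local arg="${1:-}"
    if [ -n "$arg" ]; then
        echo "$arg"
    else
        echo "${NVIDIA_TRITON_SERVER_VERSION:-}"
    fi
}

# Hypothetical helper: use the documented default backend list unless the
# caller has exported BACKENDS.
resolve_backends() {
    echo "${BACKENDS:-onnx libtorch plan}"
}
```

A caller would then write something like REPO_VERSION=$(resolve_repo_version "$@") before preparing the model repository.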

Key Components

Fixed-Size Data Type Warmup

For each backend, the script configures warmup for two model types:

Stateless model ({backend}_float32_float32_float32): Configured with a single warmup sample using zero data for INPUT0 and random data for INPUT1, batch size 1.

(cd models/${BACKEND}_float32_float32_float32 && \
    echo "model_warmup [{" >> config.pbtxt && \
    echo "    name : \"regular sample\"" >> config.pbtxt && \
    echo "    batch_size: 1" >> config.pbtxt && \
    echo "    inputs {" >> config.pbtxt && \
    echo "        key: \"${INPUT_PREFIX}0\"" >> config.pbtxt && \
    echo "        value: { data_type: TYPE_FP32  dims: 16  zero_data: true }" >> config.pbtxt && \
    echo "    }" >> config.pbtxt && \
    # ... INPUT1 with random_data: true
    echo "}]" >> config.pbtxt )

Sequence model ({backend}_sequence_int32): Configured with a warmup sample at batch size 8 with count 2 (two iterations), including control tensors (START, READY) alongside the sequence input.
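A sequence warmup entry of the kind described above might look like the sketch below. The sample name, batch size, and count follow the description; the tensor names (INPUT, START, READY), the data types, and the dims are illustrative assumptions, and the helper append_sequence_warmup is hypothetical rather than part of test.sh.

```shell
#!/bin/bash
# Hypothetical helper: append a sequence warmup sample to a config file.
# Tensor names, data types, and dims are assumptions for illustration.
append_sequence_warmup() {
    local config="$1"
    cat >> "$config" <<'EOF'
model_warmup [{
    name : "sequence sample"
    batch_size: 8
    count: 2
    inputs {
        key: "INPUT"
        value: { data_type: TYPE_INT32  dims: 1  zero_data: true }
    }
    inputs {
        key: "START"
        value: { data_type: TYPE_INT32  dims: 1  zero_data: true }
    }
    inputs {
        key: "READY"
        value: { data_type: TYPE_INT32  dims: 1  zero_data: true }
    }
}]
EOF
}
```

Using a quoted here-document instead of a chain of echo commands avoids escaping the inner double quotes, at the cost of not expanding shell variables inside the block.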

The script verifies warmup execution by checking server logs for:

  • "is running warmup sample 'regular sample'"
  • "is running warmup sample 'sequence sample' for iteration 1"
  • "is running warmup sample 'sequence sample' for iteration 2"
  • Absence of "failed to run warmup"
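The log checks listed above can be sketched as a small grep-based helper. The function name check_warmup_log and its argument order are hypothetical; only the log strings come from the list above.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if every expected message appears in the
# log and the generic failure marker does not.
check_warmup_log() {
    local log="$1"
    shift
    local msg
    for msg in "$@"; do
        # -F matches the message literally, so quotes need no escaping.
        if ! grep -qF -- "$msg" "$log"; then
            echo "missing log message: $msg" >&2
            return 1
        fi
    done
    if grep -qF "failed to run warmup" "$log"; then
        echo "unexpected warmup failure in $log" >&2
        return 1
    fi
    return 0
}
```

A call such as check_warmup_log server.log "is running warmup sample 'regular sample'" returns non-zero if the message is missing or if a warmup failure was logged (the log file name here is an assumption).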

Variable-Size Data Type Warmup (String)

For ONNX backends, which support string data types, the script tests warmup with:

  • Zero string stateless: Zero-initialized string data
  • Random string stateless: Randomly generated string data
  • String stateful: User-provided binary string data from a warmup file (raw_string_data)

# Prepare binary string data (one element: "233")
mkdir -p models/${BACKEND}_sequence_object/warmup && \
    (cd models/${BACKEND}_sequence_object/warmup && \
        echo -n -e '\x03\x00\x00\x00\x32\x33\x33' > raw_string_data)
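The bytes written above consist of a 4-byte little-endian length (3) followed by the characters "233". The same layout can be produced for an arbitrary string with a helper like the one below. This is a sketch under assumptions: the name encode_string_element is hypothetical, and the input is assumed to be single-byte (ASCII) text so that the character count equals the byte count.

```shell
#!/bin/bash
# Hypothetical helper: write one string element as
# <4-byte little-endian length><raw bytes>. Assumes ASCII input.
encode_string_element() {
    local s="$1"
    local n=${#s}
    # Build an octal escape sequence for the four length bytes, least
    # significant byte first, then let printf turn it into raw bytes.
    printf "$(printf '\\%03o\\%03o\\%03o\\%03o' \
        $((n % 256)) $(((n / 256) % 256)) \
        $(((n / 65536) % 256)) $(((n / 16777216) % 256)))"
    printf '%s' "$s"
}
```

Running encode_string_element "233" > raw_string_data yields the same seven bytes as the echo command above (03 00 00 00 32 33 33 in hexadecimal).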

Warmup Failure Handling

The script verifies that a model (failing_infer) designed to raise an inference error during warmup prevents the server from starting, and that the server log contains the specific error message: "failed to run warmup sample 'zero sample': An Error Occurred;"
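An expected-failure check of this kind could be sketched as follows. The helper name expect_warmup_failure is hypothetical, and treating a server PID of 0 as "failed to start" is an assumption about how the surrounding utilities report startup failure; the error string is the one quoted above.

```shell
#!/bin/bash
# Hypothetical check: pass only when the server did not start (assumed to be
# signalled by a PID of 0) and the log carries the expected warmup error.
expect_warmup_failure() {
    local server_pid="$1" log="$2"
    local expected="failed to run warmup sample 'zero sample': An Error Occurred;"
    if [ "$server_pid" != "0" ]; then
        echo "server started unexpectedly (pid $server_pid)" >&2
        return 1
    fi
    # Literal match; the return status of grep is the result of the check.
    grep -qF -- "$expected" "$log"
}
```

Checking both conditions guards against two distinct regressions: the server starting despite the warmup error, and the server failing for an unrelated reason.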

Decoupled Model Warmup

The script validates that warmup works with decoupled models, which may send zero or more responses per request rather than exactly one. It checks that the warmup sample runs and that no warmup error is logged.

Output Memory Type Preservation

The script installs PyTorch and tests that warmup does not alter the memory type of output tensors. It uses a BLS model (bls_onnx_warmup) that wraps an ONNX model configured with a KIND_GPU instance group and warmup enabled, then runs test_infer_shm_leak.py via pytest to verify memory behavior.

(cd models/onnx_nobatch_float32_float32_float32 && \
    echo 'instance_group [{ kind : KIND_GPU }]' >> config.pbtxt && \
    echo 'model_warmup [{ name : "sample" batch_size: 1 ...' >> config.pbtxt)
export MODEL_NAME='bls_onnx_warmup'
python3 -m pytest --junitxml=warmup.report.xml $CLIENT_PY

Test Flow

  1. Iterate over each backend (onnx, libtorch, plan)
    1. Configure warmup for stateless and sequence models with fixed-size types
    2. Start server and verify warmup log messages
    3. For string-supporting backends, test variable-size type warmup
  2. Test warmup failure handling with a failing model
  3. Test decoupled model warmup
  4. Test output memory type preservation with BLS + ONNX + GPU warmup
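The ordering of the steps above can be sketched as a skeleton in which every step is a stub. All function names here are hypothetical, the stubs only record that they ran, and treating onnx as the only string-capable backend is an assumption made for the sketch.

```shell
#!/bin/bash
# Hypothetical skeleton of the flow; each step is a stub that records its
# name so the ordering can be inspected.
STEPS=""
record() { STEPS="${STEPS}${STEPS:+ }$1"; }

test_fixed_size()  { record "fixed:$1"; }
test_string()      { record "string:$1"; }
test_failure()     { record "failure"; }
test_decoupled()   { record "decoupled"; }
test_memory_type() { record "memory_type"; }

# Assumption: only onnx is treated as string-capable in this sketch.
supports_string() { [ "$1" = "onnx" ]; }

run_flow() {
    local backend
    # Per-backend steps run first; BACKENDS is split on whitespace.
    for backend in ${BACKENDS:-onnx libtorch plan}; do
        test_fixed_size "$backend"
        if supports_string "$backend"; then
            test_string "$backend"
        fi
    done
    # Backend-independent steps run once, after the loop.
    test_failure
    test_decoupled
    test_memory_type
}
```

Replacing each stub with real setup, server start, and log checks would give the overall shape described in the list, with the three backend-independent tests running once at the end.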

Dependencies

  • ../common/util.sh - Common test utility functions
  • ../common/infer_test.py - Inference validation test
  • test_infer_shm_leak.py - Shared memory leak test
  • ../python_models/bls_onnx_warmup/ - BLS warmup model
  • failing_infer - Model designed to fail during inference
  • decoupled - Decoupled model for warmup testing
  • PyTorch (installed during test for memory type verification)
