Implementation:Triton inference server Server GenQaSequenceModels

Knowledge Sources	Triton Inference Server
Domains	Testing, Model_Generation
Last Updated	2026-02-13 17:00 GMT

Overview

Generates sequence batching test models that accumulate values across inference steps within a sequence.

Description

The `gen_qa_sequence_models.py` script produces models configured with Triton's sequence batcher, which groups inference requests by sequence ID and routes them to the same model instance. The generated models accept a value input along with sequence control signals (start, end, ready, correlation ID) and accumulate the input values across the sequence, enabling tests to verify that Triton correctly maintains per-sequence state. Models are generated for TensorRT, ONNX, TensorFlow, TorchScript, and OpenVINO backends, each implementing the accumulation logic natively.
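The accumulation behavior described above can be illustrated with a small pure-Python reference model. This is a sketch only, not code from `gen_qa_sequence_models.py`: the class name `SequenceAccumulator` and its `step` method are hypothetical, and it assumes that the start signal resets the running total before the first value is added and that the end signal releases the state. The ready signal is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SequenceAccumulator:
    """Hypothetical reference model of per-sequence value accumulation."""

    # Running total for each live sequence, keyed by correlation ID.
    state: Dict[int, int] = field(default_factory=dict)

    def step(self, corr_id: int, value: int,
             start: bool = False, end: bool = False) -> int:
        if start:
            # Assumption: a start signal resets the accumulator.
            self.state[corr_id] = 0
        if corr_id not in self.state:
            raise KeyError(f"sequence {corr_id} was never started")
        self.state[corr_id] += value
        total = self.state[corr_id]
        if end:
            # Assumption: an end signal releases the per-sequence state.
            del self.state[corr_id]
        return total


# Two interleaved sequences keep independent running totals.
acc = SequenceAccumulator()
seq_a = [acc.step(1, 1, start=True)]
seq_b = [acc.step(2, 10, start=True)]
seq_a.append(acc.step(1, 2))
seq_b.append(acc.step(2, 20, end=True))
seq_a.append(acc.step(1, 3, end=True))
print(seq_a, seq_b)
```

A test that checks Triton's state handling can compare the outputs returned by the server against a reference of this kind, since any request routed to the wrong instance would produce a different running total.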

Usage

Execute this script to create sequence batcher test models before running QA tests that validate sequence batching behavior, including sequence routing, timeout handling, and state management.

Code Reference

Source Location

Repository: Triton Inference Server
File: qa/common/gen_qa_sequence_models.py
Lines: 1-1238

Signature

def create_onnx_modelfile(models_dir, model_version, max_batch, dtype, shape): ...
def create_tf_modelfile(models_dir, model_version, max_batch, dtype, shape): ...
def create_plan_modelfile(models_dir, model_version, max_batch, dtype, shape): ...
def create_openvino_modelfile(models_dir, model_version, max_batch, dtype, shape): ...
def create_libtorch_modelfile(models_dir, model_version, max_batch, dtype, shape): ...
def create_modelconfig(models_dir, model_name, max_batch, dtype, shape): ...
def create_models(models_dir, dtype, shape, no_batch=True): ...

Import

# Typically run as a standalone script
python qa/common/gen_qa_sequence_models.py --models_dir /tmp/models

I/O Contract

Inputs

Name	Type	Required	Description
models_dir	string	Yes	Output directory for generated model repository
dtype	string	No	Data type for model tensors (e.g., int32, fp32)
shape	list[int]	No	Tensor shape for model inputs and outputs
no_batch	bool	No	Whether to also generate no-batch model variants

Outputs

Name	Type	Description
model_repository	directory	Model directories with sequence batcher configurations
config.pbtxt	file	Model configuration with sequence_batching control inputs and states
model files	file	Backend-specific model files implementing value accumulation across sequences
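
To make the `sequence_batching` part of `config.pbtxt` more concrete, the sketch below renders an illustrative block from Python. It is not the output of `create_modelconfig`: the helper name `render_sequence_batching`, the tensor names (`START`, `END`, `READY`, `CORRID`), the idle timeout, and the choice of `int32_false_true` are all assumptions made for illustration. The control kinds and field names follow Triton's model configuration schema.

```python
def render_sequence_batching(idle_us: int = 5_000_000) -> str:
    """Render an illustrative sequence_batching block for config.pbtxt."""
    # (tensor name, control kind) pairs; the tensor names are placeholders.
    flags = [
        ("START", "CONTROL_SEQUENCE_START"),
        ("END", "CONTROL_SEQUENCE_END"),
        ("READY", "CONTROL_SEQUENCE_READY"),
    ]
    entries = []
    for name, kind in flags:
        # Each flag tensor carries 0 for false and 1 for true (assumed int32).
        entries.append(
            "    {\n"
            f'      name: "{name}"\n'
            f"      control [ {{ kind: {kind} int32_false_true: [ 0, 1 ] }} ]\n"
            "    }"
        )
    # The correlation ID is passed through as a value, not a true/false flag.
    entries.append(
        "    {\n"
        '      name: "CORRID"\n'
        "      control [ { kind: CONTROL_SEQUENCE_CORRID data_type: TYPE_UINT64 } ]\n"
        "    }"
    )
    body = ",\n".join(entries)
    return (
        "sequence_batching {\n"
        f"  max_sequence_idle_microseconds: {idle_us}\n"
        "  control_input [\n"
        f"{body}\n"
        "  ]\n"
        "}\n"
    )


print(render_sequence_batching())
```

The generated files may use different tensor names, data types, and timeouts, so inspect an actual `config.pbtxt` in the output directory before relying on any of these values.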

Usage Examples

Generate Sequence Models

python qa/common/gen_qa_sequence_models.py \
    --models_dir /tmp/sequence_models

Use with Triton Server

MODELS_DIR="/opt/triton_qa/sequence_models"
python qa/common/gen_qa_sequence_models.py --models_dir "$MODELS_DIR"
tritonserver --model-repository="$MODELS_DIR" &
python -m pytest test_sequence_batcher.py
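
Because the server is started in the background, tests launched immediately afterwards may run before the models have finished loading. One way to guard against this is to poll the server's readiness endpoint first. The following is a minimal sketch, assuming the HTTP endpoint is reachable at `http://localhost:8000` (Triton's default HTTP port); the helper name `wait_until_ready` and its parameters are hypothetical.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(base_url: str = "http://localhost:8000",
                     timeout_s: float = 60.0,
                     poll_s: float = 0.5) -> bool:
    """Poll GET /v2/health/ready until it returns 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            url = f"{base_url}/v2/health/ready"
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Server not accepting connections yet, or not ready; retry.
            pass
        time.sleep(poll_s)
    return False
```

A test session could call `wait_until_ready()` once during setup and abort early when it returns `False`, rather than reporting connection errors from every test.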

Related Pages

Environment:Triton_inference_server_Server_GPU_CUDA_Runtime
