Implementation:Triton inference server Server GenQaSequenceModels

From Leeroopedia
Revision as of 13:58, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Triton_inference_server_Server_GenQaSequenceModels.md)
Knowledge Sources
Domains: Testing, Model_Generation
Last Updated: 2026-02-13 17:00 GMT

Overview

Generates sequence batching test models that accumulate values across inference steps within a sequence.

Description

The `gen_qa_sequence_models.py` script produces models configured with Triton's sequence batcher, which groups inference requests by sequence ID and routes them to the same model instance. The generated models accept a value input along with sequence control signals (start, end, ready, correlation ID) and accumulate the input values across the sequence, enabling tests to verify that Triton correctly maintains per-sequence state. Models are generated for TensorRT, ONNX, TensorFlow, TorchScript, and OpenVINO backends, each implementing the accumulation logic natively.
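
The accumulation behavior described above can be illustrated with a small pure-Python sketch. This is not code from `gen_qa_sequence_models.py`; the class `SequenceAccumulator` and its `step` method are hypothetical names, and the exact reset and READY semantics are assumptions made only to show what a test would expect from per-sequence state.

```python
from dataclasses import dataclass, field


@dataclass
class SequenceAccumulator:
    """Toy stand-in for one model instance holding per-sequence state.

    Hypothetical illustration; not the generated model's implementation.
    """

    totals: dict = field(default_factory=dict)

    def step(self, corr_id, value, start=False, end=False, ready=True):
        if not ready:
            # Assumed meaning of READY=false: the slot holds no real
            # request, so state is left untouched and nothing is returned.
            return None
        if start:
            # START resets the running total for this correlation ID.
            self.totals[corr_id] = 0
        if corr_id not in self.totals:
            raise KeyError(f"sequence {corr_id} was never started")
        self.totals[corr_id] += value
        result = self.totals[corr_id]
        if end:
            # END releases the state held for this sequence.
            del self.totals[corr_id]
        return result


acc = SequenceAccumulator()
outputs = [
    acc.step(corr_id=1, value=1, start=True),
    acc.step(corr_id=2, value=10, start=True),
    acc.step(corr_id=1, value=2),
    acc.step(corr_id=2, value=20, end=True),
    acc.step(corr_id=1, value=3, end=True),
]
print(outputs)  # [1, 10, 3, 30, 6]
```

The two interleaved sequences keep separate totals, which is the property a sequence batching test checks: if requests for one correlation ID were routed to different instances, the running sums would not match.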

Usage

Execute this script to create sequence batcher test models before running QA tests that validate sequence batching behavior, including sequence routing, timeout handling, and state management.

Code Reference

Source Location

Signature

def create_onnx_modelfile(models_dir, model_version, max_batch, dtype, shape): ...
def create_tf_modelfile(models_dir, model_version, max_batch, dtype, shape): ...
def create_plan_modelfile(models_dir, model_version, max_batch, dtype, shape): ...
def create_openvino_modelfile(models_dir, model_version, max_batch, dtype, shape): ...
def create_libtorch_modelfile(models_dir, model_version, max_batch, dtype, shape): ...
def create_modelconfig(models_dir, model_name, max_batch, dtype, shape): ...
def create_models(models_dir, dtype, shape, no_batch=True): ...

Import

# Typically run as a standalone script
python qa/common/gen_qa_sequence_models.py --models_dir /tmp/models

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| models_dir | string | Yes | Output directory for generated model repository |
| dtype | string | No | Data type for model tensors (e.g., int32, fp32) |
| shape | list[int] | No | Tensor shape for model inputs and outputs |
| no_batch | bool | No | Whether to also generate no-batch model variants |

Outputs

| Name | Type | Description |
|------|------|-------------|
| model_repository | directory | Model directories with sequence batcher configurations |
| config.pbtxt | file | Model configuration with sequence_batching control inputs and states |
| model files | file | Backend-specific model files implementing value accumulation across sequences |
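
To give a sense of what the `sequence_batching` portion of a generated `config.pbtxt` might contain, the sketch below renders such a stanza as protobuf text. The function `render_sequence_batching` is a hypothetical helper, and the tensor names (`START`, `END`, `READY`, `CORRID`) and the idle timeout value are assumptions; only the field and enum names come from Triton's model configuration schema. The script's actual output may differ.

```python
# Control tensor names below are assumed for illustration.
CONTROLS = [
    ("START", "CONTROL_SEQUENCE_START", "int32_false_true: [ 0, 1 ]"),
    ("END", "CONTROL_SEQUENCE_END", "int32_false_true: [ 0, 1 ]"),
    ("READY", "CONTROL_SEQUENCE_READY", "int32_false_true: [ 0, 1 ]"),
    ("CORRID", "CONTROL_SEQUENCE_CORRID", "data_type: TYPE_UINT64"),
]


def render_sequence_batching(idle_us=5_000_000):
    """Return a sequence_batching stanza as protobuf text (hypothetical helper)."""
    entries = []
    for name, kind, setting in CONTROLS:
        entries.append(
            "    {\n"
            f'      name: "{name}"\n'
            f"      control [ {{ kind: {kind} {setting} }} ]\n"
            "    }"
        )
    body = ",\n".join(entries)
    return (
        "sequence_batching {\n"
        f"  max_sequence_idle_microseconds: {idle_us}\n"
        "  control_input [\n"
        f"{body}\n"
        "  ]\n"
        "}\n"
    )


config_text = render_sequence_batching()
print(config_text)
```

Each control input maps one tensor name to a control kind, so the model receives the start, end, ready, and correlation ID signals as ordinary input tensors alongside the value input.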

Usage Examples

Generate Sequence Models

python qa/common/gen_qa_sequence_models.py \
    --models_dir /tmp/sequence_models

Use with Triton Server

MODELS_DIR="/opt/triton_qa/sequence_models"
python qa/common/gen_qa_sequence_models.py --models_dir "$MODELS_DIR"
# Start the server in the background; wait until it reports ready before running tests
tritonserver --model-repository="$MODELS_DIR" &
python -m pytest test_sequence_batcher.py
