Implementation:Triton inference server Server GenQaRaggedModels

Knowledge Sources	Triton Inference Server
Domains	Testing, Model_Generation
Last Updated	2026-02-13 17:00 GMT

Overview

Generates test models for ragged (variable-length) batch input handling in Triton Inference Server.

Description

The `gen_qa_ragged_models.py` script creates models that accept ragged batches, in which individual requests within a batch can have input tensors of different sizes. The generated model configurations set the `allow_ragged_batch` flag on the ragged inputs, and the generated model files consume the batched input tensor alongside batch element size information supplied by Triton. The script supports the TensorRT and ONNX backends. It creates models that aggregate variable-length inputs so that QA tests can validate that Triton's ragged batching feature concatenates unevenly sized batch elements without padding and processes them correctly.
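To make "batch element size information" concrete, the sketch below mimics in plain Python what a ragged-enabled model receives. Requests whose inputs have different lengths are concatenated back to back with no padding, and auxiliary batch inputs record how many elements each request contributed. The kind names follow Triton's documented `batch_input` kinds (`BATCH_ELEMENT_COUNT`, `BATCH_ACCUMULATED_ELEMENT_COUNT`, `BATCH_ACCUMULATED_ELEMENT_COUNT_WITH_ZERO`). The function `form_ragged_batch` is hypothetical and is not taken from the script or from Triton's source.

```python
from itertools import accumulate
from typing import Dict, List, Tuple


def form_ragged_batch(
    requests: List[List[float]],
) -> Tuple[List[float], Dict[str, List[int]]]:
    """Hypothetical illustration of ragged batch formation.

    Each request contributes a 1-D input of its own length. The inputs are
    concatenated without padding, and batch inputs describing the element
    counts are derived so a model can locate each request's slice.
    """
    # Concatenate all request inputs into one flat buffer (no padding).
    flat: List[float] = [value for request in requests for value in request]

    # Number of elements contributed by each request.
    counts = [len(request) for request in requests]

    # Running totals: the end offset of each request inside the flat buffer.
    accumulated = list(accumulate(counts))

    batch_inputs = {
        "BATCH_ELEMENT_COUNT": counts,
        "BATCH_ACCUMULATED_ELEMENT_COUNT": accumulated,
        # Same running totals with a leading zero, so offsets[i]:offsets[i + 1]
        # brackets request i.
        "BATCH_ACCUMULATED_ELEMENT_COUNT_WITH_ZERO": [0] + accumulated,
    }
    return flat, batch_inputs


if __name__ == "__main__":
    flat, batch_inputs = form_ragged_batch([[1.0, 2.0], [3.0, 4.0, 5.0], [6.0]])
    print(flat)  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    print(batch_inputs["BATCH_ACCUMULATED_ELEMENT_COUNT_WITH_ZERO"])  # [0, 2, 5, 6]
```

With the zero-prefixed offsets, request `i` occupies `flat[offsets[i]:offsets[i + 1]]`, which is the kind of indexing a ragged-aware model performs. Plain Python integers are used here for readability; in Triton the batch input tensor's type comes from the `data_type` set in the model configuration.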

Usage

Run this script when preparing to test Triton's ragged batching feature. It is invoked before QA tests that send requests with variable-length inputs to verify correct handling of non-uniform batch elements.

Code Reference

Source Location

Repository: Triton Inference Server
File: qa/common/gen_qa_ragged_models.py
Lines: 1-683

Signature

def create_onnx_modelfile(models_dir, model_version, max_batch, dtype, shape): ...
def create_plan_modelfile(models_dir, model_version, max_batch, dtype, shape): ...
def create_modelconfig(models_dir, model_name, max_batch, dtype, shape): ...
def create_models(models_dir, dtype, shape): ...
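`create_models` presumably ties the per-backend helpers together by writing into Triton's standard model repository layout. In that layout each model has its own directory holding `config.pbtxt`, plus numbered version subdirectories containing the model file (`model.onnx` for ONNX Runtime, `model.plan` for TensorRT). The sketch below shows only that directory layout. `write_model_layout` is a hypothetical helper, and the placeholder bytes stand in for a real serialized model.

```python
import os
import tempfile

# Default model file names Triton looks for, keyed by backend (illustrative subset).
DEFAULT_MODEL_FILENAMES = {
    "onnxruntime": "model.onnx",
    "tensorrt": "model.plan",
}


def write_model_layout(models_dir: str, model_name: str, model_version: int,
                       backend: str, model_bytes: bytes, config_text: str) -> str:
    """Hypothetical helper: lay out one model in a Triton model repository.

    Produces:
        <models_dir>/<model_name>/config.pbtxt
        <models_dir>/<model_name>/<model_version>/<default model filename>
    Returns the path of the written model file.
    """
    model_dir = os.path.join(models_dir, model_name)
    version_dir = os.path.join(model_dir, str(model_version))
    os.makedirs(version_dir, exist_ok=True)

    # The model configuration sits beside (not inside) the version directories.
    with open(os.path.join(model_dir, "config.pbtxt"), "w") as f:
        f.write(config_text)

    # The serialized model goes inside the numbered version directory.
    model_path = os.path.join(version_dir, DEFAULT_MODEL_FILENAMES[backend])
    with open(model_path, "wb") as f:
        f.write(model_bytes)
    return model_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as repo:
        # "onnx_ragged_example" is a made-up model name for illustration.
        path = write_model_layout(repo, "onnx_ragged_example", 1, "onnxruntime",
                                  b"placeholder", 'name: "onnx_ragged_example"\n')
        print(os.path.relpath(path, repo))
```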

Import

# Typically run as a standalone script
python qa/common/gen_qa_ragged_models.py --models_dir /tmp/models

I/O Contract

Inputs

Name	Type	Required	Description
models_dir	string	Yes	Output directory for generated model repository
dtype	string	No	Data type for model tensors (e.g., TYPE_FP32)
shape	list[int]	No	Base tensor shape (actual shapes vary per batch element)
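The `dtype` input is described using Triton's configuration type names such as `TYPE_FP32`. If a generator is driven from NumPy-style dtype names instead, a small lookup like the one below can translate them. `to_triton_type` and its table are hypothetical and cover only common numeric types.

```python
# Hypothetical lookup from NumPy-style dtype names to Triton model-config data types.
_TRITON_TYPES = {
    "bool": "TYPE_BOOL",
    "uint8": "TYPE_UINT8",
    "int8": "TYPE_INT8",
    "int16": "TYPE_INT16",
    "int32": "TYPE_INT32",
    "int64": "TYPE_INT64",
    "float16": "TYPE_FP16",
    "float32": "TYPE_FP32",
    "float64": "TYPE_FP64",
}


def to_triton_type(dtype_name: str) -> str:
    """Return the config.pbtxt data_type string for a NumPy-style dtype name."""
    # Pass Triton names through unchanged so callers may use either spelling.
    if dtype_name.startswith("TYPE_"):
        return dtype_name
    try:
        return _TRITON_TYPES[dtype_name]
    except KeyError:
        raise ValueError(f"unsupported dtype for this sketch: {dtype_name!r}") from None
```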

Outputs

Name	Type	Description
model_repository	directory	Model directories with ragged-batch-enabled configurations
config.pbtxt	file	Model configuration with allow_ragged_batch enabled on inputs
model files	file	Backend-specific model files that handle variable-length inputs
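A ragged-batch `config.pbtxt` combines two parts of Triton's model configuration. The first is `allow_ragged_batch: true` on the ragged input. The second is a `batch_input` entry that asks Triton to supply element-size information to the model. The sketch below renders such a configuration as text. The function `ragged_config_text`, the tensor names `INPUT`, `INDEX`, and `OUTPUT`, and the choice of `BATCH_ACCUMULATED_ELEMENT_COUNT` are illustrative assumptions modeled on Triton's ragged-batching documentation. They are not the exact configurations the script writes.

```python
from typing import List


def ragged_config_text(model_name: str, platform: str, max_batch: int,
                       data_type: str, output_dims: List[int]) -> str:
    """Hypothetical renderer for a ragged-batch config.pbtxt.

    The ragged input has a variable-length dimension (-1) and sets
    allow_ragged_batch so Triton skips the same-shape check when batching.
    The batch_input asks Triton to supply accumulated element counts so the
    model can locate each request inside the concatenated input.
    """
    dims = ", ".join(str(d) for d in output_dims)
    return f'''name: "{model_name}"
platform: "{platform}"
max_batch_size: {max_batch}
input [
  {{
    name: "INPUT"
    data_type: {data_type}
    dims: [ -1 ]
    allow_ragged_batch: true
  }}
]
batch_input [
  {{
    kind: BATCH_ACCUMULATED_ELEMENT_COUNT
    target_name: "INDEX"
    data_type: {data_type}
    source_input: "INPUT"
  }}
]
output [
  {{
    name: "OUTPUT"
    data_type: {data_type}
    dims: [ {dims} ]
  }}
]
'''
```

Passing `platform="onnxruntime_onnx"` targets an ONNX model. `platform="tensorrt_plan"` gives the TensorRT variant. Real ragged tests may exercise other batch input kinds, such as `BATCH_ELEMENT_COUNT`.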

Usage Examples

Generate Ragged Batch Models

python qa/common/gen_qa_ragged_models.py \
    --models_dir /tmp/ragged_models

CI Test Setup

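MODELS_DIR="${DATADIR}/qa_ragged_model_repository"
# Generate the ragged-batch models, then point the server at that repository.
python qa/common/gen_qa_ragged_models.py --models_dir "$MODELS_DIR"
SERVER_ARGS="--model-repository=$MODELS_DIR"
# run_server is a helper from the QA test utilities (qa/common/util.sh).
run_server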
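Before starting the server in CI, it can help to fail fast if generation produced something unusable. This assumes every generated model should be ragged-enabled, as the Outputs table describes. The sketch below walks a model repository and lists models whose `config.pbtxt` is missing or does not enable `allow_ragged_batch`. `find_non_ragged_models` is a hypothetical helper. It uses a plain substring check rather than a protobuf text-format parser.

```python
import os
import tempfile
from typing import List


def find_non_ragged_models(models_dir: str) -> List[str]:
    """Hypothetical sanity check on a generated model repository.

    Returns the names of model directories whose config.pbtxt is missing or
    does not contain "allow_ragged_batch: true". This is a substring check,
    not a protobuf text-format parse, so a commented-out setting still matches.
    """
    problems = []
    for name in sorted(os.listdir(models_dir)):
        model_dir = os.path.join(models_dir, name)
        if not os.path.isdir(model_dir):
            continue  # skip stray files at the repository root
        config_path = os.path.join(model_dir, "config.pbtxt")
        if not os.path.isfile(config_path):
            problems.append(name)
            continue
        with open(config_path) as f:
            if "allow_ragged_batch: true" not in f.read():
                problems.append(name)
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as repo:
        os.makedirs(os.path.join(repo, "good"))
        with open(os.path.join(repo, "good", "config.pbtxt"), "w") as f:
            f.write("input [ { allow_ragged_batch: true } ]\n")
        os.makedirs(os.path.join(repo, "bad"))  # no config.pbtxt at all
        print(find_non_ragged_models(repo))  # ['bad']
```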

Related Pages

Environment:Triton_inference_server_Server_GPU_CUDA_Runtime
