Implementation:Triton inference server Server GenQaRaggedModels
| Knowledge Sources | |
|---|---|
| Domains | Testing, Model_Generation |
| Last Updated | 2026-02-13 17:00 GMT |
Overview
Generates test models for ragged (variable-length) batch input handling in Triton Inference Server.
Description
The `gen_qa_ragged_models.py` script creates models that accept ragged batches, in which individual requests within a batch can have different input tensor sizes. It generates model configurations with `allow_ragged_batch` set on the inputs and produces model files that consume the batched input tensors alongside batch element size information. The script supports the TensorRT and ONNX backends. The generated models aggregate variable-length inputs so that QA tests can verify that Triton's ragged batching feature concatenates and processes unevenly sized batch elements correctly.
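To make "batch element size information" concrete, here is a minimal illustrative sketch; it is not code from `gen_qa_ragged_models.py`. It assumes one-dimensional inputs, and the helper name `concat_ragged` is hypothetical. The two size lists correspond conceptually to Triton's `BATCH_ELEMENT_COUNT` and `BATCH_ACCUMULATED_ELEMENT_COUNT` batch input kinds.

```python
from itertools import accumulate


def concat_ragged(requests):
    """Concatenate variable-length 1-D inputs and record per-request sizes.

    `requests` is a list of lists, one per request in the batch.
    (Hypothetical helper for illustration only.)
    """
    # Ragged batching concatenates the request inputs back to back.
    flat = [value for request in requests for value in request]
    # Number of elements contributed by each request.
    element_counts = [len(request) for request in requests]
    # Running total, which marks where each request ends in `flat`.
    accumulated_counts = list(accumulate(element_counts))
    return flat, element_counts, accumulated_counts


flat, counts, accumulated = concat_ragged([[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]])
```

With the accumulated counts, a model can recover each request's slice of the concatenated tensor without any padding being added.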
Usage
Run this script when preparing to test Triton's ragged batching feature. It is invoked before QA tests that send requests with variable-length inputs to verify correct handling of non-uniform batch elements.
Code Reference
Source Location
- Repository: Triton Inference Server
- File: qa/common/gen_qa_ragged_models.py
- Lines: 1-683
Signature
def create_onnx_modelfile(models_dir, model_version, max_batch, dtype, shape): ...
def create_plan_modelfile(models_dir, model_version, max_batch, dtype, shape): ...
def create_modelconfig(models_dir, model_name, max_batch, dtype, shape): ...
def create_models(models_dir, dtype, shape): ...
Import
# Typically run as a standalone script
python qa/common/gen_qa_ragged_models.py --models_dir /tmp/models
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| models_dir | string | Yes | Output directory for generated model repository |
| dtype | string | No | Data type for model tensors (e.g., TYPE_FP32) |
| shape | list[int] | No | Base tensor shape (actual shapes vary per batch element) |
Outputs
| Name | Type | Description |
|---|---|---|
| model_repository | directory | Model directories with ragged-batch-enabled configurations |
| config.pbtxt | file | Model configuration with allow_ragged_batch enabled on inputs |
| model files | file | Backend-specific model files that handle variable-length inputs |
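As a rough sketch of what a ragged-batch `config.pbtxt` might contain, the following renders a configuration fragment as text. The function name `render_ragged_config`, the tensor names, and the chosen batch input kind are assumptions made for illustration; the real script may emit different names, kinds, and additional fields such as the platform.

```python
def render_ragged_config(model_name, max_batch, data_type, dims):
    """Return an illustrative config.pbtxt fragment for a ragged-batch model.

    Hypothetical helper: tensor names and the batch_input kind are assumed.
    """
    dims_text = ", ".join(str(d) for d in dims)
    return f"""name: "{model_name}"
max_batch_size: {max_batch}
input [
  {{
    name: "RAGGED_INPUT"
    data_type: {data_type}
    dims: [ {dims_text} ]
    allow_ragged_batch: true
  }}
]
batch_input [
  {{
    kind: BATCH_ACCUMULATED_ELEMENT_COUNT
    target_name: "BATCH_INPUT"
    data_type: TYPE_FP32
    source_input: "RAGGED_INPUT"
  }}
]
output [
  {{
    name: "RAGGED_OUTPUT"
    data_type: {data_type}
    dims: [ {dims_text} ]
  }}
]
dynamic_batching {{ }}
"""


config_text = render_ragged_config("ragged_example", 16, "TYPE_FP32", [-1])
```

The `batch_input` entry asks Triton to supply the size information as an extra input tensor, and `dynamic_batching` is included because ragged inputs are only meaningful when the server forms batches from separate requests.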
Usage Examples
Generate Ragged Batch Models
python qa/common/gen_qa_ragged_models.py \
--models_dir /tmp/ragged_models
CI Test Setup
MODELS_DIR="${DATADIR}/qa_ragged_model_repository"
python qa/common/gen_qa_ragged_models.py --models_dir "$MODELS_DIR"
SERVER_ARGS="--model-repository=$MODELS_DIR"
run_server
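Before starting the server, it can help to confirm that the generated directory follows the standard Triton repository layout (`<model>/config.pbtxt` plus numeric version subdirectories). The following is a minimal sketch under that assumption; `summarize_repository` is a hypothetical helper and not part of the QA scripts.

```python
from pathlib import Path


def summarize_repository(models_dir):
    """Map each model directory name to its sorted numeric version directories.

    Entries without a config.pbtxt are skipped. (Hypothetical helper.)
    """
    summary = {}
    for model_dir in sorted(Path(models_dir).iterdir()):
        if not (model_dir / "config.pbtxt").is_file():
            continue  # not a model directory
        versions = sorted(
            int(entry.name)
            for entry in model_dir.iterdir()
            if entry.is_dir() and entry.name.isdigit()
        )
        summary[model_dir.name] = versions
    return summary
```

For example, `summarize_repository("/tmp/ragged_models")` returns a dictionary keyed by model name; an empty version list for any model suggests the model file was not written.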