Implementation:Triton inference server Server GenQaRaggedModels

From Leeroopedia
Revision as of 13:58, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Triton_inference_server_Server_GenQaRaggedModels.md)
Knowledge Sources
Domains Testing, Model_Generation
Last Updated 2026-02-13 17:00 GMT

Overview

Generates test models for ragged (variable-length) batch input handling in Triton Inference Server.

Description

The `gen_qa_ragged_models.py` script creates models that accept ragged batches, in which individual requests within a batch can have different input tensor sizes. It generates model configurations with the `allow_ragged_batch` flag set on inputs and produces model files that consume the batched input tensor alongside batch element size information. The script supports the TensorRT and ONNX backends. The generated models aggregate variable-length inputs so that QA tests can validate that Triton's ragged batching feature correctly concatenates and processes unevenly sized batch elements without requiring them to be padded to a common shape.
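To make the idea of "batch input tensors alongside batch element size information" concrete, here is a minimal sketch in plain Python. It is not taken from the script or from Triton's internals; the helper names `concat_ragged` and `split_ragged` are hypothetical and exist only for illustration. It shows how variable-length requests could be concatenated into one flat tensor while per-request sizes are tracked so the elements can be recovered afterwards.

```python
from itertools import accumulate


def concat_ragged(requests):
    """Concatenate variable-length 1-D requests into one flat list.

    Hypothetical helper for illustration only. Returns the flat batch,
    the per-request element counts, and the running (accumulated) counts.
    """
    element_counts = [len(r) for r in requests]
    accumulated_counts = list(accumulate(element_counts))
    batched = [value for r in requests for value in r]
    return batched, element_counts, accumulated_counts


def split_ragged(batched, accumulated_counts):
    """Recover the original requests from the flat batch and running counts."""
    starts = [0] + accumulated_counts[:-1]
    return [batched[s:e] for s, e in zip(starts, accumulated_counts)]


# Three requests with different lengths, batched without padding.
requests = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]
batched, counts, accumulated = concat_ragged(requests)

print(batched)      # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(counts)       # [3, 1, 2]
print(accumulated)  # [3, 4, 6]
```

The accumulated counts are what let a model (or a test) locate each request's slice inside the concatenated tensor.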

Usage

Run this script when preparing to test Triton's ragged batching feature. It is invoked before QA tests that send requests with variable-length inputs to verify correct handling of non-uniform batch elements.

Code Reference

Source Location

Signature

def create_onnx_modelfile(models_dir, model_version, max_batch, dtype, shape): ...
def create_plan_modelfile(models_dir, model_version, max_batch, dtype, shape): ...
def create_modelconfig(models_dir, model_name, max_batch, dtype, shape): ...
def create_models(models_dir, dtype, shape): ...

Import

# Typically run as a standalone script
python qa/common/gen_qa_ragged_models.py --models_dir /tmp/models

I/O Contract

Inputs

Name        Type       Required  Description
models_dir  string     Yes       Output directory for generated model repository
dtype       string     No        Data type for model tensors (e.g., TYPE_FP32)
shape       list[int]  No        Base tensor shape (actual shapes vary per batch element)

Outputs

Name              Type       Description
model_repository  directory  Model directories with ragged-batch-enabled configurations
config.pbtxt      file       Model configuration with allow_ragged_batch enabled on inputs
model files       file       Backend-specific model files that handle variable-length inputs
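As a rough picture of what a ragged-batch-enabled `config.pbtxt` might contain, the sketch below renders one from Python. The function `render_ragged_config` and the tensor names `RAGGED_INPUT`, `BATCH_INPUT`, and `RAGGED_OUTPUT` are assumptions made for this example, and the configuration the script actually writes may differ (for instance in backend settings and in which `batch_input` kinds it uses). The field names `allow_ragged_batch` and `batch_input` and the kind `BATCH_ACCUMULATED_ELEMENT_COUNT` are part of Triton's model configuration schema.

```python
def render_ragged_config(model_name, max_batch, dtype="TYPE_FP32"):
    """Render an illustrative ragged-batch model configuration.

    Hypothetical helper: tensor names are made up for this sketch and
    backend/platform settings are intentionally omitted.
    """
    return f'''name: "{model_name}"
max_batch_size: {max_batch}
input [
  {{
    name: "RAGGED_INPUT"
    data_type: {dtype}
    dims: [ -1 ]
    allow_ragged_batch: true
  }}
]
batch_input [
  {{
    kind: BATCH_ACCUMULATED_ELEMENT_COUNT
    target_name: "BATCH_INPUT"
    data_type: {dtype}
    source_input: "RAGGED_INPUT"
  }}
]
output [
  {{
    name: "RAGGED_OUTPUT"
    data_type: {dtype}
    dims: [ -1 ]
  }}
]
'''


config_text = render_ragged_config("ragged_demo", 16)
print(config_text)
```

The `batch_input` entry is how the model receives the element size information: Triton fills the target tensor with the accumulated element counts of the source input for each request in the batch.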

Usage Examples

Generate Ragged Batch Models

python qa/common/gen_qa_ragged_models.py \
    --models_dir /tmp/ragged_models
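After generation, a quick sanity check of the output directory can catch an empty or misconfigured repository before the server starts. The following is a sketch under assumptions: `find_ragged_models` is a hypothetical helper, not part of the script, and it relies on a plain substring match, which would miss configurations that format the flag differently. The demo builds a throwaway repository with made-up model names rather than running the real generator.

```python
import tempfile
from pathlib import Path


def find_ragged_models(models_dir):
    """Return names of model directories whose config.pbtxt enables ragged input.

    Hypothetical helper: uses a simple text search, not a protobuf parser.
    """
    ragged = []
    for config_path in sorted(Path(models_dir).glob("*/config.pbtxt")):
        if "allow_ragged_batch: true" in config_path.read_text():
            ragged.append(config_path.parent.name)
    return ragged


# Demo against a temporary directory with two made-up models.
with tempfile.TemporaryDirectory() as tmp:
    fake_configs = {
        "ragged_demo": 'input [ { name: "IN" allow_ragged_batch: true } ]',
        "plain_demo": 'input [ { name: "IN" } ]',
    }
    for name, body in fake_configs.items():
        model_dir = Path(tmp) / name
        (model_dir / "1").mkdir(parents=True)  # numeric version subdirectory
        (model_dir / "config.pbtxt").write_text(body)
    found = find_ragged_models(tmp)

print(found)  # ['ragged_demo']
```

Against a real run, the same function could be pointed at the directory passed to `--models_dir`.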

CI Test Setup

MODELS_DIR="${DATADIR}/qa_ragged_model_repository"
python qa/common/gen_qa_ragged_models.py --models_dir "$MODELS_DIR"
SERVER_ARGS="--model-repository=$MODELS_DIR"
run_server
