Implementation:Kserve Kserve Triton MMS Perf Test

From Leeroopedia
Knowledge Sources
Domains: Kubernetes, Model Serving, Performance Testing, Multi-Model Serving
Last Updated: 2026-02-13 00:00 GMT

Overview

This file defines a large-scale performance test configuration using the Vegeta load testing tool to benchmark Triton multi-model serving throughput and latency.

Description

The file contains two Kubernetes resources: a Job that runs the Vegeta load tester and a ConfigMap that holds the target definition and the request payload. The Job runs a 3-minute Vegeta attack at 50 requests per second against the Triton multi-model serving endpoint, targeting the cifar10 model through the v2 inference protocol. The ConfigMap embeds a large JSON payload holding one CIFAR-10 image (3x32x32 FP32 pixel values), formatted according to the KFServing v2 inference protocol: an inputs list whose entry carries name, shape, datatype, and data fields. The file totals 3,323 lines, most of which are the embedded pixel data.

Usage

Use this configuration to benchmark multi-model serving performance on a KServe cluster with Triton Inference Server. Deploy a Triton multi-model serving setup first, then run the configuration as a Kubernetes Job. The Job's log output contains the throughput and latency metrics that are presented in the MMS benchmark documentation.

Code Reference

Source Location

Signature

apiVersion: batch/v1
kind: Job
metadata:
  generateName: torchscript-load-test
spec:
  backoffLimit: 6
  parallelism: 1
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: "false"
    spec:
      restartPolicy: OnFailure
      containers:
      - args:
        - vegeta -cpus=1 attack -duration=3m -rate=50/1s -targets=/var/vegeta/cfg
          | vegeta report -type=text
        command:
        - sh
        - -c
        image: peterevans/vegeta:latest
        name: vegeta
        volumeMounts:
        - mountPath: /var/vegeta
          name: vegeta-cfg
      volumes:
      - configMap:
          name: torchscript-vegeta-cfg
        name: vegeta-cfg
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: torchscript-vegeta-cfg  # omitted in source; derived from volume ref
data:
  cfg: |
    POST http://triton-mms.default.svc.cluster.local/v2/models/cifar10/infer
    @/var/vegeta/payload
  payload: |
    {
      "inputs": [{
        "name": "INPUT__0",
        "shape": [1, 3, 32, 32],
        "datatype": "FP32",
        "data": [[ ... ]]   # 3x32x32 float pixel values
      }]
    }

Import

# The Job uses generateName, which kubectl apply rejects; use create instead
kubectl create -f docs/samples/multimodelserving/triton/perf.yaml

I/O Contract

Job Configuration

| Parameter | Value | Description |
|---|---|---|
| Duration | 3 minutes | Total attack duration |
| Rate | 50 req/s | Request rate |
| CPUs | 1 | CPU cores for Vegeta |
| Report Type | text | Output format for results |
| Istio Sidecar | disabled | Avoids proxy interference with load testing |
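
As a quick sanity check on the parameters above, the number of requests a constant-rate attack should send is simply duration times rate. The sketch below is illustrative only; the function name `expected_requests` is hypothetical and not part of the source file, and the real count reported by Vegeta may differ slightly if the load generator cannot keep pace.

```python
def expected_requests(duration_s: int, rate_per_s: int) -> int:
    """Requests a constant-rate attack would send if the generator keeps pace."""
    return duration_s * rate_per_s


# Values taken from the Job args: -duration=3m -rate=50/1s
total = expected_requests(duration_s=3 * 60, rate_per_s=50)
print(total)  # 9000
```

Comparing this figure with the Requests total in the Vegeta report is one way to spot a run in which the attacker itself was the bottleneck.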

Target Endpoint

| Property | Value |
|---|---|
| Method | POST |
| URL | http://triton-mms.default.svc.cluster.local/v2/models/cifar10/infer |
| Protocol | KFServing v2 Inference Protocol |
| Model | cifar10 |
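
The target definition stored under the ConfigMap's cfg key is a two-line Vegeta target: the method and URL, followed by an @-prefixed path to the request body. A minimal sketch of how such a target could be rendered for a different service or model follows; the helper `vegeta_target` and its parameters are hypothetical, and the URL layout is assumed to match the one shown in the table above.

```python
def vegeta_target(service: str, namespace: str, model: str, body_path: str) -> str:
    """Render a Vegeta HTTP-format target for a v2 inference endpoint."""
    # Assumed URL layout: in-cluster service DNS name plus the v2 infer path
    url = f"http://{service}.{namespace}.svc.cluster.local/v2/models/{model}/infer"
    # The @ prefix tells Vegeta to read the request body from a file
    return f"POST {url}\n@{body_path}\n"


cfg = vegeta_target("triton-mms", "default", "cifar10", "/var/vegeta/payload")
print(cfg, end="")
```

With the arguments shown, the rendered text matches the cfg entry in the ConfigMap.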

Request Payload

| Field | Value | Description |
|---|---|---|
| inputs[0].name | INPUT__0 | Input tensor name |
| inputs[0].shape | [1, 3, 32, 32] | Batch of 1 CIFAR-10 image |
| inputs[0].datatype | FP32 | 32-bit floating point |
| inputs[0].data | 3x32x32 float array | Normalized pixel values |
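
To make the payload layout concrete, here is a minimal sketch that builds a request body with the same field names and shape. It is an assumption-laden illustration: the helper `build_payload` is hypothetical, and the pixel values are random placeholders rather than the real CIFAR-10 image embedded in the source file.

```python
import json
import random


def build_payload(seed: int = 0) -> dict:
    """Build a v2-style inference request with a [1, 3, 32, 32] FP32 tensor."""
    rng = random.Random(seed)
    # Placeholder data: random floats in [0, 1), NOT the image from the source file.
    # Nested as batch -> channel -> row -> column to mirror the declared shape.
    data = [[[[rng.random() for _ in range(32)]
              for _ in range(32)]
             for _ in range(3)]]
    return {
        "inputs": [{
            "name": "INPUT__0",
            "shape": [1, 3, 32, 32],
            "datatype": "FP32",
            "data": data,
        }]
    }


payload = build_payload()
body = json.dumps(payload)  # JSON text suitable for use as a request body
print(len(body))
```

The tensor holds 1 x 3 x 32 x 32 = 3,072 values, which is why the embedded payload accounts for most of the file's length.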

Expected Output

Vegeta produces a text report with:

| Metric | Description |
|---|---|
| Requests | Total number of requests sent |
| Rate | Actual request rate achieved |
| Throughput | Successful requests per second |
| Latencies | Mean, 50th, 95th, 99th percentile latencies |
| Success | Percentage of successful requests |
| Status Codes | Distribution of HTTP status codes |
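
If the report needs to be consumed by a script, its lines can be split into named fields. The sketch below assumes each line has the form `Name  [field, field]  value, value`, which is how Vegeta text reports are commonly laid out; the exact layout can vary between Vegeta versions, so treat the pattern as an assumption. The `SAMPLE_REPORT` text is made up purely to exercise the parser and is not a measured result, and `parse_report` is a hypothetical helper.

```python
import re

# Made-up sample text for illustration only; NOT a real benchmark result.
SAMPLE_REPORT = """\
Requests      [total, rate, throughput]  9000, 50.01, 50.00
Success       [ratio]                    100.00%
Status Codes  [code:count]               200:9000
"""

# Assumed line layout: metric name, bracketed field list, then the values
LINE = re.compile(r"^(?P<name>[A-Za-z ]+?)\s+\[(?P<fields>[^\]]+)\]\s+(?P<values>.+)$")


def parse_report(text: str) -> dict:
    """Map each metric name to a dict of field name -> raw string value."""
    report = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if not match:
            continue  # skip lines that do not follow the assumed layout
        fields = [f.strip() for f in match["fields"].split(",")]
        values = [v.strip() for v in match["values"].split(",")]
        if len(fields) == len(values):
            report[match["name"].strip()] = dict(zip(fields, values))
        else:
            report[match["name"].strip()] = values  # keep raw values on mismatch
    return report


parsed = parse_report(SAMPLE_REPORT)
print(parsed["Requests"]["total"])
```

Values are kept as strings because they carry units and percent signs; convert them only after deciding how each metric should be interpreted.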

Usage Examples

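# Deploy the Triton MMS setup first, then run the perf test.
# The Job uses generateName, so create it rather than apply it.
kubectl create -f docs/samples/multimodelserving/triton/perf.yaml

# Monitor the job (its name is torchscript-load-test plus a generated suffix)
kubectl get jobs

# View the results, substituting the generated job name
kubectl logs job/<generated-job-name>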
