Overview
This file defines a load-test configuration that uses the Vegeta HTTP load-testing tool to benchmark the throughput and latency of Triton multi-model serving (MMS).
Description
The file contains two Kubernetes resources: a Job that runs the Vegeta load tester, and a ConfigMap holding the Vegeta target definition and the request payload. The Job runs a 3-minute Vegeta attack at 50 requests per second against the Triton multi-model serving endpoint, targeting the cifar10 model through the v2 inference protocol. The payload is a large JSON document containing one CIFAR-10 image (3x32x32 FP32 pixel values) in the KFServing v2 inference protocol format, with name, shape, datatype, and data fields for each entry in inputs. The file totals 3,323 lines, most of which are the embedded image pixel data.
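Because the payload is generated data rather than hand-written JSON, a minimal Python sketch of how such a request body could be built may help. It assumes the flattened, row-major data layout, which the v2 protocol accepts alongside the nested arrays the sample uses. build_v2_payload is a hypothetical helper, and the random pixel values are placeholders, not CIFAR-10 data.

```python
import json
import random

# Tensor contract from the sample ConfigMap: one CIFAR-10 image, shape [1, 3, 32, 32].
INPUT_NAME = "INPUT__0"
SHAPE = (1, 3, 32, 32)
DATATYPE = "FP32"


def build_v2_payload(values, name=INPUT_NAME, shape=SHAPE, datatype=DATATYPE):
    """Wrap a flat, row-major list of floats in a v2 inference request body.

    build_v2_payload is a hypothetical helper, not part of KServe or Triton.
    """
    expected = 1
    for dim in shape:
        expected *= dim
    if len(values) != expected:
        raise ValueError(f"shape {list(shape)} needs {expected} values, got {len(values)}")
    return {
        "inputs": [
            {
                "name": name,
                "shape": list(shape),
                "datatype": datatype,
                # Flattened row-major data; the v2 protocol also accepts nested arrays.
                "data": [float(v) for v in values],
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder pixels in [0, 1): NOT real CIFAR-10 data.
    rng = random.Random(0)
    pixels = [rng.random() for _ in range(3 * 32 * 32)]
    body = json.dumps(build_v2_payload(pixels))
    print(f"{len(body)} bytes of JSON")
```

Serializing the returned dict with json.dumps gives a body that could be placed under the ConfigMap's payload key.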
Usage
Use this configuration to benchmark multi-model serving performance on a KServe cluster running Triton Inference Server; its throughput and latency results are the metrics reported in the MMS benchmark documentation. Deploy the Triton multi-model serving setup first, then run this configuration as a Kubernetes Job.
Code Reference
Source Location
docs/samples/multimodelserving/triton/perf.yaml
Signature
apiVersion: batch/v1
kind: Job
metadata:
  generateName: torchscript-load-test
spec:
  backoffLimit: 6
  parallelism: 1
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: "false"
    spec:
      restartPolicy: OnFailure
      containers:
      - args:
        - vegeta -cpus=1 attack -duration=3m -rate=50/1s -targets=/var/vegeta/cfg
          | vegeta report -type=text
        command:
        - sh
        - -c
        image: peterevans/vegeta:latest
        name: vegeta
        volumeMounts:
        - mountPath: /var/vegeta
          name: vegeta-cfg
      volumes:
      - configMap:
          name: torchscript-vegeta-cfg
        name: vegeta-cfg
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: torchscript-vegeta-cfg # omitted in source; derived from volume ref
data:
  cfg: |
    POST http://triton-mms.default.svc.cluster.local/v2/models/cifar10/infer
    @/var/vegeta/payload
  payload: |
    {
      "inputs": [{
        "name": "INPUT__0",
        "shape": [1, 3, 32, 32],
        "datatype": "FP32",
        "data": [[ ... ]] # 3x32x32 float pixel values
      }]
    }
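The cfg key follows Vegeta's HTTP target format, which the Job reads through -targets=/var/vegeta/cfg: a METHOD URL line, optional Name: value header lines, and an @path line naming the file whose contents become the request body. The minimal sketch below renders a target in that layout. vegeta_target is a hypothetical helper, and the Content-Type header mentioned afterward is an assumption, since the sample sends none.

```python
def vegeta_target(method, url, body_path=None, headers=None):
    """Render one target in Vegeta's HTTP target format (hypothetical helper).

    Layout: a "METHOD URL" line, optional "Name: value" header lines,
    then an optional "@path" line naming a file whose contents form the body.
    """
    lines = [f"{method} {url}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body_path is not None:
        lines.append(f"@{body_path}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Reproduces the sample's cfg key; the sample sets no headers.
    print(
        vegeta_target(
            "POST",
            "http://triton-mms.default.svc.cluster.local/v2/models/cifar10/infer",
            body_path="/var/vegeta/payload",
        ),
        end="",
    )
```

Passing headers={"Content-Type": "application/json"} would place a header line between the request line and the body reference, which is where the target format expects headers.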
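Import
kubectl create -f docs/samples/multimodelserving/triton/perf.yaml
Use kubectl create rather than kubectl apply: the Job sets metadata.generateName, which kubectl apply does not support.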
I/O Contract
Job Configuration
| Parameter | Value | Description |
|---|---|---|
| Duration | 3 minutes | Total attack duration |
| Rate | 50 req/s | Request rate |
| CPUs | 1 | CPU cores for Vegeta |
| Report Type | text | Output format for results |
| Istio Sidecar | disabled | Avoids proxy interference with load testing |
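Taken together, the duration and rate imply a planned load of 3 min × 50 req/s = 9,000 requests (an average of one request every 20 ms), which is the total to expect in the report's Requests line. The sketch below derives that number from the same flag strings. It handles only the simple forms used here (durations built from ms, s, m, h; rates written as N or N/duration), and parse_duration, parse_rate, and planned_requests are hypothetical helpers rather than Vegeta code.

```python
import re

_UNITS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}
# "ms" is listed before "m" so that "100ms" is not read as 100 minutes.
_DURATION_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|s|m|h)")


def parse_duration(text):
    """Parse a simple Go-style duration such as '3m' or '1m30s' into seconds."""
    total, pos = 0.0, 0
    for match in _DURATION_PART.finditer(text):
        if match.start() != pos:  # reject gaps or unsupported characters
            raise ValueError(f"unsupported duration: {text!r}")
        total += float(match.group(1)) * _UNITS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):
        raise ValueError(f"unsupported duration: {text!r}")
    return total


def parse_rate(text):
    """Parse a -rate value of the form 'N' or 'N/duration' into requests per second."""
    count, _, per = text.partition("/")
    seconds = parse_duration(per) if per else 1.0
    return float(count) / seconds


def planned_requests(duration, rate):
    """Number of requests a constant-rate attack should issue."""
    return round(parse_duration(duration) * parse_rate(rate))


if __name__ == "__main__":
    print(planned_requests("3m", "50/1s"))  # → 9000
```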
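Target Endpoint
| Method | URL | Body |
|---|---|---|
| POST | http://triton-mms.default.svc.cluster.local/v2/models/cifar10/infer | /var/vegeta/payload (the ConfigMap's payload key) |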
Request Payload
| Field | Value | Description |
|---|---|---|
| inputs[0].name | INPUT__0 | Input tensor name |
| inputs[0].shape | [1, 3, 32, 32] | Batch of 1 CIFAR-10 image |
| inputs[0].datatype | FP32 | 32-bit floating point |
| inputs[0].data | 3x32x32 float array | Normalized pixel values |
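Because the payload runs to thousands of lines, a quick pre-flight check against this table can catch a truncated or mis-shaped tensor before a 3-minute run. A minimal sketch, assuming the body is available as a JSON string; check_payload is a hypothetical helper, and it accepts both nested and flat data since the v2 protocol allows either.

```python
import json
import math

# Expected tensor contract, copied from the Request Payload table.
EXPECTED = {"name": "INPUT__0", "shape": [1, 3, 32, 32], "datatype": "FP32"}


def _flatten(data):
    """Yield scalars from arbitrarily nested lists."""
    if isinstance(data, list):
        for item in data:
            yield from _flatten(item)
    else:
        yield data


def check_payload(body):
    """Return a list of problems with a v2 request body; an empty list means it matches.

    check_payload is a hypothetical pre-flight helper, not a KServe API.
    """
    doc = json.loads(body)
    inputs = doc.get("inputs")
    if not isinstance(inputs, list) or len(inputs) != 1:
        return ["expected exactly one entry in 'inputs'"]
    tensor = inputs[0]
    problems = []
    for key, want in EXPECTED.items():
        if tensor.get(key) != want:
            problems.append(f"{key} is {tensor.get(key)!r}, expected {want!r}")
    values = list(_flatten(tensor.get("data", [])))
    want_count = math.prod(EXPECTED["shape"])  # 1 * 3 * 32 * 32 = 3072
    if len(values) != want_count:
        problems.append(f"data has {len(values)} values, expected {want_count}")
    if any(isinstance(v, bool) or not isinstance(v, (int, float)) for v in values):
        problems.append("data contains non-numeric values")
    return problems
```

For example, check_payload(open("payload").read()) on a local copy of the ConfigMap's payload key should return an empty list.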
Expected Output
Vegeta produces a text report with:
| Metric | Description |
|---|---|
| Requests | Total number of requests sent |
| Rate | Actual request rate achieved |
| Throughput | Successful requests per second |
| Latencies | Latency statistics, including the mean and the 50th, 95th, and 99th percentiles |
| Success | Percentage of successful requests |
| Status Codes | Distribution of HTTP status codes |
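To compare runs, the text report can be turned into structured data. A minimal sketch, assuming the bracketed "Label [fields] values" line layout of the text report; the field lists differ between Vegeta versions, so the parser reads field names from each line instead of hard-coding them. parse_text_report and success_ratio are hypothetical helpers. Switching the Job to vegeta report -type=json would avoid text parsing altogether.

```python
import re

# Matches report lines of the form "Label   [field, field, ...]   value, value, ...".
_LINE = re.compile(r"^(?P<label>[A-Za-z][A-Za-z ]*?)\s+\[(?P<fields>[^\]]+)\]\s+(?P<values>.+)$")


def parse_text_report(report):
    """Map each bracketed line of a Vegeta text report to {field: raw value}.

    Hypothetical helper; values stay as strings (for example "12.3ms" or "99.50%").
    """
    parsed = {}
    for line in report.splitlines():
        match = _LINE.match(line.strip())
        if match is None:
            continue  # skips e.g. the "Error Set:" header and error messages
        label = match.group("label").strip()
        fields = [f.strip() for f in match.group("fields").split(",")]
        raw = match.group("values")
        if fields == ["code:count"]:
            # Status codes are whitespace-separated "code:count" pairs.
            parsed[label] = dict(pair.split(":", 1) for pair in raw.split())
        else:
            values = [v.strip() for v in raw.split(",")]
            parsed[label] = dict(zip(fields, values))
    return parsed


def success_ratio(parsed):
    """Return the Success ratio as a fraction in [0, 1]."""
    return float(parsed["Success"]["ratio"].rstrip("%")) / 100.0
```

The Job's log output (kubectl logs on the generated Job) can be saved to a file and passed to parse_text_report as a string.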
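Usage Examples
# Deploy the Triton MMS setup first, then run the perf test.
# The Job uses generateName, so create it with kubectl create (kubectl apply rejects generateName).
kubectl create -f docs/samples/multimodelserving/triton/perf.yaml
# Find the generated Job name (torchscript-load-test plus a random suffix) and monitor it
kubectl get jobs
# View the results once the Job completes
kubectl logs job/<generated-job-name>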