Implementation:Pytorch Serve Benchmark
| Knowledge Sources | |
|---|---|
| Domains | Benchmarking, Performance_Testing |
| Last Updated | 2026-02-13 18:52 GMT |
Overview
Benchmark is a JMeter-based benchmarking framework for TorchServe. It provides static methods for measuring the throughput, latency, and ping performance of TorchServe endpoints. The module supports both single-model and multi-model scenarios, runs inside Docker so that environments are reproducible, and writes its results as CSV reports.
Description
The benchmark.py module implements the Benchmarks class with static methods for different benchmark profiles. It orchestrates JMeter test plans against running TorchServe instances, parses model configurations, and generates structured performance reports.
Key Responsibilities
- Benchmark Profiles: Provides static methods for throughput, latency, and ping benchmarks via the Benchmarks class
- Model Parsing: Parses model archive files and configurations through parseModel() to extract handler, model URL, and runtime parameters
- Single-Model Benchmarking: Executes benchmarks against a single registered model via run_single_benchmark()
- Multi-Model Benchmarking: Runs benchmarks across multiple models simultaneously via run_multi_benchmark()
- Docker Integration: Uses Docker containers to ensure reproducible benchmark environments
- CSV Report Generation: Produces CSV reports with throughput, latency percentiles, and error metrics
Key Class: Benchmarks
The Benchmarks class (lines 93-171) contains static methods that define different benchmark profiles. Each method configures JMeter parameters for a specific type of performance test.
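To illustrate the static-method-per-profile pattern described above, here is a minimal sketch. It is not the code in benchmark.py: ProfileSketch, resolve_profile, and the "threads"/"loops" parameter names are hypothetical and chosen only for illustration.

```python
from typing import Callable


class ProfileSketch:
    """Hypothetical stand-in for a class that groups benchmark profiles."""

    @staticmethod
    def throughput(config: dict) -> dict:
        # Assumed mapping from config keys to load-generator parameters.
        return {
            "profile": "throughput",
            "threads": config.get("concurrency", 1),
            "loops": config.get("requests", 1),
        }

    @staticmethod
    def ping(config: dict) -> dict:
        # A ping profile needs no model; one thread is assumed here.
        return {"profile": "ping", "threads": 1, "loops": config.get("requests", 1)}


def resolve_profile(name: str) -> Callable[[dict], dict]:
    """Return the static method called `name`, or raise ValueError."""
    attr = vars(ProfileSketch).get(name)
    if not isinstance(attr, staticmethod):
        raise ValueError(f"unknown benchmark profile: {name}")
    return getattr(ProfileSketch, name)


params = resolve_profile("throughput")({"concurrency": 10, "requests": 1000})
print(params)
```

Looking profiles up by name keeps the set of available benchmarks in one place: adding a profile means adding one static method.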
Usage
from benchmarks.benchmark import Benchmarks, run_single_benchmark, run_multi_benchmark, parseModel
Run from the command line:
python benchmarks/benchmark.py --model resnet-18 --url https://torchserve.pytorch.org/mar_files/resnet-18.mar
Code Reference
Source Location
| File | Lines | Repository |
|---|---|---|
| benchmarks/benchmark.py | L1-510 | pytorch/serve |
| benchmarks/benchmark.py | L93-171 | Benchmarks class with static benchmark methods |
| benchmarks/benchmark.py | L173-320 | run_single_benchmark() function |
| benchmarks/benchmark.py | L322-430 | run_multi_benchmark() function |
| benchmarks/benchmark.py | L432-510 | parseModel() and main() |
Signature
class Benchmarks:
    """
    Collection of static benchmark methods for TorchServe performance testing.
    Each method configures JMeter parameters for a specific benchmark profile
    and returns the benchmark results as a dictionary.
    """

    @staticmethod
    def throughput(model_name, model_url, config):
        """
        Run a throughput-focused benchmark.
        Maximizes request rate to measure peak throughput (requests/second).
        Args:
            model_name (str): Name of the model to benchmark.
            model_url (str): URL or path to the model archive.
            config (dict): Benchmark configuration parameters.
        Returns:
            dict: Throughput metrics including req/s and error rate.
        """
        ...

    @staticmethod
    def latency(model_name, model_url, config):
        """
        Run a latency-focused benchmark.
        Measures response time distribution at controlled request rate.
        Args:
            model_name (str): Name of the model to benchmark.
            model_url (str): URL or path to the model archive.
            config (dict): Benchmark configuration parameters.
        Returns:
            dict: Latency metrics (p50, p90, p99, mean, max).
        """
        ...

    @staticmethod
    def ping(config):
        """
        Run a ping benchmark to measure health endpoint responsiveness.
        Args:
            config (dict): Benchmark configuration parameters.
        Returns:
            dict: Ping response time metrics.
        """
        ...


def run_single_benchmark(model_name, model_url, config):
    """
    Execute a complete benchmark for a single model.
    Starts TorchServe, registers the model, runs JMeter test plans,
    and collects metrics.
    Args:
        model_name (str): Name of the model.
        model_url (str): URL or path to the model archive.
        config (dict): Benchmark configuration.
    Returns:
        dict: Aggregated benchmark results.
    """
    ...


def run_multi_benchmark(models, config):
    """
    Execute benchmarks across multiple models simultaneously.
    Args:
        models (list): List of (model_name, model_url) tuples.
        config (dict): Shared benchmark configuration.
    Returns:
        list[dict]: List of benchmark results for each model.
    """
    ...


def parseModel(model_path):
    """
    Parse a model archive to extract handler and configuration details.
    Args:
        model_path (str): Path to the .mar file or model directory.
    Returns:
        tuple: (model_name, handler, model_url, extra_files).
    """
    ...
Import
from benchmarks.benchmark import Benchmarks
from benchmarks.benchmark import run_single_benchmark
from benchmarks.benchmark import run_multi_benchmark
from benchmarks.benchmark import parseModel
I/O Contract
| Function / Class | Input | Output | Notes |
|---|---|---|---|
| Benchmarks.throughput(model_name, model_url, config) | Model name (str), model URL (str), config (dict) | dict with throughput metrics (req/s, error rate) | Static method; maximizes request rate |
| Benchmarks.latency(model_name, model_url, config) | Model name (str), model URL (str), config (dict) | dict with latency percentiles (p50, p90, p99, mean, max) | Static method; controlled request rate |
| Benchmarks.ping(config) | config (dict) | dict with ping response time metrics | Tests health endpoint responsiveness |
| run_single_benchmark(model_name, model_url, config) | Model name, URL, config | dict with aggregated benchmark results | Starts/stops TorchServe, registers model |
| run_multi_benchmark(models, config) | List of (model_name, model_url) tuples, config | list[dict] of results per model | Runs benchmarks across multiple models |
| parseModel(model_path) | Path to .mar or model directory | tuple: (model_name, handler, model_url, extra_files) | Extracts model metadata from archive |
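The latency output above lists p50, p90, p99, mean, and max. As a rough illustration of how such a summary could be derived from raw response times, the sketch below uses the nearest-rank percentile definition. This is an assumption: the module may rely on JMeter's own aggregation instead, and percentile and summarize_latency are hypothetical helpers. The latency_* key names follow the style of the latency_p99 key used in the usage examples.

```python
import math
from statistics import mean


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Multiply before dividing so exact ranks are not lost to float rounding.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize_latency(samples_ms):
    """Build a latency summary dict from response times in milliseconds."""
    return {
        "latency_p50": percentile(samples_ms, 50),
        "latency_p90": percentile(samples_ms, 90),
        "latency_p99": percentile(samples_ms, 99),
        "latency_mean": mean(samples_ms),
        "latency_max": max(samples_ms),
    }


summary = summarize_latency(list(range(1, 101)))
print(summary)
```

Other percentile definitions (for example, linear interpolation) give slightly different values on small samples, so results are only comparable when the same definition is used throughout.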
CSV Report Format
Model,Concurrency,Requests,Throughput(req/s),Latency_p50(ms),Latency_p90(ms),Latency_p99(ms),Latency_mean(ms),Error_rate(%)
resnet-18,10,1000,245.3,38.2,52.1,78.4,41.5,0.0
vgg16,20,5000,112.7,165.3,210.8,298.6,178.2,0.1
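A report in this format can be read back with Python's standard csv module. The sketch below parses the sample rows shown above and converts the numeric columns. load_report and INT_COLUMNS are hypothetical names, and the sketch assumes the header is exactly as shown.

```python
import csv
import io

SAMPLE_REPORT = """\
Model,Concurrency,Requests,Throughput(req/s),Latency_p50(ms),Latency_p90(ms),Latency_p99(ms),Latency_mean(ms),Error_rate(%)
resnet-18,10,1000,245.3,38.2,52.1,78.4,41.5,0.0
vgg16,20,5000,112.7,165.3,210.8,298.6,178.2,0.1
"""

# Columns that hold whole numbers; every other non-name column is a float.
INT_COLUMNS = {"Concurrency", "Requests"}


def load_report(text):
    """Parse report text into a list of dicts with typed numeric values."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        parsed = {"Model": row["Model"]}
        for key, value in row.items():
            if key == "Model":
                continue
            parsed[key] = int(value) if key in INT_COLUMNS else float(value)
        rows.append(parsed)
    return rows


rows = load_report(SAMPLE_REPORT)
slowest = max(rows, key=lambda r: r["Latency_p99(ms)"])
print(f"Highest p99 latency: {slowest['Model']} ({slowest['Latency_p99(ms)']} ms)")
```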
Usage Examples
Example 1: Running a single-model throughput benchmark
# Run a throughput benchmark for resnet-18
python benchmarks/benchmark.py \
--model resnet-18 \
--url https://torchserve.pytorch.org/mar_files/resnet-18.mar \
--benchmark throughput \
--concurrency 10 \
--requests 1000
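The flags in Example 1 map naturally onto a benchmark profile and a config dictionary. The following is only a sketch of how such flags could be parsed with argparse. It is not the parser in benchmark.py, and the defaults, the choices list, and the build_parser and parse_cli names are assumptions.

```python
import argparse


def build_parser():
    """Build a parser for the flags used in Example 1 (defaults are assumed)."""
    parser = argparse.ArgumentParser(prog="benchmark.py")
    parser.add_argument("--model", required=True, help="model name")
    parser.add_argument("--url", required=True, help="URL or path to the .mar file")
    parser.add_argument(
        "--benchmark",
        choices=["throughput", "latency", "ping"],  # assumed set of profiles
        default="throughput",
    )
    parser.add_argument("--concurrency", type=int, default=10)
    parser.add_argument("--requests", type=int, default=1000)
    return parser


def parse_cli(argv):
    """Return (profile, model_name, model_url, config) from a list of arguments."""
    args = build_parser().parse_args(argv)
    config = {"concurrency": args.concurrency, "requests": args.requests}
    return args.benchmark, args.model, args.url, config


profile, name, url, config = parse_cli([
    "--model", "resnet-18",
    "--url", "https://torchserve.pytorch.org/mar_files/resnet-18.mar",
    "--benchmark", "throughput",
    "--concurrency", "10",
    "--requests", "1000",
])
print(profile, name, config)
```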
Example 2: Using the Benchmarks class programmatically
from benchmarks.benchmark import Benchmarks, run_single_benchmark
config = {
    "concurrency": 10,
    "requests": 1000,
    "batch_size": 1,
    "workers": 4,
    "docker": True,
}

# Run throughput benchmark
throughput_results = Benchmarks.throughput(
    model_name="resnet-18",
    model_url="https://torchserve.pytorch.org/mar_files/resnet-18.mar",
    config=config,
)
print(f"Throughput: {throughput_results['throughput']} req/s")

# Run latency benchmark
latency_results = Benchmarks.latency(
    model_name="resnet-18",
    model_url="https://torchserve.pytorch.org/mar_files/resnet-18.mar",
    config=config,
)
print(f"Latency p99: {latency_results['latency_p99']} ms")
Example 3: Multi-model benchmarking
from benchmarks.benchmark import run_multi_benchmark
models = [
    ("resnet-18", "https://torchserve.pytorch.org/mar_files/resnet-18.mar"),
    ("vgg16", "https://torchserve.pytorch.org/mar_files/vgg16.mar"),
    ("densenet161", "https://torchserve.pytorch.org/mar_files/densenet161.mar"),
]

config = {
    "concurrency": 20,
    "requests": 5000,
    "batch_size": 4,
    "workers": 2,
}

results = run_multi_benchmark(models, config)
for result in results:
    print(f"{result['model']}: {result['throughput']} req/s, p99={result['latency_p99']} ms")
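Results like those printed in Example 3 can also be written to CSV for later comparison. The sketch below assumes each result dict has at least the model, throughput, and latency_p99 keys used in that example and ignores any other keys. results_to_csv is a hypothetical helper, not part of benchmark.py, and the sample values are taken from the CSV report shown earlier.

```python
import csv
import io

# Only the keys that appear in Example 3; other keys in a result are skipped.
FIELDS = ["model", "throughput", "latency_p99"]


def results_to_csv(results):
    """Serialize a list of result dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=FIELDS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(results)
    return buffer.getvalue()


sample_results = [
    {"model": "resnet-18", "throughput": 245.3, "latency_p99": 78.4},
    {"model": "vgg16", "throughput": 112.7, "latency_p99": 298.6},
]
print(results_to_csv(sample_results))
```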
Related Pages
- Principle:Pytorch_Serve_Automated_Benchmarking -- The principle of automated performance benchmarking for TorchServe models
- Implementation:Pytorch_Serve_Auto_Benchmark -- Automated benchmark orchestration that invokes this module