Implementation:Pytorch Serve Benchmark

Knowledge Sources	Pytorch_Serve
Domains	Benchmarking, Performance_Testing
Last Updated	2026-02-13 18:52 GMT

Overview

Benchmark is a JMeter-based benchmarking framework for TorchServe. It provides static methods that measure the throughput, latency, and ping performance of TorchServe endpoints. The module supports single-model and multi-model scenarios, runs in Docker to keep the environment consistent and isolated, and writes its reports as CSV.

Description

The benchmark.py module implements the Benchmarks class with static methods for different benchmark profiles. It orchestrates JMeter test plans against running TorchServe instances, parses model configurations, and generates structured performance reports.

Key Responsibilities

Benchmark Profiles: Provides static methods for throughput, latency, and ping benchmarks via the Benchmarks class
Model Parsing: Parses model archive files and configurations through parseModel() to extract handler, model URL, and runtime parameters
Single-Model Benchmarking: Executes benchmarks against a single registered model via run_single_benchmark()
Multi-Model Benchmarking: Runs benchmarks across multiple models simultaneously via run_multi_benchmark()
Docker Integration: Uses Docker containers to ensure reproducible benchmark environments
CSV Report Generation: Produces CSV reports with throughput, latency percentiles, and error metrics

Key Class: Benchmarks

The Benchmarks class (lines 93-171) contains static methods that define different benchmark profiles. Each method configures JMeter parameters for a specific type of performance test.
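
One way to picture a class of static profile methods is a lookup by profile name, much like the `--benchmark throughput` flag shown later on this page. The sketch below is only an illustration under stated assumptions: `StubBenchmarks` and `run_profile` are hypothetical names, and the stub methods return the settings they receive instead of driving JMeter. It does not reproduce the real `Benchmarks` class.

```python
class StubBenchmarks:
    """Hypothetical stand-in for Benchmarks; each method echoes its settings."""

    @staticmethod
    def throughput(model_name, model_url, config):
        return {"profile": "throughput", "model": model_name,
                "concurrency": config["concurrency"]}

    @staticmethod
    def latency(model_name, model_url, config):
        return {"profile": "latency", "model": model_name,
                "concurrency": config["concurrency"]}

    @staticmethod
    def ping(config):
        return {"profile": "ping", "concurrency": config["concurrency"]}


def run_profile(name, config, model_name=None, model_url=None):
    """Look up a profile method by name and call it with the arguments it takes."""
    method = None if name.startswith("_") else getattr(StubBenchmarks, name, None)
    if not callable(method):
        raise ValueError(f"unknown benchmark profile: {name}")
    if name == "ping":
        # The ping profile takes only the configuration.
        return method(config)
    return method(model_name, model_url, config)


result = run_profile("throughput", {"concurrency": 10},
                     model_name="resnet-18", model_url="resnet-18.mar")
print(result["profile"], result["model"])
```

Because the methods are static, no `Benchmarks` instance is needed; the class acts as a namespace for the profiles.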

Usage

from benchmarks.benchmark import Benchmarks, run_single_benchmark, run_multi_benchmark, parseModel

Run from the command line:

python benchmarks/benchmark.py --model resnet-18 --url https://torchserve.pytorch.org/mar_files/resnet-18.mar

Code Reference

Source Location

File	Lines	Repository	Contents
`benchmarks/benchmark.py`	L1-510	pytorch/serve	Entire module
`benchmarks/benchmark.py`	L93-171	pytorch/serve	`Benchmarks` class with static benchmark methods
`benchmarks/benchmark.py`	L173-320	pytorch/serve	`run_single_benchmark()` function
`benchmarks/benchmark.py`	L322-430	pytorch/serve	`run_multi_benchmark()` function
`benchmarks/benchmark.py`	L432-510	pytorch/serve	`parseModel()` and `main()`

Signature

class Benchmarks:
    """
    Collection of static benchmark methods for TorchServe performance testing.

    Each method configures JMeter parameters for a specific benchmark profile
    and returns the benchmark results as a dictionary.
    """

    @staticmethod
    def throughput(model_name, model_url, config):
        """
        Run a throughput-focused benchmark.

        Maximizes request rate to measure peak throughput (requests/second).

        Args:
            model_name (str): Name of the model to benchmark.
            model_url (str): URL or path to the model archive.
            config (dict): Benchmark configuration parameters.

        Returns:
            dict: Throughput metrics including req/s and error rate.
        """
        ...

    @staticmethod
    def latency(model_name, model_url, config):
        """
        Run a latency-focused benchmark.

        Measures response time distribution at controlled request rate.

        Args:
            model_name (str): Name of the model to benchmark.
            model_url (str): URL or path to the model archive.
            config (dict): Benchmark configuration parameters.

        Returns:
            dict: Latency metrics (p50, p90, p99, mean, max).
        """
        ...

    @staticmethod
    def ping(config):
        """
        Run a ping benchmark to measure health endpoint responsiveness.

        Args:
            config (dict): Benchmark configuration parameters.

        Returns:
            dict: Ping response time metrics.
        """
        ...


def run_single_benchmark(model_name, model_url, config):
    """
    Execute a complete benchmark for a single model.

    Starts TorchServe, registers the model, runs JMeter test plans,
    and collects metrics.

    Args:
        model_name (str): Name of the model.
        model_url (str): URL or path to the model archive.
        config (dict): Benchmark configuration.

    Returns:
        dict: Aggregated benchmark results.
    """
    ...


def run_multi_benchmark(models, config):
    """
    Execute benchmarks across multiple models simultaneously.

    Args:
        models (list): List of (model_name, model_url) tuples.
        config (dict): Shared benchmark configuration.

    Returns:
        list[dict]: List of benchmark results for each model.
    """
    ...


def parseModel(model_path):
    """
    Parse a model archive to extract handler and configuration details.

    Args:
        model_path (str): Path to the .mar file or model directory.

    Returns:
        tuple: (model_name, handler, model_url, extra_files).
    """
    ...

Import

from benchmarks.benchmark import Benchmarks
from benchmarks.benchmark import run_single_benchmark
from benchmarks.benchmark import run_multi_benchmark
from benchmarks.benchmark import parseModel

I/O Contract

Function / Class	Input	Output	Notes
`Benchmarks.throughput(model_name, model_url, config)`	Model name (`str`), model URL (`str`), config (`dict`)	`dict` with throughput metrics (req/s, error rate)	Static method; maximizes request rate
`Benchmarks.latency(model_name, model_url, config)`	Model name (`str`), model URL (`str`), config (`dict`)	`dict` with latency percentiles (p50, p90, p99, mean, max)	Static method; controlled request rate
`Benchmarks.ping(config)`	config (`dict`)	`dict` with ping response time metrics	Tests health endpoint responsiveness
`run_single_benchmark(model_name, model_url, config)`	Model name, URL, config	`dict` with aggregated benchmark results	Starts/stops TorchServe, registers model
`run_multi_benchmark(models, config)`	List of (model_name, model_url) tuples, config	`list[dict]` of results per model	Runs benchmarks across multiple models
`parseModel(model_path)`	Path to `.mar` or model directory	`tuple`: (model_name, handler, model_url, extra_files)	Extracts model metadata from archive
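
To make the metric names in the table concrete, here is a minimal sketch of how throughput and latency percentiles can be derived from raw response times. It uses the nearest-rank percentile definition, which is an assumption; the module itself may compute or read these values differently (for example, from JMeter output). The `summarize` helper and every dictionary key other than `throughput` and `latency_p99` are hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms, elapsed_s, errors=0):
    """Build a result dict from per-request latencies and total wall-clock time."""
    total = len(latencies_ms)
    return {
        "throughput": total / elapsed_s,            # requests per second
        "latency_p50": percentile(latencies_ms, 50),
        "latency_p90": percentile(latencies_ms, 90),
        "latency_p99": percentile(latencies_ms, 99),
        "latency_mean": sum(latencies_ms) / total,
        "latency_max": max(latencies_ms),
        "error_rate": 100.0 * errors / total,       # percent of failed requests
    }


# Synthetic data: 100 requests with latencies 1.0 .. 100.0 ms over 2 seconds.
latencies = [float(i) for i in range(1, 101)]
summary = summarize(latencies, elapsed_s=2.0)
print(summary["throughput"], summary["latency_p99"])
```

Nearest-rank always returns an observed sample, which keeps the example easy to check by hand; interpolating definitions give slightly different values on small samples.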

CSV Report Format

Model,Concurrency,Requests,Throughput(req/s),Latency_p50(ms),Latency_p90(ms),Latency_p99(ms),Latency_mean(ms),Error_rate(%)
resnet-18,10,1000,245.3,38.2,52.1,78.4,41.5,0.0
vgg16,20,5000,112.7,165.3,210.8,298.6,178.2,0.1
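
A report in this format can be read back with Python's standard `csv` module. The sketch below parses the two sample rows above and converts the numeric columns; the `load_report` helper is a hypothetical name used only for illustration, and the column names are taken from the header line shown above.

```python
import csv
import io

# Sample rows copied from the report format above.
REPORT = """\
Model,Concurrency,Requests,Throughput(req/s),Latency_p50(ms),Latency_p90(ms),Latency_p99(ms),Latency_mean(ms),Error_rate(%)
resnet-18,10,1000,245.3,38.2,52.1,78.4,41.5,0.0
vgg16,20,5000,112.7,165.3,210.8,298.6,178.2,0.1
"""

INT_COLUMNS = {"Concurrency", "Requests"}


def load_report(text):
    """Parse report text into a list of dicts with numeric columns converted."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {}
        for key, value in raw.items():
            if key == "Model":
                row[key] = value          # model name stays a string
            elif key in INT_COLUMNS:
                row[key] = int(value)
            else:
                row[key] = float(value)   # throughput, latencies, error rate
        rows.append(row)
    return rows


rows = load_report(REPORT)
slowest = max(rows, key=lambda r: r["Latency_p99(ms)"])
print(slowest["Model"], slowest["Latency_p99(ms)"])
```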

Usage Examples

Example 1: Running a single-model throughput benchmark

# Run a throughput benchmark for resnet-18
python benchmarks/benchmark.py \
    --model resnet-18 \
    --url https://torchserve.pytorch.org/mar_files/resnet-18.mar \
    --benchmark throughput \
    --concurrency 10 \
    --requests 1000

Example 2: Using the Benchmarks class programmatically

from benchmarks.benchmark import Benchmarks

config = {
    "concurrency": 10,
    "requests": 1000,
    "batch_size": 1,
    "workers": 4,
    "docker": True,
}

# Run throughput benchmark
throughput_results = Benchmarks.throughput(
    model_name="resnet-18",
    model_url="https://torchserve.pytorch.org/mar_files/resnet-18.mar",
    config=config,
)
print(f"Throughput: {throughput_results['throughput']} req/s")

# Run latency benchmark
latency_results = Benchmarks.latency(
    model_name="resnet-18",
    model_url="https://torchserve.pytorch.org/mar_files/resnet-18.mar",
    config=config,
)
print(f"Latency p99: {latency_results['latency_p99']} ms")

Example 3: Multi-model benchmarking

from benchmarks.benchmark import run_multi_benchmark

models = [
    ("resnet-18", "https://torchserve.pytorch.org/mar_files/resnet-18.mar"),
    ("vgg16", "https://torchserve.pytorch.org/mar_files/vgg16.mar"),
    ("densenet161", "https://torchserve.pytorch.org/mar_files/densenet161.mar"),
]

config = {
    "concurrency": 20,
    "requests": 5000,
    "batch_size": 4,
    "workers": 2,
}

results = run_multi_benchmark(models, config)
for result in results:
    print(f"{result['model']}: {result['throughput']} req/s, p99={result['latency_p99']} ms")
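
Result dictionaries like the ones printed above can be turned into the CSV report layout with `csv.DictWriter`. This is a sketch under assumptions, not the module's own report writer: only the keys `model`, `throughput`, and `latency_p99` appear in the examples on this page, so the remaining keys in `KEY_TO_COLUMN` and the `results_to_csv` helper are hypothetical.

```python
import csv
import io

# Hypothetical mapping from result-dict keys to the report's column names.
KEY_TO_COLUMN = {
    "model": "Model",
    "concurrency": "Concurrency",
    "requests": "Requests",
    "throughput": "Throughput(req/s)",
    "latency_p50": "Latency_p50(ms)",
    "latency_p90": "Latency_p90(ms)",
    "latency_p99": "Latency_p99(ms)",
    "latency_mean": "Latency_mean(ms)",
    "error_rate": "Error_rate(%)",
}


def results_to_csv(results):
    """Serialize a list of result dicts into CSV text in the report's column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(KEY_TO_COLUMN.values()),
                            lineterminator="\n")
    writer.writeheader()
    for result in results:
        writer.writerow({col: result[key] for key, col in KEY_TO_COLUMN.items()})
    return buffer.getvalue()


# Sample values taken from the CSV report format shown earlier on this page.
sample_results = [
    {"model": "resnet-18", "concurrency": 10, "requests": 1000, "throughput": 245.3,
     "latency_p50": 38.2, "latency_p90": 52.1, "latency_p99": 78.4,
     "latency_mean": 41.5, "error_rate": 0.0},
    {"model": "vgg16", "concurrency": 20, "requests": 5000, "throughput": 112.7,
     "latency_p50": 165.3, "latency_p90": 210.8, "latency_p99": 298.6,
     "latency_mean": 178.2, "error_rate": 0.1},
]

csv_text = results_to_csv(sample_results)
print(csv_text)
```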

Related Pages

Principle:Pytorch_Serve_Automated_Benchmarking -- The principle of automated performance benchmarking for TorchServe models
Implementation:Pytorch_Serve_Auto_Benchmark -- Automated benchmark orchestration that invokes this module
