Implementation:Pytorch Serve Benchmark

From Leeroopedia
Knowledge Sources
Domains: Benchmarking, Performance_Testing
Last Updated: 2026-02-13 18:52 GMT

Overview

benchmark.py is a JMeter-based benchmarking module for TorchServe. It exposes static methods that measure the throughput, latency, and ping response time of TorchServe endpoints. The module supports both single-model and multi-model benchmark scenarios, uses Docker to keep the test environment isolated and consistent, and generates CSV-formatted reports.

Description

The benchmark.py module implements the Benchmarks class with static methods for different benchmark profiles. It orchestrates JMeter test plans against running TorchServe instances, parses model configurations, and generates structured performance reports.

Key Responsibilities

  • Benchmark Profiles: Provides static methods for throughput, latency, and ping benchmarks via the Benchmarks class
  • Model Parsing: Parses model archive files and configurations through parseModel() to extract the model name, handler, model URL, and extra files
  • Single-Model Benchmarking: Executes benchmarks against a single registered model via run_single_benchmark()
  • Multi-Model Benchmarking: Runs benchmarks across multiple models simultaneously via run_multi_benchmark()
  • Docker Integration: Uses Docker containers to ensure reproducible benchmark environments
  • CSV Report Generation: Produces CSV reports with throughput, latency percentiles, and error metrics

Key Class: Benchmarks

The Benchmarks class (lines 93-171) contains static methods that define different benchmark profiles. Each method configures JMeter parameters for a specific type of performance test.
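
The static-method-per-profile pattern described above can be sketched as follows. This is an illustrative stand-in, not the code in benchmarks/benchmark.py: the class name ProfileSketch, the parameter keys (threads, loops, ramp_up_s), and the values chosen for each profile are all hypothetical.

```python
class ProfileSketch:
    """Hypothetical stand-in: one static method per benchmark profile."""

    @staticmethod
    def throughput(concurrency: int, requests: int) -> dict:
        # Throughput profile (assumed shape): every thread starts at once
        # and the total request count is split evenly across threads.
        return {
            "profile": "throughput",
            "threads": concurrency,
            "loops": max(1, requests // concurrency),
            "ramp_up_s": 0,
        }

    @staticmethod
    def latency(concurrency: int, requests: int) -> dict:
        # Latency profile (assumed shape): same request split, but threads
        # ramp up over time so the request rate stays controlled.
        return {
            "profile": "latency",
            "threads": concurrency,
            "loops": max(1, requests // concurrency),
            "ramp_up_s": concurrency,
        }

    @staticmethod
    def ping() -> dict:
        # Ping profile (assumed shape): a single thread polling repeatedly.
        return {"profile": "ping", "threads": 1, "loops": 100, "ramp_up_s": 0}


params = ProfileSketch.throughput(concurrency=10, requests=1000)
print(params)
```

Static methods suit this kind of design because each profile needs no shared instance state; the class only groups related entry points under one name.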

Usage

from benchmarks.benchmark import Benchmarks, run_single_benchmark, run_multi_benchmark, parseModel

Run from the command line:

python benchmarks/benchmark.py --model resnet-18 --url https://torchserve.pytorch.org/mar_files/resnet-18.mar

Code Reference

Source Location

File | Lines | Repository / Contents
benchmarks/benchmark.py | L1-510 | pytorch/serve
benchmarks/benchmark.py | L93-171 | Benchmarks class with static benchmark methods
benchmarks/benchmark.py | L173-320 | run_single_benchmark() function
benchmarks/benchmark.py | L322-430 | run_multi_benchmark() function
benchmarks/benchmark.py | L432-510 | parseModel() and main()

Signature

class Benchmarks:
    """
    Collection of static benchmark methods for TorchServe performance testing.

    Each method configures JMeter parameters for a specific benchmark profile
    and returns the benchmark results as a dictionary.
    """

    @staticmethod
    def throughput(model_name, model_url, config):
        """
        Run a throughput-focused benchmark.

        Maximizes request rate to measure peak throughput (requests/second).

        Args:
            model_name (str): Name of the model to benchmark.
            model_url (str): URL or path to the model archive.
            config (dict): Benchmark configuration parameters.

        Returns:
            dict: Throughput metrics including req/s and error rate.
        """
        ...

    @staticmethod
    def latency(model_name, model_url, config):
        """
        Run a latency-focused benchmark.

        Measures response time distribution at controlled request rate.

        Args:
            model_name (str): Name of the model to benchmark.
            model_url (str): URL or path to the model archive.
            config (dict): Benchmark configuration parameters.

        Returns:
            dict: Latency metrics (p50, p90, p99, mean, max).
        """
        ...

    @staticmethod
    def ping(config):
        """
        Run a ping benchmark to measure health endpoint responsiveness.

        Args:
            config (dict): Benchmark configuration parameters.

        Returns:
            dict: Ping response time metrics.
        """
        ...


def run_single_benchmark(model_name, model_url, config):
    """
    Execute a complete benchmark for a single model.

    Starts TorchServe, registers the model, runs JMeter test plans,
    and collects metrics.

    Args:
        model_name (str): Name of the model.
        model_url (str): URL or path to the model archive.
        config (dict): Benchmark configuration.

    Returns:
        dict: Aggregated benchmark results.
    """
    ...


def run_multi_benchmark(models, config):
    """
    Execute benchmarks across multiple models simultaneously.

    Args:
        models (list): List of (model_name, model_url) tuples.
        config (dict): Shared benchmark configuration.

    Returns:
        list[dict]: List of benchmark results for each model.
    """
    ...


def parseModel(model_path):
    """
    Parse a model archive to extract handler and configuration details.

    Args:
        model_path (str): Path to the .mar file or model directory.

    Returns:
        tuple: (model_name, handler, model_url, extra_files).
    """
    ...

Import

from benchmarks.benchmark import Benchmarks
from benchmarks.benchmark import run_single_benchmark
from benchmarks.benchmark import run_multi_benchmark
from benchmarks.benchmark import parseModel

I/O Contract

Function / Class | Input | Output | Notes
Benchmarks.throughput(model_name, model_url, config) | Model name (str), model URL (str), config (dict) | dict with throughput metrics (req/s, error rate) | Static method; maximizes request rate
Benchmarks.latency(model_name, model_url, config) | Model name (str), model URL (str), config (dict) | dict with latency percentiles (p50, p90, p99, mean, max) | Static method; controlled request rate
Benchmarks.ping(config) | config (dict) | dict with ping response time metrics | Tests health endpoint responsiveness
run_single_benchmark(model_name, model_url, config) | Model name, URL, config | dict with aggregated benchmark results | Starts/stops TorchServe, registers model
run_multi_benchmark(models, config) | List of (model_name, model_url) tuples, config | list[dict] of results per model | Runs benchmarks across multiple models
parseModel(model_path) | Path to .mar or model directory | tuple: (model_name, handler, model_url, extra_files) | Extracts model metadata from archive
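
To make the latency fields in the contract concrete, the sketch below shows one way p50, p90, p99, mean, and max could be derived from raw per-request latencies. This page does not say how the module computes these values (JMeter may report them directly), so the function name summarize_latencies and the choice of the inclusive quantile method are assumptions for illustration only.

```python
import statistics


def summarize_latencies(samples_ms):
    """Summarize raw latencies (ms) into p50/p90/p99/mean/max (hypothetical helper)."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two latency samples")
    # quantiles(n=100) returns 99 cut points; index k-1 holds the k-th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "p50": cuts[49],
        "p90": cuts[89],
        "p99": cuts[98],
        "mean": statistics.fmean(samples_ms),
        "max": max(samples_ms),
    }


# Synthetic input: 101 evenly spaced latencies from 1 ms to 101 ms.
summary = summarize_latencies([float(ms) for ms in range(1, 102)])
print(summary)
```

Tail percentiles such as p99 are reported alongside the mean because a small number of slow requests can leave the mean almost unchanged while still affecting users.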

CSV Report Format

Model,Concurrency,Requests,Throughput(req/s),Latency_p50(ms),Latency_p90(ms),Latency_p99(ms),Latency_mean(ms),Error_rate(%)
resnet-18,10,1000,245.3,38.2,52.1,78.4,41.5,0.0
vgg16,20,5000,112.7,165.3,210.8,298.6,178.2,0.1
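
A report in this format can be loaded with Python's standard csv module. The sketch below parses the sample rows shown above and picks the model with the highest throughput; the helper name load_report is hypothetical, and the column names are taken from the header line on this page.

```python
import csv
import io

# Sample report copied from the CSV Report Format section.
REPORT = """\
Model,Concurrency,Requests,Throughput(req/s),Latency_p50(ms),Latency_p90(ms),Latency_p99(ms),Latency_mean(ms),Error_rate(%)
resnet-18,10,1000,245.3,38.2,52.1,78.4,41.5,0.0
vgg16,20,5000,112.7,165.3,210.8,298.6,178.2,0.1
"""


def load_report(text):
    """Parse report text into dicts; every column except Model becomes a float."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        parsed = {"Model": row.pop("Model")}
        parsed.update({name: float(value) for name, value in row.items()})
        rows.append(parsed)
    return rows


rows = load_report(REPORT)
fastest = max(rows, key=lambda r: r["Throughput(req/s)"])
print(fastest["Model"])
```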

Usage Examples

Example 1: Running a single-model throughput benchmark

# Run a throughput benchmark for resnet-18
python benchmarks/benchmark.py \
    --model resnet-18 \
    --url https://torchserve.pytorch.org/mar_files/resnet-18.mar \
    --benchmark throughput \
    --concurrency 10 \
    --requests 1000
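
For readers who want to see how flags like these are typically handled, here is a minimal argparse sketch that accepts the same options as the command above. It is not the parser from benchmarks/benchmark.py: the flag set is inferred from this page's example only, the defaults are hypothetical, and the real script's options should be checked with its --help output.

```python
import argparse


def build_parser():
    """Build a parser mirroring the flags in Example 1 (assumed, not verified)."""
    parser = argparse.ArgumentParser(description="TorchServe benchmark CLI sketch")
    parser.add_argument("--model", required=True, help="model name")
    parser.add_argument("--url", required=True, help="URL or path to the .mar file")
    parser.add_argument(
        "--benchmark",
        choices=["throughput", "latency", "ping"],
        default="throughput",  # hypothetical default
        help="benchmark profile to run",
    )
    parser.add_argument("--concurrency", type=int, default=10)  # hypothetical default
    parser.add_argument("--requests", type=int, default=1000)  # hypothetical default
    return parser


# Parse the same arguments as the shell example, passed as an explicit list.
args = build_parser().parse_args(
    [
        "--model", "resnet-18",
        "--url", "https://torchserve.pytorch.org/mar_files/resnet-18.mar",
        "--benchmark", "throughput",
        "--concurrency", "10",
        "--requests", "1000",
    ]
)
print(args.model, args.benchmark, args.concurrency, args.requests)
```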

Example 2: Using the Benchmarks class programmatically

from benchmarks.benchmark import Benchmarks, run_single_benchmark

config = {
    "concurrency": 10,
    "requests": 1000,
    "batch_size": 1,
    "workers": 4,
    "docker": True,
}

# Run throughput benchmark
throughput_results = Benchmarks.throughput(
    model_name="resnet-18",
    model_url="https://torchserve.pytorch.org/mar_files/resnet-18.mar",
    config=config,
)
print(f"Throughput: {throughput_results['throughput']} req/s")

# Run latency benchmark
latency_results = Benchmarks.latency(
    model_name="resnet-18",
    model_url="https://torchserve.pytorch.org/mar_files/resnet-18.mar",
    config=config,
)
print(f"Latency p99: {latency_results['latency_p99']} ms")

Example 3: Multi-model benchmarking

from benchmarks.benchmark import run_multi_benchmark

models = [
    ("resnet-18", "https://torchserve.pytorch.org/mar_files/resnet-18.mar"),
    ("vgg16", "https://torchserve.pytorch.org/mar_files/vgg16.mar"),
    ("densenet161", "https://torchserve.pytorch.org/mar_files/densenet161.mar"),
]

config = {
    "concurrency": 20,
    "requests": 5000,
    "batch_size": 4,
    "workers": 2,
}

results = run_multi_benchmark(models, config)
for result in results:
    print(f"{result['model']}: {result['throughput']} req/s, p99={result['latency_p99']} ms")
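
The per-model result list can also be written out as CSV for later comparison. The sketch below assumes each result dict carries the model, throughput, and latency_p99 keys used in the loop above; the helper name results_to_csv is hypothetical, and the sample values are placeholders reused from the sample report on this page rather than measured output.

```python
import csv
import io


def results_to_csv(results):
    """Serialize result dicts to CSV text, keeping only the three known keys."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["model", "throughput", "latency_p99"],
        extrasaction="ignore",  # skip any extra keys a result may contain
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(results)
    return buffer.getvalue()


# Placeholder results shaped like the dicts in Example 3.
sample_results = [
    {"model": "resnet-18", "throughput": 245.3, "latency_p99": 78.4},
    {"model": "vgg16", "throughput": 112.7, "latency_p99": 298.6},
]
report = results_to_csv(sample_results)
print(report)
```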
