

Implementation:Huggingface Optimum Run And TimeBenchmark

From Leeroopedia
Knowledge Sources
Domains: Benchmarking, Evaluation
Last Updated: 2026-02-15 00:00 GMT

Overview

Concrete tool, provided by the Huggingface Optimum library, for orchestrating benchmarking runs that compare a transformers baseline against an optimized model on latency, throughput, and evaluation metrics.

Description

This module provides three key classes:

  • Calibrator — Abstract base class for post-training calibration with static quantization. Subclasses implement the `fit()` method for specific backends.
  • Run — Orchestrates a benchmarking comparison between a transformers model and an optimized model. It uses Optuna grid search over batch sizes and input lengths to measure latency/throughput, collects evaluation metrics, and produces a structured results dictionary.
  • TimeBenchmark — Measures model inference latency and throughput by running the model forward pass for a configured duration, computing statistics (mean, std, percentiles) over collected latencies.

The module also defines the `task_processing_map` dictionary that maps task names to their corresponding TaskProcessor classes.
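The grid that Run explores can be pictured as the Cartesian product of batch sizes and input lengths. The sketch below is only an illustration: it uses `itertools.product` from the standard library as a stand-in for Optuna's grid search, and `run_grid`, `fake_measure`, and the grid values are hypothetical names and numbers, not part of Optimum.

```python
from itertools import product
from typing import Callable, Dict, List


def run_grid(
    batch_sizes: List[int],
    input_lengths: List[int],
    measure: Callable[[int, int], Dict[str, float]],
) -> List[dict]:
    """Call `measure` once per (batch_size, input_length) pair and collect the results."""
    results = []
    for batch_size, input_length in product(batch_sizes, input_lengths):
        stats = measure(batch_size, input_length)
        results.append({"batch_size": batch_size, "input_length": input_length, **stats})
    return results


def fake_measure(batch_size: int, input_length: int) -> Dict[str, float]:
    # Placeholder objective: a real one would time the model's forward pass here.
    return {"latency_mean": 0.01 * batch_size * input_length}


grid_results = run_grid([1, 8], [64, 128], fake_measure)
for row in grid_results:
    print(row)
```

Each grid point yields one result row, so two batch sizes and two input lengths produce four measurements.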

Usage

Subclass Run to implement backend-specific benchmarking (e.g., ONNX Runtime). The TimeBenchmark class is used internally by Run subclasses to measure inference performance. Calibrator is subclassed for backend-specific calibration logic.
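The subclassing pattern for Calibrator might look like the following minimal sketch. To stay runnable without Optimum installed, it defines a stand-in base class that mirrors the constructor signature listed on this page; storing the arguments as attributes, the `CountingCalibrator` name, and the return value of `fit()` are all assumptions made for illustration.

```python
class Calibrator:
    """Stand-in mirroring the signature on this page; not the Optimum class itself."""

    def __init__(self, calibration_dataset, quantizer, model_path, qconfig,
                 calibration_params, node_exclusion):
        # Assumption of this sketch: arguments are kept as attributes.
        self.calibration_dataset = calibration_dataset
        self.quantizer = quantizer
        self.model_path = model_path
        self.qconfig = qconfig
        self.calibration_params = calibration_params
        self.node_exclusion = node_exclusion

    def fit(self):
        raise NotImplementedError()


class CountingCalibrator(Calibrator):
    """Hypothetical subclass: a real backend would collect activation ranges here."""

    def fit(self):
        # Only counts the calibration samples, to show where backend logic would go.
        return {"num_calibration_samples": len(self.calibration_dataset)}


calibrator = CountingCalibrator(
    calibration_dataset=["sample a", "sample b", "sample c"],
    quantizer=None,
    model_path="model.onnx",  # placeholder path
    qconfig=None,
    calibration_params={},
    node_exclusion=[],
)
result = calibrator.fit()
print(result)
```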

Code Reference

Source Location

Signature

from typing import Set  # needed for the Set[str] annotation on TimeBenchmark


class Calibrator:
    def __init__(
        self,
        calibration_dataset: "Dataset",
        quantizer,
        model_path,
        qconfig,
        calibration_params,
        node_exclusion,
    ):
        """Initialize calibrator for static quantization."""

    def fit(self):
        """Run calibration. Must be implemented by subclasses."""
        raise NotImplementedError()


class Run:
    def __init__(self, run_config: dict):
        """Initialize a benchmarking run.

        Args:
            run_config (dict): Parameters validated against RunConfig.
        """

    def launch(self) -> dict:
        """Launch the full benchmark: time + evaluation."""

    def _launch_time(self, trial):
        """Optuna objective for latency/throughput measurement."""

    def launch_eval(self):
        """Run evaluation on original and optimized models."""

    def load_datasets(self):
        """Load evaluation (and calibration) datasets."""

    def get_calibration_dataset(self) -> "Dataset":
        """Return the calibration dataset."""

    def get_eval_dataset(self) -> "Dataset":
        """Return the evaluation dataset."""

    def finalize(self):
        """Clean up intermediary files."""


class TimeBenchmark:
    def __init__(
        self,
        model,
        batch_size: int,
        input_length: int,
        model_input_names: Set[str],
        warmup_runs: int,
        duration: float,
    ):
        """Initialize time benchmark.

        Args:
            model: The model to benchmark.
            batch_size: Batch size for inputs.
            input_length: Sequence length for inputs.
            model_input_names: Set of expected input names.
            warmup_runs: Number of warmup iterations.
            duration: Benchmark duration in seconds.
        """

    def track(self):
        """Context manager to track a single forward pass latency."""

    def execute(self) -> dict:
        """Run the full benchmark and return statistics."""

    def to_dict(self) -> dict:
        """Return latency/throughput statistics as a dictionary."""

Import

from optimum.runs_base import Run, Calibrator, TimeBenchmark, task_processing_map

I/O Contract

Inputs (Run)

Name | Type | Required | Description
run_config | dict | Yes | Configuration dictionary validated against RunConfig schema

Outputs (Run.launch)

Name | Type | Description
return_body | dict | Complete benchmark results with evaluation.time and evaluation.others
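As a rough picture of how such a result could be navigated, the sketch below builds a placeholder dictionary and reads the two entries named above. Only the `evaluation`, `time`, and `others` keys come from this page; the nested layout (a list of grid points under `time`, and `baseline`/`optimized` entries) is an assumption, and the values are left empty rather than invented.

```python
import json

# Hypothetical shape; only "evaluation", "time", and "others" are named on this page.
return_body = {
    "evaluation": {
        "time": [
            {"batch_size": 8, "input_length": 128, "baseline": {}, "optimized": {}},
        ],
        "others": {"baseline": {}, "optimized": {}},
    }
}

time_results = return_body["evaluation"]["time"]      # latency/throughput entries
other_metrics = return_body["evaluation"]["others"]   # task evaluation metrics
print(json.dumps(return_body, indent=2))
```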

Inputs (TimeBenchmark)

Name | Type | Required | Description
model | nn.Module | Yes | The model to benchmark
batch_size | int | Yes | Batch size for generated dummy inputs
input_length | int | Yes | Sequence length for generated dummy inputs
model_input_names | Set[str] | Yes | Expected model input names (e.g., input_ids, attention_mask)
warmup_runs | int | Yes | Number of warmup forward passes
duration | float | Yes | Benchmark duration in seconds
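To make the roles of `warmup_runs` and `duration` concrete, here is a minimal, self-contained timing loop. It is a sketch and not the Optimum implementation: `TimingSketch` and `fake_forward` are hypothetical, warmup passes are assumed to go unrecorded, and the loop is assumed to stop once the recorded latencies add up to the requested duration.

```python
import time
from contextlib import contextmanager
from typing import Callable, List


class TimingSketch:
    """Illustrative timer; not the Optimum TimeBenchmark implementation."""

    def __init__(self, forward: Callable[[], object], warmup_runs: int, duration: float):
        self.forward = forward
        self.warmup_runs = warmup_runs
        self.duration = duration  # seconds
        self.latencies_ns: List[int] = []

    @contextmanager
    def track(self):
        """Time the enclosed block and record its latency in nanoseconds."""
        start = time.perf_counter_ns()
        yield
        self.latencies_ns.append(time.perf_counter_ns() - start)

    def execute(self) -> List[int]:
        # Warmup passes run first and are not recorded.
        for _ in range(self.warmup_runs):
            self.forward()
        # Measure until the recorded latencies cover the requested duration.
        budget_ns = self.duration * 1e9
        elapsed_ns = 0
        while elapsed_ns < budget_ns:
            with self.track():
                self.forward()
            elapsed_ns += self.latencies_ns[-1]
        return self.latencies_ns


calls = []


def fake_forward():
    # Stand-in for a model forward pass: sleeps for about one millisecond.
    calls.append(1)
    time.sleep(0.001)


timer = TimingSketch(forward=fake_forward, warmup_runs=3, duration=0.02)
latencies = timer.execute()
print(f"{len(latencies)} timed passes, {len(calls) - len(latencies)} warmup passes")
```

Keeping a running total instead of re-summing the latency list on every iteration keeps the loop overhead constant when the forward pass is very fast.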

Outputs (TimeBenchmark.execute)

Name | Type | Description
stats | dict | Dictionary with nb_forwards, throughput, latency_mean, latency_std, and latency percentiles (50, 90, 95, 99, and 999, where 999 denotes the 99.9th percentile)
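The statistics listed above can be derived from a list of recorded latencies roughly as follows. This is a sketch under stated assumptions: the `latency_50` style key names, the nearest-rank percentile method, the millisecond unit, and throughput defined as forward passes per second are illustrative choices, not confirmed details of the library.

```python
import math
import statistics
from typing import Dict, List


def summarize(latencies_ns: List[int], duration_s: float) -> Dict[str, float]:
    """Summarize latencies given in nanoseconds into millisecond statistics."""
    latencies_ms = sorted(ns / 1e6 for ns in latencies_ns)
    n = len(latencies_ms)

    def percentile(q: float) -> float:
        # Nearest-rank percentile on the sorted latencies.
        rank = min(n, max(1, math.ceil(q / 100 * n)))
        return latencies_ms[rank - 1]

    return {
        "nb_forwards": n,
        "throughput": n / duration_s,  # assumed: forward passes per second
        "latency_mean": statistics.mean(latencies_ms),
        "latency_std": statistics.pstdev(latencies_ms),
        "latency_50": percentile(50),
        "latency_90": percentile(90),
        "latency_95": percentile(95),
        "latency_99": percentile(99),
        "latency_999": percentile(99.9),
    }


example_stats = summarize(
    [1_000_000, 2_000_000, 3_000_000, 4_000_000, 5_000_000], duration_s=0.5
)
for key, value in example_stats.items():
    print(f"{key}: {value}")
```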

Usage Examples

Using TimeBenchmark

from optimum.runs_base import TimeBenchmark

# Assuming `model` is a loaded PyTorch model
benchmark = TimeBenchmark(
    model=model,
    batch_size=8,
    input_length=128,
    model_input_names={"input_ids", "attention_mask"},
    warmup_runs=10,
    duration=30.0,
)
stats = benchmark.execute()
print(f"Throughput: {stats['throughput']} samples/s")
print(f"Mean latency: {stats['latency_mean']:.2f} ms")
