Implementation:Huggingface Optimum Run And TimeBenchmark

Knowledge Sources	Huggingface_Optimum Optimum Docs
Domains	Benchmarking, Evaluation
Last Updated	2026-02-15 00:00 GMT

Overview

Concrete tool, provided by the Huggingface Optimum library, for orchestrating benchmarking runs that compare transformers baselines against optimized models on latency, throughput, and evaluation metrics.

Description

This module provides three key classes:

Calibrator — Abstract base class for post-training calibration with static quantization. Subclasses implement the `fit()` method for specific backends.
Run — Orchestrates a benchmarking comparison between a transformers model and an optimized model. It uses Optuna grid search over batch sizes and input lengths to measure latency/throughput, collects evaluation metrics, and produces a structured results dictionary.
TimeBenchmark — Measures model inference latency and throughput by running the model forward pass for a configured duration, computing statistics (mean, std, percentiles) over collected latencies.

The module also defines the `task_processing_map` dictionary that maps task names to their corresponding TaskProcessor classes.
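
The grid sweep that Run performs over batch sizes and input lengths can be pictured with a minimal sketch. This is not the library's code: it swaps the Optuna grid search for a plain `itertools.product` loop, and `sweep` and `fake_measure` are hypothetical names introduced only for illustration.

```python
from itertools import product


def sweep(batch_sizes, input_lengths, measure):
    """Call `measure(batch_size, input_length)` once per grid point."""
    results = []
    for batch_size, input_length in product(batch_sizes, input_lengths):
        stats = measure(batch_size, input_length)
        # Keep the grid coordinates next to the measured statistics.
        results.append({"batch_size": batch_size, "input_length": input_length, **stats})
    return results


def fake_measure(batch_size, input_length):
    # Hypothetical placeholder: a real run would time model forward passes here.
    return {"latency_mean": 0.01 * batch_size * input_length}


grid_results = sweep([1, 8], [64, 128], fake_measure)
for row in grid_results:
    print(row)
```

Each grid point yields one row of statistics, which is roughly the shape a structured results dictionary would need in order to compare a baseline and an optimized model at the same settings.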

Usage

Subclass Run to implement backend-specific benchmarking (e.g., ONNX Runtime). The TimeBenchmark class is used internally by Run subclasses to measure inference performance. Calibrator is subclassed for backend-specific calibration logic.
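
The subclassing pattern described above can be sketched with stand-in classes. The sketch below does not import Optimum: `BaseCalibrator` only mirrors the shape of an abstract base whose `fit()` raises `NotImplementedError`, with a reduced constructor, and `RangeCalibrator` is a hypothetical backend invented for illustration.

```python
class BaseCalibrator:
    """Stand-in for an abstract base: stores its inputs, leaves fit() to subclasses."""

    def __init__(self, calibration_dataset, quantizer):
        self.calibration_dataset = calibration_dataset
        self.quantizer = quantizer

    def fit(self):
        raise NotImplementedError()


class RangeCalibrator(BaseCalibrator):
    """Hypothetical backend: records the min/max over all calibration values."""

    def fit(self):
        values = [v for sample in self.calibration_dataset for v in sample]
        return {"min": min(values), "max": max(values)}


calibrator = RangeCalibrator(calibration_dataset=[[0.5, -1.0], [2.0, 0.25]], quantizer=None)
ranges = calibrator.fit()
print(ranges)
```

A backend-specific Run subclass would follow the same idea: the base class fixes the orchestration, and the subclass supplies the backend-dependent steps.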

Code Reference

Source Location

Repository: Huggingface_Optimum
File: optimum/runs_base.py
Lines: 1-292

Signature

class Calibrator:
    def __init__(
        self,
        calibration_dataset: "Dataset",
        quantizer,
        model_path,
        qconfig,
        calibration_params,
        node_exclusion,
    ):
        """Initialize calibrator for static quantization."""

    def fit(self):
        """Run calibration. Must be implemented by subclasses."""
        raise NotImplementedError()


class Run:
    def __init__(self, run_config: dict):
        """Initialize a benchmarking run.

        Args:
            run_config (dict): Parameters validated against RunConfig.
        """

    def launch(self) -> dict:
        """Launch the full benchmark: time + evaluation."""

    def _launch_time(self, trial):
        """Optuna objective for latency/throughput measurement."""

    def launch_eval(self):
        """Run evaluation on original and optimized models."""

    def load_datasets(self):
        """Load evaluation (and calibration) datasets."""

    def get_calibration_dataset(self) -> "Dataset":
        """Return the calibration dataset."""

    def get_eval_dataset(self) -> "Dataset":
        """Return the evaluation dataset."""

    def finalize(self):
        """Clean up intermediary files."""


class TimeBenchmark:
    def __init__(
        self,
        model,
        batch_size: int,
        input_length: int,
        model_input_names: Set[str],
        warmup_runs: int,
        duration: float,
    ):
        """Initialize time benchmark.

        Args:
            model: The model to benchmark.
            batch_size: Batch size for inputs.
            input_length: Sequence length for inputs.
            model_input_names: Set of expected input names.
            warmup_runs: Number of warmup iterations.
            duration: Benchmark duration in seconds.
        """

    def track(self):
        """Context manager to track a single forward pass latency."""

    def execute(self) -> dict:
        """Run the full benchmark and return statistics."""

    def to_dict(self) -> dict:
        """Return latency/throughput statistics as a dictionary."""

Import

from optimum.runs_base import Run, Calibrator, TimeBenchmark, task_processing_map

I/O Contract

Inputs (Run)

Name	Type	Required	Description
run_config	dict	Yes	Configuration dictionary validated against RunConfig schema

Outputs (Run.launch)

Name	Type	Description
return_body	dict	Complete benchmark results, with timing results under evaluation.time and the remaining evaluation metrics under evaluation.others

Inputs (TimeBenchmark)

Name	Type	Required	Description
model	nn.Module	Yes	The model to benchmark
batch_size	int	Yes	Batch size for generated dummy inputs
input_length	int	Yes	Sequence length for generated dummy inputs
model_input_names	Set[str]	Yes	Expected model input names (e.g., input_ids, attention_mask)
warmup_runs	int	Yes	Number of warmup forward passes
duration	float	Yes	Benchmark duration in seconds
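
How `warmup_runs` and `duration` could drive a measurement is sketched below, as an illustration only. `LatencyTracker` and `fake_forward` are hypothetical names, and the stopping rule (loop until the recorded latencies sum to the requested duration) is an assumption about one reasonable design, not a statement of the library's exact logic.

```python
import time
from contextlib import contextmanager


class LatencyTracker:
    """Hypothetical stand-in that records forward-pass latencies in nanoseconds."""

    def __init__(self, warmup_runs, duration):
        self.warmup_runs = warmup_runs
        self.duration_ns = duration * 1e9  # duration is given in seconds
        self.latencies = []

    @contextmanager
    def track(self):
        # Time whatever runs inside the `with` block and store the result.
        start = time.perf_counter_ns()
        yield
        self.latencies.append(time.perf_counter_ns() - start)

    def run(self, forward):
        for _ in range(self.warmup_runs):
            forward()  # warmup calls are executed but not recorded
        # Assumed stopping rule: accumulate latencies until the budget is spent.
        while sum(self.latencies) < self.duration_ns:
            with self.track():
                forward()
        return len(self.latencies)


def fake_forward():
    time.sleep(0.001)  # placeholder for a model forward pass


tracker = LatencyTracker(warmup_runs=2, duration=0.02)
nb_forwards = tracker.run(fake_forward)
print(nb_forwards)
```

Excluding warmup calls from the recorded latencies keeps one-time costs such as lazy initialization out of the statistics.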

Outputs (TimeBenchmark.execute)

Name	Type	Description
stats	dict	Dictionary with nb_forwards, throughput, latency_mean, latency_std, and latency percentiles (50, 90, 95, 99, and 999, where 999 denotes the 99.9th percentile)
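
A minimal sketch of how such a statistics dictionary could be derived from raw latencies follows. It rests on several assumptions that the page does not confirm: latencies are collected in nanoseconds and reported in milliseconds, throughput counts forward passes per second, the percentile keys are named `latency_50` through `latency_999`, and percentiles use the nearest-rank method (a library may interpolate instead). `summarize` is a hypothetical helper.

```python
import math
import statistics


def summarize(latencies_ns, duration_s):
    """Summarize latencies given in nanoseconds into millisecond statistics."""
    ordered = sorted(latencies_ns)

    def percentile_ms(q):
        # Nearest-rank percentile, converted from ns to ms.
        rank = max(1, math.ceil(q * len(ordered)))
        return ordered[rank - 1] / 1e6

    return {
        "nb_forwards": len(ordered),
        # Assumption: throughput is forward passes per second.
        "throughput": round(len(ordered) / duration_s, 2),
        "latency_mean": statistics.fmean(ordered) / 1e6,
        "latency_std": statistics.pstdev(ordered) / 1e6,
        "latency_50": percentile_ms(0.50),
        "latency_90": percentile_ms(0.90),
        "latency_95": percentile_ms(0.95),
        "latency_99": percentile_ms(0.99),
        "latency_999": percentile_ms(0.999),
    }


stats = summarize([1_000_000, 2_000_000, 3_000_000, 4_000_000], duration_s=2.0)
print(stats)
```

With only a handful of samples the high percentiles collapse onto the maximum, which is one reason to benchmark for a duration long enough to collect many forward passes.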

Usage Examples

Using TimeBenchmark

from optimum.runs_base import TimeBenchmark

# Assuming `model` is a loaded PyTorch model
benchmark = TimeBenchmark(
    model=model,
    batch_size=8,
    input_length=128,
    model_input_names={"input_ids", "attention_mask"},
    warmup_runs=10,
    duration=30.0,
)
stats = benchmark.execute()
print(f"Throughput: {stats['throughput']} samples/s")
print(f"Mean latency: {stats['latency_mean']:.2f} ms")

Related Pages

Environment:Huggingface_Optimum_Python_Core_Dependencies
