Implementation:Huggingface Transformers BenchmarkConfig

From Leeroopedia
Knowledge Sources
Domains: Benchmarking, Performance, Configuration
Last Updated: 2026-02-13 00:00 GMT

Overview

A concrete tool, provided by the HuggingFace Transformers benchmark framework, for defining and validating the configuration of a single benchmark scenario.

Description

BenchmarkConfig is a configuration class that encapsulates all parameters for a single benchmark scenario: iteration counts, input dimensions, attention implementation, compilation settings, kernelization, and a GPU monitoring flag. On construction, it runs validity checks that automatically correct incompatible parameter combinations (for example, disabling torch.compile when Flash Attention 2 is selected in non-continuous-batching mode, or restricting compile modes for continuous batching), unless skip_validity_check is set. Each instance exposes a deterministic SHA-256 hash of its serialized dictionary, used for deduplication, and a human-readable name, used for identification. The class can be serialized to and from dictionaries for JSON persistence.
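To make the auto-correction idea concrete, here is a minimal sketch of the first rule mentioned above (Flash Attention 2 without continuous batching disables compilation). The helper name `drop_compile_for_fa2` is hypothetical and is not part of the framework; the real check_validity method may apply further rules, such as the compile-mode restriction for continuous batching, which this sketch does not attempt to reproduce.

```python
from typing import Any, Dict, Optional


def drop_compile_for_fa2(
    attn_implementation: str,
    compile_kwargs: Optional[Dict[str, Any]],
    continuous_batching: bool,
) -> Optional[Dict[str, Any]]:
    """Hypothetical helper: apply one auto-correction rule and return compile_kwargs."""
    # Rule from the description: Flash Attention 2 in non-continuous-batching
    # mode is treated as incompatible with torch.compile, so compilation is disabled.
    if attn_implementation == "flash_attention_2" and not continuous_batching:
        return None
    # Any other combination is passed through unchanged by this sketch.
    return compile_kwargs


print(drop_compile_for_fa2("flash_attention_2", {"mode": "default"}, False))
print(drop_compile_for_fa2("sdpa", {"mode": "default"}, False))
```

Returning a corrected value rather than raising an error mirrors the "auto-correct" behavior described above: the caller always ends up with a usable configuration.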

Usage

Use BenchmarkConfig when you need to define the parameters for a single benchmark run, validate that those parameters form a legal combination, and pass the resulting configuration to BenchmarkRunner.setup_benchmark() and BenchmarkRunner.run_benchmark().

Code Reference

Source Location

  • Repository: transformers
  • File: benchmark_v2/framework/benchmark_config.py (lines 54-198)

Signature

class BenchmarkConfig:
    all_attn_implementations = ["flash_attention_2", "eager", "sdpa", "flex_attention"]
    all_compiled_modes = [None, "default", "reduce-overhead", "max-autotune", "max-autotune-no-cudagraphs"]

    def __init__(
        self,
        warmup_iterations: int = 5,
        measurement_iterations: int = 20,
        gpu_monitoring: bool = True,
        continuous_batching: bool = False,
        batch_size: int = 1,
        sequence_length: int = 128,
        num_tokens_to_generate: int = 128,
        attn_implementation: str = "eager",
        compile_kwargs: dict[str, Any] | None = None,
        kernelize: bool = False,
        name: str | None = None,
        skip_validity_check: bool = False,
    ) -> None:

Import

from benchmark_v2.framework.benchmark_config import BenchmarkConfig

I/O Contract

Inputs

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| warmup_iterations | int | No (default: 5) | Number of untimed warmup iterations before measurement begins. |
| measurement_iterations | int | No (default: 20) | Number of timed measurement iterations to collect. |
| gpu_monitoring | bool | No (default: True) | Whether to collect GPU utilization and memory metrics during measurement. May slow benchmarks on AMD hardware. |
| continuous_batching | bool | No (default: False) | Whether to use continuous batching (generate_batch) instead of standard generate. |
| batch_size | int | No (default: 1) | Number of sequences in the input batch. |
| sequence_length | int | No (default: 128) | Maximum input sequence length in tokens. |
| num_tokens_to_generate | int | No (default: 128) | Number of new tokens to generate per sequence. |
| attn_implementation | str | No (default: "eager") | Attention implementation to use. One of: "flash_attention_2", "eager", "sdpa", "flex_attention". |
| compile_kwargs | dict[str, Any] or None | No (default: None) | Keyword arguments for CompileConfig. If None, compilation is disabled. The "fullgraph" key defaults to True if not specified. |
| kernelize | bool | No (default: False) | Whether to apply kernel-level optimizations via the kernels library. |
| name | str or None | No (default: None) | Human-readable name for the configuration. Auto-generated if not provided. |
| skip_validity_check | bool | No (default: False) | If True, skip all validity checks on parameter combinations. |
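The compile_kwargs entry describes two behaviors: None disables compilation, and a missing "fullgraph" key defaults to True. The sketch below illustrates that defaulting logic in isolation. The helper `with_fullgraph_default` is a hypothetical name used only for illustration and is not claimed to match how the class implements it internally.

```python
from typing import Any, Dict, Optional


def with_fullgraph_default(
    compile_kwargs: Optional[Dict[str, Any]],
) -> Optional[Dict[str, Any]]:
    """Hypothetical helper: fill in the documented "fullgraph" default."""
    # None means compilation is disabled, so there is nothing to fill in.
    if compile_kwargs is None:
        return None
    # Copy first so the caller's dictionary is not mutated.
    merged = dict(compile_kwargs)
    # Only set "fullgraph" when the caller did not specify it.
    merged.setdefault("fullgraph", True)
    return merged


print(with_fullgraph_default({"mode": "default"}))
print(with_fullgraph_default({"mode": "default", "fullgraph": False}))
print(with_fullgraph_default(None))
```

An explicit "fullgraph": False is preserved, so the default only applies when the key is absent.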

Outputs

| Name | Type | Description |
| --- | --- | --- |
| (instance) | BenchmarkConfig | A validated configuration object with all attributes set, a computed .hash property, and a .name attribute. |

Key Methods

| Method | Signature | Description |
| --- | --- | --- |
| check_validity | check_validity(skip_validity_check: bool = False) -> None | Validates and auto-corrects incompatible parameter combinations. Called automatically during construction. |
| hash | @property hash -> str | Returns a SHA-256 hash of the serialized configuration dictionary for deduplication. |
| infer_name | infer_name(compact: bool = True) -> str | Generates a human-readable name from configuration parameters in compact or verbose format. |
| to_dict | to_dict() -> dict[str, Any] | Serializes the configuration to a dictionary suitable for JSON persistence. |
| from_dict | @classmethod from_dict(data: dict, skip_validity_check: bool = False) -> BenchmarkConfig | Deserializes a configuration from a dictionary. |
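The hash property is described as a SHA-256 digest of the serialized configuration dictionary. One common way to obtain such a deterministic digest is sketched below. The function `config_hash` and the choice of JSON with sorted keys are assumptions made for illustration; the class may serialize its dictionary differently, so digests produced here are not expected to match config.hash.

```python
import hashlib
import json
from typing import Any, Dict


def config_hash(config_dict: Dict[str, Any]) -> str:
    """Illustrative sketch: deterministic SHA-256 digest of a configuration dictionary."""
    # Sorting keys makes the serialized form independent of insertion order.
    payload = json.dumps(config_dict, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


a = {"batch_size": 1, "sequence_length": 128}
b = {"sequence_length": 128, "batch_size": 1}  # same content, different key order

print(config_hash(a) == config_hash(b))
print(len(config_hash(a)))
```

Because equal configurations produce equal digests, the digest can serve as a dictionary key or set member when filtering out duplicate scenarios.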

Usage Examples

Basic Usage

from benchmark_v2.framework.benchmark_config import BenchmarkConfig

# Create a benchmark configuration, spelling out the default values explicitly
config = BenchmarkConfig(
    warmup_iterations=5,
    measurement_iterations=20,
    gpu_monitoring=True,
    batch_size=1,
    sequence_length=128,
    num_tokens_to_generate=128,
    attn_implementation="eager",
)
print(config.name)   # e.g., "w5_i20-monitored-b1_s128_n128-eager-uncompiled-unkernelized-generate"
print(config.hash)   # SHA-256 hex digest

Compiled Configuration

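from benchmark_v2.framework.benchmark_config import BenchmarkConfig

# Create a configuration with torch.compile enabled.
# compile_kwargs is forwarded to CompileConfig; "fullgraph" defaults to True when omitted.
compiled_config = BenchmarkConfig(
    attn_implementation="flex_attention",
    compile_kwargs={"mode": "default"},
    batch_size=4,
    sequence_length=256,
)
print(compiled_config.to_dict())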

Serialization Round-Trip

# Serialize and deserialize the `config` object created under Basic Usage;
# the hash should be unchanged after the round trip
config_dict = config.to_dict()
restored_config = BenchmarkConfig.from_dict(config_dict)
assert config.hash == restored_config.hash
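Since to_dict() is described as producing a dictionary suitable for JSON persistence, the round trip can be extended to a file on disk. The sketch below uses a plain dictionary as a stand-in for the output of config.to_dict(), so that it runs without the benchmark framework installed. The helper names `save_config_dict` and `load_config_dict` and the dictionary keys are illustrative assumptions, not part of the framework.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def save_config_dict(config_dict: Dict[str, Any], path: Path) -> None:
    """Write a configuration dictionary to disk as JSON."""
    path.write_text(json.dumps(config_dict, indent=2), encoding="utf-8")


def load_config_dict(path: Path) -> Dict[str, Any]:
    """Read a configuration dictionary back from a JSON file."""
    return json.loads(path.read_text(encoding="utf-8"))


# Stand-in for config.to_dict(); keys are illustrative, borrowed from the constructor parameters.
example = {
    "batch_size": 1,
    "sequence_length": 128,
    "attn_implementation": "eager",
    "compile_kwargs": None,
}

with tempfile.TemporaryDirectory() as tmp_dir:
    json_path = Path(tmp_dir) / "benchmark_config.json"
    save_config_dict(example, json_path)
    restored = load_config_dict(json_path)

print(restored == example)
```

With the real class, the loaded dictionary would be passed to BenchmarkConfig.from_dict() to rebuild the configuration.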

Related Pages

Implements Principle
