Implementation:Huggingface Transformers BenchmarkConfig

Knowledge Sources	Transformers
Domains	Benchmarking, Performance, Configuration
Last Updated	2026-02-13 00:00 GMT

Overview

A concrete tool, provided by the HuggingFace Transformers benchmark framework, for defining and validating the configuration of a single benchmark scenario.

Description

BenchmarkConfig is a configuration class that encapsulates all parameters for a single benchmark scenario. It accepts iteration counts, input dimensions, an attention implementation, compilation settings, a kernelization flag, and a GPU monitoring flag. On construction, unless skip_validity_check is set, it runs validity checks that automatically correct incompatible parameter combinations (e.g., disabling torch.compile when Flash Attention 2 is selected in non-continuous-batching mode, or restricting compile modes for continuous batching). Each instance exposes a deterministic SHA-256 hash, computed from its serialized dictionary and used for deduplication, and a human-readable name for identification. The class supports serialization to and from dictionaries for JSON persistence.
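The deterministic hash described above can be illustrated with a minimal sketch. It assumes the dictionary is serialized as JSON with sorted keys before hashing; the real serialization step in benchmark_config.py may differ, and `config_hash` is a hypothetical helper, not part of the framework.

```python
import hashlib
import json
from typing import Any


def config_hash(config_dict: dict[str, Any]) -> str:
    """Hypothetical helper: hash a configuration dictionary deterministically."""
    # Assumption: JSON with sorted keys, so the digest does not depend on
    # the order in which keys were inserted into the dictionary.
    serialized = json.dumps(config_dict, sort_keys=True)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


# Two dictionaries with the same content but different key order.
first = {"batch_size": 1, "sequence_length": 128, "attn_implementation": "eager"}
second = {"attn_implementation": "eager", "sequence_length": 128, "batch_size": 1}

# Equal content yields an equal digest, which is what makes deduplication possible.
same_digest = config_hash(first) == config_hash(second)
```

Because the digest depends only on the content, two configurations built with identical parameters can be recognized as duplicates without comparing every field.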

Usage

Use BenchmarkConfig when you need to define the parameters for a single benchmark run, validate that those parameters form a legal combination, and pass the resulting configuration to BenchmarkRunner.setup_benchmark() and BenchmarkRunner.run_benchmark().

Code Reference

Source Location

Repository: transformers
File: benchmark_v2/framework/benchmark_config.py (lines 54-198)

Signature

class BenchmarkConfig:
    all_attn_implementations = ["flash_attention_2", "eager", "sdpa", "flex_attention"]
    all_compiled_modes = [None, "default", "reduce-overhead", "max-autotune", "max-autotune-no-cudagraphs"]

    def __init__(
        self,
        warmup_iterations: int = 5,
        measurement_iterations: int = 20,
        gpu_monitoring: bool = True,
        continuous_batching: bool = False,
        batch_size: int = 1,
        sequence_length: int = 128,
        num_tokens_to_generate: int = 128,
        attn_implementation: str = "eager",
        compile_kwargs: dict[str, Any] | None = None,
        kernelize: bool = False,
        name: str | None = None,
        skip_validity_check: bool = False,
    ) -> None:

Import

from benchmark_v2.framework.benchmark_config import BenchmarkConfig

I/O Contract

Inputs

Name	Type	Required	Description
warmup_iterations	`int`	No (default: 5)	Number of untimed warmup iterations before measurement begins.
measurement_iterations	`int`	No (default: 20)	Number of timed measurement iterations to collect.
gpu_monitoring	`bool`	No (default: True)	Whether to collect GPU utilization and memory metrics during measurement. May slow benchmarks on AMD hardware.
continuous_batching	`bool`	No (default: False)	Whether to use continuous batching (`generate_batch`) instead of standard `generate`.
batch_size	`int`	No (default: 1)	Number of sequences in the input batch.
sequence_length	`int`	No (default: 128)	Maximum input sequence length in tokens.
num_tokens_to_generate	`int`	No (default: 128)	Number of new tokens to generate per sequence.
attn_implementation	`str`	No (default: "eager")	Attention implementation to use. One of: `"flash_attention_2"`, `"eager"`, `"sdpa"`, `"flex_attention"`.
compile_kwargs	`dict[str, Any] | None`	No (default: None)	Keyword arguments for `CompileConfig`. If `None`, compilation is disabled. The `"fullgraph"` key defaults to `True` if not specified.
kernelize	`bool`	No (default: False)	Whether to apply kernel-level optimizations via the `kernels` library.
name	`str | None`	No (default: None)	Human-readable name for the configuration. Auto-generated if not provided.
skip_validity_check	`bool`	No (default: False)	If `True`, skip all validity checks on parameter combinations.
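The documented `compile_kwargs` behavior (`None` disables compilation, and `"fullgraph"` defaults to `True`) could be sketched as follows. `normalize_compile_kwargs` is a hypothetical helper written for illustration; how the class applies the default internally is an assumption.

```python
from typing import Any, Optional


def normalize_compile_kwargs(
    compile_kwargs: Optional[dict[str, Any]],
) -> Optional[dict[str, Any]]:
    """Hypothetical helper mirroring the documented compile_kwargs defaults."""
    if compile_kwargs is None:
        # None means compilation is disabled, so there is nothing to fill in.
        return None
    # Copy so the caller's dictionary is left untouched (a choice made for this sketch).
    normalized = dict(compile_kwargs)
    # Only set "fullgraph" when the caller did not specify it.
    normalized.setdefault("fullgraph", True)
    return normalized


defaulted = normalize_compile_kwargs({"mode": "default"})
explicit = normalize_compile_kwargs({"mode": "default", "fullgraph": False})
```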

Outputs

Name	Type	Description
(instance)	`BenchmarkConfig`	A validated configuration object with all attributes set, a computed `.hash` property, and a `.name` attribute.

Key Methods

Method	Signature	Description
`check_validity`	`check_validity(skip_validity_check: bool = False) -> None`	Validates and auto-corrects incompatible parameter combinations. Called automatically during construction.
`hash`	`@property hash -> str`	Returns a SHA-256 hash of the serialized configuration dictionary for deduplication.
`infer_name`	`infer_name(compact: bool = True) -> str`	Generates a human-readable name from configuration parameters in compact or verbose format.
`to_dict`	`to_dict() -> dict[str, Any]`	Serializes the configuration to a dictionary suitable for JSON persistence.
`from_dict`	`@classmethod from_dict(data: dict, skip_validity_check: bool = False) -> BenchmarkConfig`	Deserializes a configuration from a dictionary.
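To make the auto-correction performed by `check_validity` concrete, here is a minimal sketch of one rule named in the Description: compilation is switched off when Flash Attention 2 is selected without continuous batching. `SketchConfig` is a hypothetical stand-in, not the library class, and the real method applies further rules (such as restricting compile modes for continuous batching) that are not reproduced here.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SketchConfig:
    """Hypothetical stand-in for BenchmarkConfig, for illustration only."""

    attn_implementation: str = "eager"
    continuous_batching: bool = False
    compile_kwargs: Optional[dict[str, Any]] = None
    skip_validity_check: bool = False

    def __post_init__(self) -> None:
        # Validation runs at construction time unless explicitly skipped.
        if not self.skip_validity_check:
            self.check_validity()

    def check_validity(self) -> None:
        # Rule from the Description: Flash Attention 2 outside continuous
        # batching is treated as incompatible with torch.compile.
        if (
            self.attn_implementation == "flash_attention_2"
            and not self.continuous_batching
            and self.compile_kwargs is not None
        ):
            self.compile_kwargs = None  # auto-correct instead of raising


corrected = SketchConfig(
    attn_implementation="flash_attention_2",
    compile_kwargs={"mode": "default"},
)
untouched = SketchConfig(
    attn_implementation="flash_attention_2",
    compile_kwargs={"mode": "default"},
    skip_validity_check=True,
)
```

In this sketch the incompatible combination is repaired silently rather than rejected, so a sweep over many parameter combinations keeps running; passing `skip_validity_check=True` leaves the parameters exactly as given.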

Usage Examples

Basic Usage

from benchmark_v2.framework.benchmark_config import BenchmarkConfig

# Create a default benchmark configuration
config = BenchmarkConfig(
    warmup_iterations=5,
    measurement_iterations=20,
    gpu_monitoring=True,
    batch_size=1,
    sequence_length=128,
    num_tokens_to_generate=128,
    attn_implementation="eager",
)
print(config.name)   # e.g., "w5_i20-monitored-b1_s128_n128-eager-uncompiled-unkernelized-generate"
print(config.hash)   # SHA-256 hex digest

Compiled Configuration

# Create a configuration with torch.compile enabled
compiled_config = BenchmarkConfig(
    attn_implementation="flex_attention",
    compile_kwargs={"mode": "default"},
    batch_size=4,
    sequence_length=256,
)
print(compiled_config.to_dict())

Serialization Round-Trip

# Serialize and deserialize
config_dict = config.to_dict()
restored_config = BenchmarkConfig.from_dict(config_dict)
assert config.hash == restored_config.hash
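For JSON persistence, the dictionary form can be written to disk and read back with the standard library. The sketch below uses a plain dictionary in place of the `to_dict()` output; its keys are taken from the constructor parameters, and whether `to_dict()` emits exactly these keys is an assumption.

```python
import json
import tempfile
from pathlib import Path

# Stand-in for config.to_dict(); the exact key set is an assumption.
config_dict = {
    "warmup_iterations": 5,
    "measurement_iterations": 20,
    "batch_size": 1,
    "sequence_length": 128,
    "num_tokens_to_generate": 128,
    "attn_implementation": "eager",
    "compile_kwargs": None,
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "benchmark_config.json"
    # Write the dictionary as JSON, then read it back.
    path.write_text(json.dumps(config_dict, indent=2))
    loaded = json.loads(path.read_text())

# None survives the trip as JSON null, so the loaded dictionary is identical.
round_trip_ok = loaded == config_dict
```

With the framework available, the loaded dictionary would then be passed to `BenchmarkConfig.from_dict` to rebuild the configuration.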

Related Pages

Implements Principle

Principle:Huggingface_Transformers_Benchmark_Configuration
