Overview
A concrete tool, provided by the HuggingFace Transformers benchmark framework, for defining and validating the configuration of a single benchmark scenario.
Description
BenchmarkConfig is a configuration class that encapsulates all parameters for a single benchmark scenario. It accepts iteration counts, input dimensions, attention implementation, compilation settings, kernelization, and GPU monitoring flags. On construction, it performs validity checks to automatically correct incompatible parameter combinations (e.g., disabling torch.compile when Flash Attention 2 is selected in non-continuous-batching mode, or restricting compile modes for continuous batching). Each instance computes a deterministic SHA-256 hash from its serialized dictionary for deduplication and a human-readable name for identification. The class supports serialization to and from dictionaries for JSON persistence.
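The deterministic hash described above can be illustrated with a minimal sketch. The helper name `config_hash` and the use of `json.dumps(..., sort_keys=True)` are assumptions made for illustration; the framework's actual serialization step may differ. The point is only that hashing a canonical serialization makes two configurations with the same parameters yield the same digest.

```python
import hashlib
import json
from typing import Any


def config_hash(config_dict: dict[str, Any]) -> str:
    """Return a SHA-256 hex digest of a configuration dictionary.

    Assumption: keys are sorted before hashing so that insertion order
    does not affect the digest. The real framework may serialize differently.
    """
    serialized = json.dumps(config_dict, sort_keys=True)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


# Same parameters in a different key order produce the same digest.
a = {"batch_size": 1, "sequence_length": 128, "attn_implementation": "eager"}
b = {"attn_implementation": "eager", "sequence_length": 128, "batch_size": 1}
same = config_hash(a) == config_hash(b)

# Changing any parameter changes the digest.
different = config_hash(a) != config_hash({**a, "batch_size": 4})
```

Equal digests can then serve as a deduplication key, for example when collecting configurations into a dictionary keyed by hash.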
Usage
Use BenchmarkConfig when you need to define the parameters for a single benchmark run, validate that those parameters form a legal combination, and pass the resulting configuration to BenchmarkRunner.setup_benchmark() and BenchmarkRunner.run_benchmark().
Code Reference
Source Location
- Repository: transformers
- File: benchmark_v2/framework/benchmark_config.py (lines 54-198)
Signature
class BenchmarkConfig:
    all_attn_implementations = ["flash_attention_2", "eager", "sdpa", "flex_attention"]
    all_compiled_modes = [None, "default", "reduce-overhead", "max-autotune", "max-autotune-no-cudagraphs"]

    def __init__(
        self,
        warmup_iterations: int = 5,
        measurement_iterations: int = 20,
        gpu_monitoring: bool = True,
        continuous_batching: bool = False,
        batch_size: int = 1,
        sequence_length: int = 128,
        num_tokens_to_generate: int = 128,
        attn_implementation: str = "eager",
        compile_kwargs: dict[str, Any] | None = None,
        kernelize: bool = False,
        name: str | None = None,
        skip_validity_check: bool = False,
    ) -> None:
Import
from benchmark_v2.framework.benchmark_config import BenchmarkConfig
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| warmup_iterations | int | No (default: 5) | Number of untimed warmup iterations before measurement begins. |
| measurement_iterations | int | No (default: 20) | Number of timed measurement iterations to collect. |
| gpu_monitoring | bool | No (default: True) | Whether to collect GPU utilization and memory metrics during measurement. May slow benchmarks on AMD hardware. |
| continuous_batching | bool | No (default: False) | Whether to use continuous batching (generate_batch) instead of standard generate. |
| batch_size | int | No (default: 1) | Number of sequences in the input batch. |
| sequence_length | int | No (default: 128) | Maximum input sequence length in tokens. |
| num_tokens_to_generate | int | No (default: 128) | Number of new tokens to generate per sequence. |
| attn_implementation | str | No (default: "eager") | Attention implementation to use. One of: "flash_attention_2", "eager", "sdpa", "flex_attention". |
| compile_kwargs | dict[str, Any] or None | No (default: None) | Keyword arguments for CompileConfig. If None, compilation is disabled. The "fullgraph" key defaults to True if not specified. |
| kernelize | bool | No (default: False) | Whether to apply kernel-level optimizations via the kernels library. |
| name | str or None | No (default: None) | Human-readable name for the configuration. Auto-generated if not provided. |
| skip_validity_check | bool | No (default: False) | If True, skip all validity checks on parameter combinations. |
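The handling of compile_kwargs (None disables compilation, and "fullgraph" defaults to True when absent) can be sketched as follows. The helper `normalize_compile_kwargs` is hypothetical and only illustrates the defaulting rule stated in the table; it is not the class's actual code.

```python
from __future__ import annotations

from typing import Any


def normalize_compile_kwargs(compile_kwargs: dict[str, Any] | None) -> dict[str, Any] | None:
    """Hypothetical helper mirroring the documented defaulting rule."""
    if compile_kwargs is None:
        # None means compilation is disabled; nothing to fill in.
        return None
    normalized = dict(compile_kwargs)  # copy so the caller's dict is not mutated
    normalized.setdefault("fullgraph", True)  # only applied when the key is missing
    return normalized


disabled = normalize_compile_kwargs(None)
defaulted = normalize_compile_kwargs({"mode": "default"})
explicit = normalize_compile_kwargs({"mode": "default", "fullgraph": False})
```

An explicitly supplied "fullgraph" value is kept as given; only a missing key is filled in.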
Outputs
| Name | Type | Description |
|---|---|---|
| (instance) | BenchmarkConfig | A validated configuration object with all attributes set, a computed .hash property, and a .name attribute. |
Key Methods
| Method | Signature | Description |
|---|---|---|
| check_validity | check_validity(skip_validity_check: bool = False) -> None | Validates and auto-corrects incompatible parameter combinations. Called automatically during construction. |
| hash | @property hash -> str | Returns a SHA-256 hash of the serialized configuration dictionary for deduplication. |
| infer_name | infer_name(compact: bool = True) -> str | Generates a human-readable name from configuration parameters in compact or verbose format. |
| to_dict | to_dict() -> dict[str, Any] | Serializes the configuration to a dictionary suitable for JSON persistence. |
| from_dict | @classmethod from_dict(data: dict, skip_validity_check: bool = False) -> BenchmarkConfig | Deserializes a configuration from a dictionary. |
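As a rough illustration of the kind of auto-correction check_validity performs, the sketch below covers only the one rule spelled out in the Description: torch.compile is disabled when Flash Attention 2 is selected outside continuous-batching mode. The function name `auto_correct_compile` is hypothetical, and the restrictions on compile modes for continuous batching are deliberately omitted because their details are not given on this page.

```python
from __future__ import annotations

from typing import Any


def auto_correct_compile(
    attn_implementation: str,
    continuous_batching: bool,
    compile_kwargs: dict[str, Any] | None,
    skip_validity_check: bool = False,
) -> dict[str, Any] | None:
    """Hypothetical sketch: return the compile_kwargs to use after correction."""
    if skip_validity_check:
        # The caller asked for the parameters to be taken as-is.
        return compile_kwargs
    uses_fa2 = attn_implementation == "flash_attention_2"
    if uses_fa2 and not continuous_batching and compile_kwargs is not None:
        # Documented rule: disable compilation for this combination.
        return None
    return compile_kwargs


corrected = auto_correct_compile("flash_attention_2", False, {"mode": "default"})
untouched = auto_correct_compile("sdpa", False, {"mode": "default"})
skipped = auto_correct_compile(
    "flash_attention_2", False, {"mode": "default"}, skip_validity_check=True
)
```

The correction replaces the offending setting rather than rejecting the configuration, which matches the "auto-corrects" wording in the table above.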
Usage Examples
Basic Usage
from benchmark_v2.framework.benchmark_config import BenchmarkConfig

# Create a default benchmark configuration
config = BenchmarkConfig(
    warmup_iterations=5,
    measurement_iterations=20,
    gpu_monitoring=True,
    batch_size=1,
    sequence_length=128,
    num_tokens_to_generate=128,
    attn_implementation="eager",
)
print(config.name)  # e.g., "w5_i20-monitored-b1_s128_n128-eager-uncompiled-unkernelized-generate"
print(config.hash)  # SHA-256 hex digest
Compiled Configuration
# Create a configuration with torch.compile enabled
compiled_config = BenchmarkConfig(
    attn_implementation="flex_attention",
    compile_kwargs={"mode": "default"},
    batch_size=4,
    sequence_length=256,
)
print(compiled_config.to_dict())
Serialization Round-Trip
# Serialize and deserialize
config_dict = config.to_dict()
restored_config = BenchmarkConfig.from_dict(config_dict)
assert config.hash == restored_config.hash
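Because to_dict() is described as producing a dictionary suitable for JSON persistence, the round trip can also pass through a file. The sketch below uses a plain dictionary as a stand-in for the output of to_dict(); the key set shown is an assumption based on the constructor parameters, and the file name is arbitrary.

```python
import json
import tempfile
from pathlib import Path

# Stand-in for BenchmarkConfig.to_dict(); the real key set may differ.
config_dict = {
    "warmup_iterations": 5,
    "measurement_iterations": 20,
    "gpu_monitoring": True,
    "continuous_batching": False,
    "batch_size": 1,
    "sequence_length": 128,
    "num_tokens_to_generate": 128,
    "attn_implementation": "eager",
    "compile_kwargs": None,
    "kernelize": False,
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "benchmark_config.json"
    # Write the dictionary to disk, then read it back.
    path.write_text(json.dumps(config_dict, indent=2))
    loaded = json.loads(path.read_text())

# None, booleans, integers, and strings all survive the JSON round trip.
round_trip_ok = loaded == config_dict
```

With the real class, the loaded dictionary would then be passed to BenchmarkConfig.from_dict().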
Related Pages
Implements Principle