Implementation:Openai Whisper Set Alignment Heads
Overview
Whisper.set_alignment_heads() decodes a compact binary representation of alignment head metadata and registers it as a sparse boolean tensor buffer on the model. The resulting mask identifies which cross-attention heads in the decoder are useful for word-level timestamp extraction via Dynamic Time Warping.
Source
- File: whisper/model.py:L278-285
- Repository: https://github.com/openai/whisper
Signature
```python
def set_alignment_heads(self, dump: bytes) -> None:
```
Import
```python
from whisper.model import Whisper
# This is a method on the Whisper class
```
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| dump | bytes | (required) | Base85-encoded, gzip-compressed boolean array marking which (layer, head) pairs are alignment heads |
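A dump for a custom model can be produced by applying the decoding steps in reverse. The following is a minimal sketch using only the standard library, assuming one byte per head (0x01 for True, 0x00 for False), which is how NumPy serializes a bool array. The helper encode_alignment_heads and the 4-layer, 3-head mask are hypothetical, not part of the whisper package.

```python
import base64
import gzip


def encode_alignment_heads(mask: list[list[bool]]) -> bytes:
    """Hypothetical helper: pack a (n_text_layer, n_text_head) boolean mask
    into the base85 + gzip format that set_alignment_heads() decodes."""
    # Flatten row-major (layer by layer), one byte per head: 0x01 = True, 0x00 = False.
    # This matches what a NumPy bool array's tobytes() would produce.
    raw = bytes(1 if flag else 0 for row in mask for flag in row)
    # Compress first, then base85-encode; decoding undoes these in reverse order.
    return base64.b85encode(gzip.compress(raw))


# Hypothetical 4-layer, 3-head decoder with heads (2, 1) and (3, 0) marked.
mask = [
    [False, False, False],
    [False, False, False],
    [False, True, False],
    [True, False, False],
]
dump = encode_alignment_heads(mask)
print(dump)
```

Because gzip embeds a timestamp in its header by default, the encoded bytes can differ between runs for the same mask; only the decoded flags matter.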
Inputs and Outputs
Inputs
- dump — a bytes object containing base85-encoded, gzip-compressed boolean data from the _ALIGNMENT_HEADS dictionary (keyed by model name)
Outputs
- Sets self.alignment_heads — a sparse boolean tensor of shape (n_text_layer, n_text_head) registered as a PyTorch buffer on the model. No return value.
Behavior
- Decodes the base85 bytes using base64.b85decode()
- Decompresses the result with gzip.decompress()
- Converts the raw bytes to a NumPy boolean array using np.frombuffer(..., dtype=bool), copying it so the array is writable
- Wraps the array with torch.from_numpy() and reshapes it to (n_text_layer, n_text_head) using the model's decoder dimensions (self.dims)
- Converts the boolean mask to a sparse COO tensor with .to_sparse()
- Registers the sparse tensor as a non-persistent buffer via self.register_buffer("alignment_heads", ..., persistent=False), overwriting the default buffer created in __init__
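The behavior steps above can be traced without NumPy or PyTorch. This pure-Python sketch mirrors them on plain lists, returning the dense mask plus the (layer, head) coordinates that a sparse COO tensor would store. The function decode_alignment_heads and the small 4-layer, 3-head dump are hypothetical; the real method works on NumPy arrays and PyTorch tensors instead.

```python
import base64
import gzip


def decode_alignment_heads(dump: bytes, n_text_layer: int, n_text_head: int):
    """Pure-Python sketch of set_alignment_heads(): returns the dense mask
    and the (layer, head) coordinates a sparse COO tensor would keep."""
    # Steps 1-2: undo the base85 encoding, then the gzip compression.
    raw = gzip.decompress(base64.b85decode(dump))
    # Step 3: one byte per flag (0x00 or 0x01), as with np.frombuffer(..., dtype=bool).
    flags = [byte != 0 for byte in raw]
    expected = n_text_layer * n_text_head
    if len(flags) != expected:
        # The real method would fail at the reshape step; the sketch raises explicitly.
        raise ValueError(f"expected {expected} entries, got {len(flags)}")
    # Step 4: reshape row-major into (n_text_layer, n_text_head).
    mask = [flags[i * n_text_head:(i + 1) * n_text_head] for i in range(n_text_layer)]
    # Step 5: a sparse tensor stores only the coordinates of the True entries.
    coords = [
        (layer, head)
        for layer, row in enumerate(mask)
        for head, flag in enumerate(row)
        if flag
    ]
    return mask, coords


# Build a small dump inline (4 layers x 3 heads, heads (2, 1) and (3, 0) set).
example = base64.b85encode(gzip.compress(bytes([0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0])))
mask, coords = decode_alignment_heads(example, n_text_layer=4, n_text_head=3)
print(coords)  # [(2, 1), (3, 0)]
```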
This method is called automatically by load_model() for all official Whisper model variants. The pre-computed alignment head data is stored in the _ALIGNMENT_HEADS dictionary in whisper/__init__.py.
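load_model() only calls set_alignment_heads() when it resolves an official model name. A model loaded from a local checkpoint path keeps the default buffer that Whisper.__init__ registers, in which every head of the last half of the decoder layers (layer index n_text_layer // 2 and above) is marked. A pure-Python sketch of that default, with a hypothetical function name:

```python
def default_alignment_mask(n_text_layer: int, n_text_head: int) -> list[list[bool]]:
    """Hypothetical helper sketching the fallback mask registered in
    Whisper.__init__: all heads in the last half of the decoder layers."""
    first = n_text_layer // 2
    # Layers before `first` are all False; layers from `first` onward are all True.
    return [[layer >= first] * n_text_head for layer in range(n_text_layer)]


# With the base model's decoder dimensions (6 layers, 8 heads):
mask = default_alignment_mask(6, 8)
print(sum(map(sum, mask)))  # 24
```

For the base model this default marks 24 of 48 heads, which is why the curated per-model data in _ALIGNMENT_HEADS is preferred when it is available.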
Example
```python
import whisper

# Alignment heads are set automatically for official models
model = whisper.load_model("base")

# Inspect the alignment heads as a dense (n_text_layer, n_text_head) boolean mask
dense = model.alignment_heads.to_dense()
print(dense.shape)  # torch.Size([6, 8]) for the base model
print(dense)  # True marks an alignment head

# Count alignment heads
num_alignment = dense.sum().item()
print(f"Number of alignment heads: {num_alignment}")

# For custom models, you can set them manually:
# model.set_alignment_heads(alignment_data_bytes)
```
Notes
- The alignment head data is model-specific; each official model variant has its own set of empirically determined alignment heads
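- The sparse COO representation stores only the coordinates of the True entries, which suits this mask because typically only a small fraction of all cross-attention heads are alignment heads; the tensor's indices() give the (layer, head) pairs directly
- The buffer is registered with persistent=False, so it is excluded from state_dict() (and therefore not saved in checkpoints), but like any buffer it still moves with model.to(device)
- The alignment heads are only used for word-level timestamp extraction (for example, transcribe(..., word_timestamps=True)); they do not affect ordinary transcription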
Metadata
Principle:Openai_Whisper_Alignment_Head_Configuration 2025-06-25 00:00 GMT