Implementation:Openai Whisper Set Alignment Heads

Overview

Whisper.set_alignment_heads() decodes a compact binary representation of alignment head metadata and registers it on the model as a sparse boolean tensor buffer. The resulting mask records which decoder cross-attention heads are used for word-level timestamp extraction via Dynamic Time Warping (DTW); the method applies a precomputed selection and does not search for the heads itself.

Source

File: whisper/model.py:L278-285
Repository: https://github.com/openai/whisper

Signature

def set_alignment_heads(self, dump: bytes) -> None:

Import

from whisper.model import Whisper
# This is a method on the Whisper class

Parameters

Parameter	Type	Default	Description
dump	bytes	(required)	Base85-encoded, gzip-compressed boolean array marking which (layer, head) pairs are alignment heads

Inputs and Outputs

Inputs

dump — a bytes object containing base85-encoded, gzip-compressed boolean data from the _ALIGNMENT_HEADS dictionary (keyed by model name)

Outputs

Sets self.alignment_heads — a sparse boolean tensor of shape (n_text_layer, n_text_head) registered as a PyTorch buffer on the model. No return value.

Behavior

Decodes the base85 bytes using base64.b85decode()
Decompresses the result with gzip.decompress()
Interprets the raw bytes as a NumPy boolean array using np.frombuffer(..., dtype=bool) and copies it, because an array created by np.frombuffer over a bytes object is read-only
Wraps the array in a tensor with torch.from_numpy(array) and reshapes it to (n_text_layer, n_text_head) using the model's decoder dimensions
Converts the boolean mask to a sparse tensor with .to_sparse()
Registers the sparse tensor as a non-persistent buffer via self.register_buffer("alignment_heads", ..., persistent=False)
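
The decoding steps above can be sketched with NumPy and the standard library alone. This is a minimal illustration, not the library's code: decode_alignment_mask is a hypothetical helper, the 2 x 4 mask is made-up sample data, and the final torch.from_numpy(...).to_sparse() and register_buffer() steps are left out so the snippet runs without PyTorch.

```python
import base64
import gzip

import numpy as np


def decode_alignment_mask(dump: bytes, n_layer: int, n_head: int) -> np.ndarray:
    """Hypothetical helper: turn a base85 + gzip dump into a boolean mask."""
    raw = gzip.decompress(base64.b85decode(dump))  # base85 -> gzip -> raw bytes
    flat = np.frombuffer(raw, dtype=bool).copy()   # copy: frombuffer output is read-only
    return flat.reshape(n_layer, n_head)


# Made-up sample data: 2 decoder layers, 4 heads, two heads marked.
sample = np.zeros((2, 4), dtype=bool)
sample[0, 1] = True
sample[1, 3] = True
sample_dump = base64.b85encode(gzip.compress(sample.tobytes()))

mask = decode_alignment_mask(sample_dump, n_layer=2, n_head=4)
print(mask.shape)         # (2, 4)
print(np.argwhere(mask))  # row/column pairs of the marked heads
```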

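load_model() calls this method automatically when a model is loaded by one of the official model names. The pre-computed alignment head data for those models is stored in the _ALIGNMENT_HEADS dictionary in whisper/__init__.py, keyed by model name. When load_model() is given a path to a checkpoint file instead, no dump is supplied and the method is not called.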

Example

import whisper

# Alignment heads are set automatically for official models
model = whisper.load_model("base")

# Inspect the alignment heads
dense = model.alignment_heads.to_dense()
print(dense.shape)   # torch.Size([6, 8]) for the base model
print(dense)          # Boolean tensor showing which heads are alignment heads

# Count alignment heads
num_alignment = model.alignment_heads.to_dense().sum().item()
print(f"Number of alignment heads: {num_alignment}")

# For custom models, you can set them manually:
# model.set_alignment_heads(alignment_data_bytes)
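
For the custom-model case, a dump can be built by inverting the decoding steps: one boolean byte per (layer, head) entry, gzip-compressed, then base85-encoded. The sketch below assumes that layout; encode_alignment_heads is a hypothetical helper, and the (layer, head) pairs are placeholders rather than real alignment heads for any model.

```python
import base64
import gzip

import numpy as np


def encode_alignment_heads(pairs, n_text_layer: int, n_text_head: int) -> bytes:
    """Hypothetical helper: pack (layer, head) pairs into a base85 + gzip dump."""
    mask = np.zeros((n_text_layer, n_text_head), dtype=bool)
    for layer, head in pairs:
        mask[layer, head] = True
    # One byte per head in row-major order, then gzip, then base85.
    return base64.b85encode(gzip.compress(mask.tobytes()))


# Placeholder heads for a 6-layer, 8-head decoder (the base model's shape).
alignment_data_bytes = encode_alignment_heads([(3, 1), (4, 2), (5, 4)], 6, 8)

# Sanity check: decoding should give back the same three heads.
decoded = np.frombuffer(
    gzip.decompress(base64.b85decode(alignment_data_bytes)), dtype=bool
).reshape(6, 8)
print(np.argwhere(decoded).tolist())  # [[3, 1], [4, 2], [5, 4]]
```

The resulting alignment_data_bytes value is the kind of argument that model.set_alignment_heads() expects; which heads to mark for a custom model has to be determined separately.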

Notes

The alignment head data is model-specific; each official model variant has its own set of empirically determined alignment heads
The sparse tensor representation is memory-efficient since typically only a small fraction of all cross-attention heads are alignment heads
The buffer is registered with persistent=False, so it is not included in state_dict(); like any registered buffer, it still moves with the model on model.to(device) calls
This method is only meaningful for models that will be used for word-level timestamp extraction
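
To make the note on the sparse representation concrete, the sketch below shows the coordinate view of a mask: only the (layer, head) positions of the marked entries are kept, which is conceptually what a sparse COO tensor stores. The mask values are invented for illustration, and NumPy stands in for the torch sparse tensor used by the model.

```python
import numpy as np

# Made-up mask for illustration: 6 layers x 8 heads with three heads marked.
mask = np.zeros((6, 8), dtype=bool)
mask[[3, 4, 5], [1, 2, 4]] = True

# Coordinate form: one (layer, head) row per marked head.
coords = np.argwhere(mask)
print(coords.tolist())                               # [[3, 1], [4, 2], [5, 4]]
print(f"{len(coords)} of {mask.size} heads marked")  # 3 of 48 heads marked
```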

Metadata

Principle:Openai_Whisper_Alignment_Head_Configuration 2025-06-25 00:00 GMT
