Implementation:Openai Whisper Set Alignment Heads

Overview

Whisper.set_alignment_heads() decodes a compact binary representation of alignment head metadata and registers it as a sparse boolean tensor buffer on the model. The resulting mask records which cross-attention heads in the decoder are used for word-level timestamp extraction via Dynamic Time Warping.

Source

Signature

def set_alignment_heads(self, dump: bytes) -> None:

Import

from whisper.model import Whisper
# This is a method on the Whisper class

Parameters

Parameter | Type | Default | Description
dump | bytes | (required) | Base85-encoded, gzip-compressed boolean array marking which (layer, head) pairs are alignment heads

Inputs and Outputs

Inputs

  • dump — a bytes object containing base85-encoded, gzip-compressed boolean data from the _ALIGNMENT_HEADS dictionary (keyed by model name)

Outputs

  • Sets self.alignment_heads — a sparse boolean tensor of shape (n_text_layer, n_text_head) registered as a PyTorch buffer on the model. No return value.

Behavior

  1. Decodes the base85 bytes using base64.b85decode()
  2. Decompresses the result with gzip.decompress()
  3. Converts the raw bytes to a numpy boolean array using np.frombuffer(..., dtype=bool)
  4. Wraps the array with torch.from_numpy() and reshapes the flat tensor to (n_text_layer, n_text_head) using the model's decoder dimensions
  5. Converts the dense boolean mask to a sparse tensor with .to_sparse()
  6. Registers the sparse tensor as a non-persistent buffer via self.register_buffer("alignment_heads", ..., persistent=False)

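load_model() calls this method automatically when a model is loaded by one of the official model names, using the pre-computed entry for that name in the _ALIGNMENT_HEADS dictionary in whisper/__init__.py. When load_model() is given a path to a checkpoint file instead, no entry is looked up and the method is not called.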
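The decoding steps listed above can be sketched with the standard library alone. This is a minimal illustration, not the Whisper implementation: plain nested lists stand in for the NumPy array and the sparse PyTorch tensor, the helper names decode_alignment_dump and alignment_pairs are hypothetical, and the toy dump is made up for the demonstration. It assumes one byte per (layer, head) pair in row-major order, which is how a NumPy boolean array is laid out.

```python
import base64
import gzip
from typing import List, Tuple


def decode_alignment_dump(
    dump: bytes, n_text_layer: int, n_text_head: int
) -> List[List[bool]]:
    """Decode a dump into a nested list mask[layer][head] (hypothetical helper)."""
    # Undo the base85 text encoding, then the gzip compression.
    raw = gzip.decompress(base64.b85decode(dump))
    # Assumption: one byte per (layer, head) pair, as in a NumPy bool array.
    if len(raw) != n_text_layer * n_text_head:
        raise ValueError(
            f"expected {n_text_layer * n_text_head} bytes, got {len(raw)}"
        )
    # Any non-zero byte is treated as True.
    flat = [byte != 0 for byte in raw]
    # Row-major reshape: each row holds the heads of one decoder layer.
    return [
        flat[layer * n_text_head:(layer + 1) * n_text_head]
        for layer in range(n_text_layer)
    ]


def alignment_pairs(mask: List[List[bool]]) -> List[Tuple[int, int]]:
    """List the (layer, head) pairs that are set, similar to sparse indices."""
    return [
        (layer, head)
        for layer, row in enumerate(mask)
        for head, flag in enumerate(row)
        if flag
    ]


# Toy dump for a made-up 2-layer, 3-head decoder (not real Whisper data).
toy_dump = base64.b85encode(gzip.compress(bytes([0, 1, 0, 0, 0, 1])))
mask = decode_alignment_dump(toy_dump, n_text_layer=2, n_text_head=3)
print(alignment_pairs(mask))  # [(0, 1), (1, 2)]
```

Storing only the set (layer, head) pairs, as alignment_pairs does here, is the same idea behind the sparse tensor the method registers.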

Example

import whisper

# Alignment heads are set automatically for official models
model = whisper.load_model("base")

# Inspect the alignment heads
dense = model.alignment_heads.to_dense()
print(dense.shape)   # torch.Size([6, 8]) for the base model
print(dense)          # Boolean tensor showing which heads are alignment heads

# Count alignment heads
num_alignment = model.alignment_heads.to_dense().sum().item()
print(f"Number of alignment heads: {num_alignment}")

# For custom models, you can set them manually:
# model.set_alignment_heads(alignment_data_bytes)
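
For a custom model, the bytes passed to set_alignment_heads() have to be produced in the same base85-over-gzip format. The sketch below shows one way this might be done with the standard library. The helper encode_alignment_mask is hypothetical and not part of Whisper, and the toy mask is invented for illustration; a real mask must match the model's n_text_layer and n_text_head.

```python
import base64
import gzip
from typing import Sequence


def encode_alignment_mask(mask: Sequence[Sequence[bool]]) -> bytes:
    """Pack mask[layer][head] into a dump of the format described on this page.

    Hypothetical helper: one byte per head, row-major, gzip, then base85.
    """
    if not mask:
        raise ValueError("mask must contain at least one layer")
    n_heads = len(mask[0])
    if any(len(row) != n_heads for row in mask):
        raise ValueError("every layer must list the same number of heads")
    # One byte per (layer, head) pair, matching a NumPy bool array's layout.
    raw = bytes(1 if flag else 0 for row in mask for flag in row)
    return base64.b85encode(gzip.compress(raw))


# Made-up mask for a 2-layer, 3-head decoder; real values are model-specific.
toy_mask = [[False, True, False], [False, False, True]]
alignment_data_bytes = encode_alignment_mask(toy_mask)

# With a model whose decoder has 2 layers and 3 heads, this could be passed on:
# model.set_alignment_heads(alignment_data_bytes)
```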

Notes

  • The alignment head data is model-specific; each official model variant has its own set of empirically determined alignment heads
  • The sparse tensor representation is memory-efficient since typically only a small fraction of all cross-attention heads are alignment heads
  • The buffer is registered with persistent=False, so it is not included in state_dict(); it still moves with the model on model.to(device) calls
  • This method is only meaningful for models that will be used for word-level timestamp extraction

Metadata

Principle:Openai_Whisper_Alignment_Head_Configuration 2025-06-25 00:00 GMT
