Implementation:LLMBook zh LLMBook zh github io Build Alibi Tensor
| Knowledge Sources | |
|---|---|
| Domains | Deep_Learning, Model_Architecture |
| Last Updated | 2026-02-08 04:29 GMT |
Overview
A concrete, standalone function for constructing ALiBi attention bias tensors.
Description
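The `build_alibi_tensor` function constructs the bias tensor used in Attention with Linear Biases (ALiBi). It computes a geometric sequence of slopes, one per attention head, handles the case where the number of heads is not a power of 2, and produces the position-dependent bias. The output tensor has shape `(batch_size * num_heads, 1, seq_length)`, so it holds one bias value per key position and is broadcast across query positions when added to the attention scores before softmax. Because softmax is unchanged by adding a constant to every entry in a row, a bias that is linear in key position is equivalent, under a causal mask, to a penalty that is linear in query-key distance: nearby tokens are penalized less than distant ones, which encourages local attention.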
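The slope computation described above can be sketched in plain Python. This is a minimal sketch, assuming the slope scheme used by common ALiBi implementations (a geometric sequence starting at 2^(-8/n) for the nearest power of two n, with the remaining heads filled from the sequence for 2n heads); `alibi_slopes` is a hypothetical helper name, not a function from the source file.

```python
import math


def alibi_slopes(num_heads: int) -> list[float]:
    """Per-head ALiBi slopes (hypothetical helper, assumed scheme)."""
    # Largest power of two that does not exceed num_heads.
    closest_power_of_2 = 2 ** math.floor(math.log2(num_heads))
    # Geometric sequence whose first term and ratio are both 2^(-8 / n).
    base = 2.0 ** (-8.0 / closest_power_of_2)
    slopes = [base ** k for k in range(1, closest_power_of_2 + 1)]

    if closest_power_of_2 != num_heads:
        # Remaining heads take the odd powers of the base that would be
        # used for 2 * closest_power_of_2 heads.
        extra_base = 2.0 ** (-4.0 / closest_power_of_2)
        num_remaining = min(closest_power_of_2, num_heads - closest_power_of_2)
        slopes += [extra_base ** k for k in range(1, 2 * num_remaining, 2)]
    return slopes


print(alibi_slopes(8))
# [0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.0078125, 0.00390625]
```

With 12 heads, the first 8 slopes are the same as above and the last 4 are interleaved values taken from the 16-head sequence.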
Usage
Use this function when implementing ALiBi-style attention in Transformer models. Call it once per forward pass (or cache the result) to produce the bias tensor that is added to the raw attention scores.
Code Reference
Source Location
- Repository: LLMBook-zh
- File: code/5.3 ALiBi.py
- Lines: 1-24
Signature
def build_alibi_tensor(
    attention_mask: torch.Tensor,
    num_heads: int,
    dtype: torch.dtype
) -> torch.Tensor:
    """
    Builds the ALiBi attention bias tensor.

    Args:
        attention_mask: Binary mask of shape (batch_size, seq_length).
        num_heads: Number of attention heads.
        dtype: Target dtype for the output tensor.

    Returns:
        ALiBi bias tensor of shape (batch_size * num_heads, 1, seq_length).
    """
Import
import torch
import math
# Function defined locally in code/5.3 ALiBi.py
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| attention_mask | torch.Tensor | Yes | Binary mask of shape (batch_size, seq_length); 1 marks tokens to attend to |
| num_heads | int | Yes | Number of attention heads |
| dtype | torch.dtype | Yes | Target dtype for output tensor |
Outputs
| Name | Type | Description |
|---|---|---|
| alibi | torch.Tensor | Bias tensor of shape (batch_size * num_heads, 1, seq_length) |
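To make the shapes in this contract concrete, here is a minimal sketch of how a mask and a set of per-head slopes could be combined into a tensor of the documented output shape. It assumes positions are derived from the cumulative sum of the mask, as in BLOOM-style implementations; the body of the source function is not reproduced here, and `alibi_from_slopes` is a hypothetical name that takes the slopes as an explicit argument.

```python
import torch


def alibi_from_slopes(
    attention_mask: torch.Tensor,  # (batch_size, seq_length), values 0 or 1
    slopes: torch.Tensor,          # (num_heads,), one slope per head
    dtype: torch.dtype,
) -> torch.Tensor:
    """Hypothetical helper: bias of shape (batch_size * num_heads, 1, seq_length)."""
    batch_size, seq_length = attention_mask.shape
    num_heads = slopes.shape[0]
    # Index of each real token among the real tokens (0, 1, 2, ...);
    # masked-out positions are set to 0. Shape: (batch_size, 1, seq_length).
    positions = ((attention_mask.cumsum(dim=-1) - 1) * attention_mask)[:, None, :]
    # Broadcast slopes over batch and sequence: (batch_size, num_heads, seq_length).
    alibi = slopes[None, :, None] * positions
    return alibi.reshape(batch_size * num_heads, 1, seq_length).to(dtype)


# Second sequence is left-padded by two tokens.
mask = torch.tensor([[1, 1, 1, 1], [0, 0, 1, 1]])
slopes = torch.tensor([0.5, 0.25])
alibi = alibi_from_slopes(mask, slopes, torch.float32)
print(alibi.shape)  # torch.Size([4, 1, 4])
print(alibi[0, 0])  # tensor([0.0000, 0.5000, 1.0000, 1.5000])
print(alibi[2, 0])  # tensor([0.0000, 0.0000, 0.0000, 0.5000])
```

Rows are ordered batch-major, so row `b * num_heads + h` holds the bias for head `h` of sequence `b`.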
Usage Examples
import torch
batch_size, seq_length, num_heads = 2, 128, 32
# Create attention mask (all ones = no masking)
attention_mask = torch.ones(batch_size, seq_length, dtype=torch.long)
# Build ALiBi tensor
alibi = build_alibi_tensor(attention_mask, num_heads, dtype=torch.float32)
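# alibi.shape == (64, 1, 128)  # (batch_size * num_heads, 1, seq_length)
# Add to attention scores of shape (batch_size * num_heads, q_length, seq_length);
# the singleton dimension broadcasts over query positions
# attention_scores = attention_scores + alibi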
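The following sketch illustrates the final addition step with a small stand-in bias instead of a call to `build_alibi_tensor`. The stand-in assumes the bias is slope times key position, and all tensor names and sizes here are illustrative. It also checks that, under a causal mask, this per-key bias gives the same attention probabilities as the relative form `-slope * (i - j)`, since the two differ only by a constant within each query row.

```python
import torch

torch.manual_seed(0)
num_heads, seq_length, head_dim = 2, 4, 8  # batch size of 1 for brevity

q = torch.randn(num_heads, seq_length, head_dim)
k = torch.randn(num_heads, seq_length, head_dim)

# Stand-in bias with the documented layout (batch * heads, 1, seq_length).
slopes = torch.tensor([0.5, 0.25])
key_pos = torch.arange(seq_length, dtype=torch.float32)
alibi = (slopes[:, None] * key_pos)[:, None, :]  # (2, 1, 4)

scores = q @ k.transpose(-1, -2) / head_dim ** 0.5  # (2, 4, 4)
causal = torch.tril(torch.ones(seq_length, seq_length, dtype=torch.bool))

# Bias is broadcast over the query dimension, then future keys are masked.
biased = (scores + alibi).masked_fill(~causal, float("-inf"))
probs = torch.softmax(biased, dim=-1)

# Relative form: penalty proportional to the query-key distance i - j.
distance = key_pos[:, None] - key_pos[None, :]   # (4, 4), entry [i, j] = i - j
relative = -slopes[:, None, None] * distance     # (2, 4, 4)
probs_relative = torch.softmax(
    (scores + relative).masked_fill(~causal, float("-inf")), dim=-1
)

print(torch.allclose(probs, probs_relative, atol=1e-6))  # True
```

Storing only one value per key position keeps the bias tensor linear in sequence length rather than quadratic.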