Implementation:LLMBook zh LLMBook zh github io Build Alibi Tensor

Knowledge Sources	LLMBook-zh; Train Short, Test Long: ALiBi
Domains	Deep_Learning, Model_Architecture
Last Updated	2026-02-08 04:29 GMT

Overview

A standalone function that constructs the attention bias tensor for ALiBi.

Description

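The build_alibi_tensor function constructs the linear bias tensor used in Attention with Linear Biases (ALiBi). It computes a geometric sequence of slopes, one per attention head, handles the case where the number of heads is not a power of 2, and produces the position-dependent bias. The output tensor has shape `(batch_size * num_heads, 1, seq_length)` and is added directly to the attention scores before softmax. Because the query dimension is a singleton, the bias depends only on the key position and is broadcast across all queries. Softmax is unchanged when a constant is added to every score in a row, so under a causal mask this is equivalent to penalizing each key in proportion to its distance from the query: closer tokens receive a smaller penalty (a less negative bias), which encourages local attention.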
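The slope computation described above can be sketched in plain Python. This is a minimal illustration, assuming the geometric scheme from the ALiBi paper (slopes 2^(-8i/n) for n heads, with extra slopes interleaved from the next power of 2 when n is not a power of 2). The helper name `alibi_slopes` is hypothetical and is not taken from code/5.3 ALiBi.py, which may differ in detail.

```python
import math
from typing import List


def alibi_slopes(num_heads: int) -> List[float]:
    """Hypothetical helper: one slope per head, following the ALiBi paper's scheme."""
    # Largest power of 2 that does not exceed num_heads.
    closest_pow2 = 2 ** math.floor(math.log2(num_heads))
    # Geometric sequence with ratio 2^(-8 / closest_pow2).
    base = 2.0 ** (-8.0 / closest_pow2)
    slopes = [base ** i for i in range(1, closest_pow2 + 1)]

    if closest_pow2 != num_heads:
        # For the remaining heads, take the odd powers of the sequence that
        # 2 * closest_pow2 heads would use, so the new slopes fall between
        # the existing ones.
        extra_base = 2.0 ** (-8.0 / (2 * closest_pow2))
        num_extra = min(closest_pow2, num_heads - closest_pow2)
        slopes += [extra_base ** i for i in range(1, 2 * num_extra, 2)]
    return slopes


print(alibi_slopes(8))
# [0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.0078125, 0.00390625]
print(len(alibi_slopes(12)))  # 12
```

Heads with larger slopes penalize distance more strongly and so attend more locally; heads with smaller slopes keep a longer effective context.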

Usage

Use this function when implementing ALiBi-style attention in Transformer models. Call it once per forward pass (or cache the result) to produce the bias tensor that is added to the raw attention scores.

Code Reference

Source Location

Repository: LLMBook-zh
File: code/5.3 ALiBi.py
Lines: 1-24

Signature

def build_alibi_tensor(
    attention_mask: torch.Tensor,
    num_heads: int,
    dtype: torch.dtype
) -> torch.Tensor:
    """
    Builds the ALiBi attention bias tensor.

    Args:
        attention_mask: Binary mask of shape (batch_size, seq_length).
        num_heads: Number of attention heads.
        dtype: Target dtype for the output tensor.
    Returns:
        ALiBi bias tensor of shape (batch_size * num_heads, 1, seq_length).
    """

Import

import torch
import math
# Function defined locally in code/5.3 ALiBi.py

I/O Contract

Inputs

Name	Type	Required	Description
attention_mask	torch.Tensor	Yes	Binary mask of shape (batch_size, seq_length)
num_heads	int	Yes	Number of attention heads
dtype	torch.dtype	Yes	Target dtype for output tensor

Outputs

Name	Type	Description
alibi	torch.Tensor	Bias tensor of shape (batch_size * num_heads, 1, seq_length)
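To make the contract above concrete, here is a rough sketch that uses nested Python lists in place of tensors so the indexing is explicit. It assumes that positions are derived from a running count over the mask, with padded slots set to 0, and that heads vary fastest in the flattened first dimension. The source does not spell out either detail, and the name `alibi_bias_rows` is hypothetical.

```python
from typing import List


def alibi_bias_rows(mask: List[List[int]], slopes: List[float]) -> List[List[List[float]]]:
    """Nested-list analogue of a (batch_size * num_heads, 1, seq_length) tensor."""
    out = []
    for row in mask:                          # one sequence in the batch
        positions, seen = [], 0
        for m in row:
            seen += m                         # cumulative count of real tokens
            positions.append((seen - 1) * m)  # 0-based position; padding forced to 0
        for slope in slopes:                  # one bias row per head
            out.append([[slope * p for p in positions]])
    return out


mask = [[1, 1, 1, 1], [0, 0, 1, 1]]           # second sequence is left-padded
bias = alibi_bias_rows(mask, slopes=[0.5, 0.25])
print(len(bias), len(bias[0]), len(bias[0][0]))  # 4 1 4
print(bias[0][0])  # [0.0, 0.5, 1.0, 1.5]
print(bias[3][0])  # [0.0, 0.0, 0.0, 0.25]
```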

Usage Examples

import torch

# Assumes build_alibi_tensor (code/5.3 ALiBi.py) is already defined in the current scope.
batch_size, seq_length, num_heads = 2, 128, 32

# Create attention mask (all ones = no masking)
attention_mask = torch.ones(batch_size, seq_length, dtype=torch.long)

# Build ALiBi tensor
alibi = build_alibi_tensor(attention_mask, num_heads, dtype=torch.float32)
# alibi.shape == (64, 1, 128)  # (batch_size * num_heads, 1, seq_length)

# Add to attention scores before softmax; the singleton dimension broadcasts over query positions
# attention_scores = attention_scores + alibi
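
A bias that depends only on the key position can still favor nearby tokens, because softmax ignores any constant added to a whole row. The sketch below checks this for one causal query row in plain Python. The scores and slope are made-up illustrative values, and the `softmax` helper is defined locally rather than taken from the source.

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)                                # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


slope = 0.5                                    # illustrative slope for one head
scores = [0.3, -1.2, 0.8, 0.1]                 # raw scores of query i over keys 0..i
i = len(scores) - 1

# Bias as a function of key position only (what a (.., 1, seq_length) tensor can express).
key_pos_bias = [slope * j for j in range(i + 1)]
# Bias as a relative-distance penalty: 0 for the nearest key, more negative farther away.
rel_dist_bias = [-slope * (i - j) for j in range(i + 1)]

p_key = softmax([s + b for s, b in zip(scores, key_pos_bias)])
p_rel = softmax([s + b for s, b in zip(scores, rel_dist_bias)])

# The two biases differ by the constant slope * i, so the attention weights match.
print(all(abs(a - b) < 1e-12 for a, b in zip(p_key, p_rel)))  # True
```

The equivalence relies on a causal mask hiding keys after the query. Without it, later keys would receive a larger bias than earlier ones.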

Related Pages

Environment:LLMBook_zh_LLMBook_zh_github_io_PyTorch_CUDA_GPU_Environment
