Implementation:LLMBook zh LLMBook zh github io Build Alibi Tensor

From Leeroopedia


Knowledge Sources
Domains: Deep_Learning, Model_Architecture
Last Updated: 2026-02-08 04:29 GMT

Overview

A standalone function that constructs the ALiBi attention bias tensor.

Description

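The build_alibi_tensor function constructs the linear bias tensor used in Attention with Linear Biases (ALiBi). It computes a geometric sequence of slopes, one per attention head, handles the case where the number of heads is not a power of 2, and multiplies each slope by the token positions derived from the attention mask. The output tensor has shape `(batch_size * num_heads, 1, seq_length)` and is added directly to the attention scores before softmax. The singleton middle dimension broadcasts over query positions, so the bias depends only on the key position. Because softmax is unchanged by adding a constant to every score in a row, a bias that is linear in the key position has the same effect as a penalty proportional to the query-key distance: nearer tokens are penalized less, which encourages local attention.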
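The slope schedule described above can be sketched in plain Python. This follows the scheme from the ALiBi paper as also used in the Hugging Face BLOOM implementation; `alibi_slopes` is a hypothetical helper, not a function from the repository, and whether code/5.3 ALiBi.py computes its slopes in exactly this form is an assumption.

```python
import math
from typing import List


def alibi_slopes(num_heads: int) -> List[float]:
    """Per-head ALiBi slopes (hypothetical helper, not from the repository)."""
    # Largest power of 2 that does not exceed num_heads.
    closest = 2 ** math.floor(math.log2(num_heads))

    # Geometric sequence: base, base^2, ..., base^closest with base = 2^(-8/closest).
    base = 2.0 ** (-8.0 / closest)
    slopes = [base ** k for k in range(1, closest + 1)]

    if closest != num_heads:
        # Fill the remaining heads with the odd powers of the sequence
        # that would be used for 2 * closest heads.
        extra_base = 2.0 ** (-8.0 / (2 * closest))
        remaining = min(closest, num_heads - closest)
        slopes += [extra_base ** k for k in range(1, 2 * remaining, 2)]

    return slopes


# For 8 heads the slopes start at 0.5 and halve with each head.
slopes_8 = alibi_slopes(8)
# For 12 heads, 8 slopes come from the main sequence and 4 from the extra one.
slopes_12 = alibi_slopes(12)
```

Heads with larger slopes attend more locally, while heads with slopes near zero can still attend broadly.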

Usage

Use this function when implementing ALiBi-style attention in Transformer models. Call it once per forward pass, or cache the result for a given attention mask and head count, and add the returned bias tensor to the raw attention scores before softmax.

Code Reference

Source Location

  • Repository: LLMBook-zh
  • File: code/5.3 ALiBi.py
  • Lines: 1-24

Signature

def build_alibi_tensor(
    attention_mask: torch.Tensor,
    num_heads: int,
    dtype: torch.dtype
) -> torch.Tensor:
    """
    Builds the ALiBi attention bias tensor.

    Args:
        attention_mask: Binary mask of shape (batch_size, seq_length).
        num_heads: Number of attention heads.
        dtype: Target dtype for the output tensor.
    Returns:
        ALiBi bias tensor of shape (batch_size * num_heads, 1, seq_length).
    """

Import

import torch
import math
# Function defined locally in code/5.3 ALiBi.py

I/O Contract

Inputs

Name | Type | Required | Description
attention_mask | torch.Tensor | Yes | Binary mask of shape (batch_size, seq_length)
num_heads | int | Yes | Number of attention heads
dtype | torch.dtype | Yes | Target dtype for the output tensor

Outputs

Name | Type | Description
alibi | torch.Tensor | Bias tensor of shape (batch_size * num_heads, 1, seq_length)
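The contract takes an attention mask rather than a sequence length, which lets positions skip padding tokens. One way to do this, used by the Hugging Face BLOOM function of the same name and signature, is to take a cumulative sum over the mask. The sketch below shows that idea for a single mask row in plain Python; `key_positions` and `bias_row` are hypothetical helpers, and it is an assumption that code/5.3 ALiBi.py derives positions the same way.

```python
from itertools import accumulate
from typing import List


def key_positions(mask_row: List[int]) -> List[int]:
    """Position of each key, counting only unmasked tokens (hypothetical helper)."""
    running = accumulate(mask_row)  # cumulative count of real tokens
    # Subtract 1 to make positions zero-based, then zero out padded slots.
    return [(count - 1) * m for count, m in zip(running, mask_row)]


def bias_row(mask_row: List[int], slope: float) -> List[float]:
    """Per-key bias for one head and one sequence: slope * position."""
    return [slope * p for p in key_positions(mask_row)]


left_padded = [0, 0, 1, 1, 1]            # two padding tokens, three real tokens
positions = key_positions(left_padded)   # [0, 0, 0, 1, 2]
row = bias_row(left_padded, slope=0.5)   # [0.0, 0.0, 0.0, 0.5, 1.0]
```

Stacking one such row per head and per batch element gives the documented `(batch_size * num_heads, 1, seq_length)` layout.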

Usage Examples

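import torch

# build_alibi_tensor is assumed to be in scope, e.g. copied from code/5.3 ALiBi.py

batch_size, seq_length, num_heads = 2, 128, 32

# Create attention mask (all ones = no masking)
attention_mask = torch.ones(batch_size, seq_length, dtype=torch.long)

# Build ALiBi tensor
alibi = build_alibi_tensor(attention_mask, num_heads, dtype=torch.float32)
# alibi.shape == (64, 1, 128)  # (batch_size * num_heads, 1, seq_length)

# Add to attention scores of shape (batch_size * num_heads, query_length, seq_length);
# the singleton dimension of alibi broadcasts over query positions
# attention_scores = attention_scores + alibi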
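Since the bias tensor has a singleton query dimension, it can only vary with the key position. The following plain-Python check illustrates why that is enough: for one query row, a per-key bias `slope * j` gives the same softmax weights as the relative form `-slope * (i - j)`. The scores and the slope are made-up values, and the `slope * j` form of the bias is an assumption about the function's output rather than something the source states.

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


slope = 0.5                        # illustrative slope for one head
scores = [0.3, -1.2, 0.8, 0.1]     # made-up raw scores of one query against keys 0..3
i = len(scores) - 1                # the query sits at the last position (causal case)

# Bias that depends only on the key position j.
per_key = softmax([s + slope * j for j, s in enumerate(scores)])

# Bias that penalizes the query-key distance i - j.
relative = softmax([s - slope * (i - j) for j, s in enumerate(scores)])

# The two differ by the constant slope * i per row, which softmax ignores.
max_diff = max(abs(a - b) for a, b in zip(per_key, relative))
```

Here `max_diff` is zero up to floating-point rounding, so both forms produce the same attention weights.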
