Principle: Loss Masking Tokenization
| Knowledge Sources | LLMBook-zh (LLMBook-zh.github.io) |
|---|---|
| Domains | NLP, Training |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
A selective loss computation technique that masks prompt tokens during SFT so the model only learns to generate the response portion.
Description
Loss Masking Tokenization addresses the problem of training signal dilution in supervised fine-tuning. When training on instruction-response pairs, computing the loss over the entire sequence (including the prompt) spends part of the training signal on predicting the instruction text, which the model is never asked to generate at inference time. By setting the labels for prompt tokens to IGNORE_INDEX (-100), the cross-entropy loss ignores these positions, so learning focuses entirely on generating the correct response.
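A minimal sketch of the mechanism in plain PyTorch (toy tensors, no tokenizer): positions whose label is -100 are simply skipped by the loss.

```python
import torch
import torch.nn.functional as F

# Toy logits for 6 sequence positions over a 10-token vocabulary.
logits = torch.randn(6, 10)
# The first three positions play the role of prompt tokens and are masked with -100.
labels = torch.tensor([-100, -100, -100, 4, 7, 2])

# -100 is PyTorch's default ignore_index, so masked positions contribute
# neither to the loss value nor to the gradients.
loss = F.cross_entropy(logits, labels)

# Equivalent by hand: average the per-token loss over unmasked positions only.
per_token = F.cross_entropy(logits, labels.clamp(min=0), reduction="none")
assert torch.allclose(loss, per_token[labels != -100].mean())
```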
Usage
Use this principle whenever performing supervised fine-tuning with instruction-response data. Masking the instruction portion of the labels is standard practice in instruction-tuning pipelines.
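For illustration, a single masked training example might look like this (the token IDs are made up; the first six positions are the prompt, the last four the response):

```python
input_ids = [11, 12, 13, 14, 15, 16, 21, 22, 23, 2]
labels    = [-100, -100, -100, -100, -100, -100, 21, 22, 23, 2]
```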
Theoretical Basis
Given a concatenated sequence [prompt, response]:
- Tokenize the full sequence to get input_ids.
- Create labels as a copy of input_ids.
- Set labels[:len(prompt_tokens)] = IGNORE_INDEX (-100).
- The cross-entropy loss function in PyTorch ignores positions with label -100.
Sketch of the algorithm (assuming a Hugging Face-style tokenizer; `tokenizer`, `prompt`, and `response` are defined elsewhere):

```python
import torch

source_ids = tokenizer.encode(prompt, add_special_tokens=False)
full_ids = tokenizer.encode(prompt + response, add_special_tokens=False)
full_ids.append(tokenizer.eos_token_id)    # append EOS so the model learns to stop

input_ids = torch.tensor(full_ids, dtype=torch.long)
labels = input_ids.clone()
labels[: len(source_ids)] = -100           # mask prompt tokens, as in the steps above
# Loss is only computed on response tokens
```
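Batched training also needs padding. A minimal collator sketch (the function name and `IGNORE_INDEX` constant are illustrative, not from LLMBook-zh; each example is assumed to be a dict holding the `input_ids` and `labels` tensors built as above) applies the same ignore value to padded label positions so they are likewise excluded from the loss:

```python
from typing import Dict, List

import torch

IGNORE_INDEX = -100

def collate_masked_examples(batch: List[Dict[str, torch.Tensor]],
                            pad_token_id: int) -> Dict[str, torch.Tensor]:
    """Right-pad a batch of examples; padded label positions also get IGNORE_INDEX."""
    max_len = max(ex["input_ids"].size(0) for ex in batch)
    input_ids, labels, attention_mask = [], [], []
    for ex in batch:
        pad = max_len - ex["input_ids"].size(0)
        input_ids.append(torch.cat(
            [ex["input_ids"], torch.full((pad,), pad_token_id, dtype=torch.long)]))
        labels.append(torch.cat(
            [ex["labels"], torch.full((pad,), IGNORE_INDEX, dtype=torch.long)]))
        attention_mask.append(torch.cat(
            [torch.ones(ex["input_ids"].size(0), dtype=torch.long),
             torch.zeros(pad, dtype=torch.long)]))
    return {"input_ids": torch.stack(input_ids),
            "labels": torch.stack(labels),
            "attention_mask": torch.stack(attention_mask)}
```

Padding positions receive IGNORE_INDEX for the same reason prompt positions do: cross-entropy then skips them, so neither padding nor the instruction text contributes to the gradient.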