Principle:Unslothai Unsloth Response Masking
| Knowledge Sources | |
|---|---|
| Domains | NLP, Training |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A training technique that masks instruction/prompt tokens in the loss computation so the model only learns to generate responses, not to reproduce the instructions themselves.
Description
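In standard supervised fine-tuning (SFT), the cross-entropy loss is computed over all tokens in the sequence, including instruction tokens. This spends gradient signal on reproducing the prompt, which the model does not need to learn; it only needs to generate appropriate responses. Response masking sets the labels for instruction tokens to -100, the default ignore_index of PyTorch's cross-entropy loss, so those positions contribute nothing to the loss or its gradient. The instruction tokens remain in the input, so the model still conditions on them when predicting the response.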
This technique:
- Reduces noise: The model focuses gradient signal entirely on learning response generation patterns.
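- Improves efficiency: Fewer tokens contribute to the loss, so a larger share of each training step's gradient comes from response tokens. The forward and backward passes still run over the full sequence, so per-step compute is essentially unchanged.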
- Discourages instruction memorization: The model learns response distributions rather than being trained to reproduce instruction formats.
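The masking boundaries are identified by delimiter strings that mark the start of the assistant turn (e.g., <|start_header_id|>assistant for Llama 3 or <|im_start|>assistant for ChatML). A delimiter usually tokenizes to several tokens, so the boundary is located by matching that token subsequence in the input.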
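Because a delimiter such as <|im_start|>assistant usually spans several token ids, locating the boundary amounts to a subsequence search over the tokenized example. The following is a minimal sketch under stated assumptions: a single-turn example, made-up token ids, and hypothetical helper names (`find_subsequence`, `mask_prompt`) that are not part of Unsloth.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy by default


def find_subsequence(ids, pattern):
    """Return the index where `pattern` first occurs in `ids`, or -1."""
    n, m = len(ids), len(pattern)
    if m == 0:
        return -1
    for start in range(n - m + 1):
        if ids[start:start + m] == pattern:
            return start
    return -1


def mask_prompt(input_ids, delimiter_ids):
    """Build labels that keep only the tokens after the response delimiter."""
    start = find_subsequence(input_ids, delimiter_ids)
    if start == -1:
        # No delimiter found: mask the whole example so it adds no loss.
        return [IGNORE_INDEX] * len(input_ids)
    response_start = start + len(delimiter_ids)
    return [IGNORE_INDEX] * response_start + list(input_ids[response_start:])


# Made-up ids: 10-12 = instruction, 9 = end of turn, [1, 2] = delimiter,
# 20-22 = response.
input_ids = [10, 11, 12, 9, 1, 2, 20, 21, 22, 9]
labels = mask_prompt(input_ids, delimiter_ids=[1, 2])
print(labels)
```

Masking the whole example when no delimiter is found is one possible design choice: a truncated example with no visible response then contributes no loss instead of silently training on the prompt.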
Usage
Apply this as an optional wrapper around the SFT trainer: create the trainer, apply the wrapper, then call .train(). It is particularly useful for instruction-following datasets where instructions are long relative to responses, since a large share of the loss would otherwise go toward reproducing the prompt.
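The wrapper pattern can be illustrated without any training framework. The sketch below wraps a collate function so that every batch it returns has its prompt labels replaced by -100. The function `with_response_masking`, the baseline `plain_collate`, and the batch layout (a dict of unpadded lists under `input_ids` and `labels`) are assumptions made for illustration and are not Unsloth's actual API. It also reports the fraction of masked positions, which grows as instructions get longer relative to responses.

```python
IGNORE_INDEX = -100


def with_response_masking(collate_fn, delimiter_ids):
    """Wrap `collate_fn` so prompt tokens in each batch get label -100."""
    if not delimiter_ids:
        raise ValueError("delimiter_ids must not be empty")
    m = len(delimiter_ids)

    def response_start(ids):
        # Index of the first token after the delimiter, or len(ids) if absent.
        for i in range(len(ids) - m + 1):
            if ids[i:i + m] == delimiter_ids:
                return i + m
        return len(ids)

    def mask_row(ids):
        start = response_start(ids)
        return [IGNORE_INDEX] * start + list(ids[start:])

    def wrapped(examples):
        batch = collate_fn(examples)
        batch["labels"] = [mask_row(ids) for ids in batch["input_ids"]]
        return batch

    return wrapped


def plain_collate(examples):
    """Baseline collator: labels copy input_ids, so loss covers all tokens."""
    return {
        "input_ids": [list(e) for e in examples],
        "labels": [list(e) for e in examples],
    }


def masked_fraction(batch):
    """Fraction of label positions excluded from the loss."""
    flat = [label for row in batch["labels"] for label in row]
    return sum(label == IGNORE_INDEX for label in flat) / len(flat)


collate = with_response_masking(plain_collate, delimiter_ids=[1, 2])
# Six instruction tokens, the two-token delimiter, then two response tokens.
batch = collate([[10, 11, 12, 13, 14, 15, 1, 2, 20, 21]])
print(batch["labels"], masked_fraction(batch))
```

Wrapping the collator keeps the trainer itself untouched, which matches the "create, wrap, then train" order described above; a real implementation would also need to handle padding and multi-turn conversations.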
Theoretical Basis
With response masking, the loss becomes:
<math>\mathcal{L}_{\text{masked}} = -\sum_{t \in \text{response tokens}} \log P_\theta(y_t \mid y_{<t}, x)</math>
Only tokens after the response delimiter contribute to the loss. Instruction tokens have label = -100 and are excluded:
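    # Abstract response masking (assumes a single-token delimiter)
    IGNORE_INDEX = -100  # excluded from the loss
    labels = [IGNORE_INDEX] * len(sequence)  # mask every position by default
    for i, token in enumerate(sequence):
        if token == response_delimiter:
            # Only tokens after the delimiter contribute to the loss
            labels[i + 1:] = sequence[i + 1:]
            break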
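To make the masked loss concrete, the sketch below evaluates it by hand in plain Python. The per-position probabilities are made-up numbers, `masked_nll` is a hypothetical helper, and the next-token shift that causal language model trainers apply between logits and labels is left out for brevity. It returns both the sum from the formula and the mean over unmasked positions, which is what PyTorch's cross-entropy computes with its default settings (mean reduction, ignore_index=-100).

```python
import math

IGNORE_INDEX = -100


def masked_nll(token_probs, labels):
    """Sum and mean of -log p over positions whose label is not IGNORE_INDEX.

    token_probs[t] is the model's probability for the correct token at position t.
    """
    terms = [
        -math.log(p)
        for p, label in zip(token_probs, labels)
        if label != IGNORE_INDEX
    ]
    total = sum(terms)
    mean = total / len(terms) if terms else 0.0
    return total, mean


# Two instruction positions (masked) followed by two response positions.
labels = [IGNORE_INDEX, IGNORE_INDEX, 20, 21]
token_probs = [0.01, 0.02, 0.5, 0.25]

total, mean = masked_nll(token_probs, labels)
print(total, mean)

# Changing the probabilities at masked positions leaves the loss unchanged.
print(masked_nll([0.9, 0.9, 0.5, 0.25], labels) == (total, mean))
```

The second call shows the point of the technique: however poorly the model predicts the instruction tokens, those positions never enter the loss.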