Principle: Microsoft BIPIA Training Data Tokenization
| Field | Value |
|---|---|
| Sources | BIPIA paper |
| Domains | NLP, Tokenization, Defense |
| Last Updated | 2026-02-14 |
Overview
A tokenization methodology that converts supervised defense training data into model-ready token sequences with selective label masking, so the model is trained to predict only the response tokens while the prompt tokens are excluded from the loss.
Description
Training data tokenization converts text conversations into token ID sequences with a critical feature: label masking. The input consists of a Vicuna-format conversation:
[BOS] system_prompt USER: <data>context</data> question ASSISTANT: response [EOS]
The labels tensor mirrors input_ids but replaces all tokens before the response with IGNORE_TOKEN_ID (-100), so the loss function only backpropagates through the response generation tokens. The <data>/</data> special tokens are optionally inserted around the context to mark content boundaries. Sequences exceeding model_max_length are filtered out.
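A minimal sketch of this step, assuming a Hugging Face tokenizer loaded for the Vicuna base model; the helper name preprocess_example, the exact prompt formatting, and the over-length filtering follow the description above rather than the BIPIA source code.

```python
import torch

IGNORE_TOKEN_ID = -100  # PyTorch's CrossEntropyLoss skips this label by default


def preprocess_example(tokenizer, system_prompt, context, question, response,
                       model_max_length=2048, use_data_markers=True):
    """Tokenize one Vicuna-style conversation and mask all non-response labels.

    Hypothetical helper: argument names and the <data>...</data> markers mirror
    the description above, not necessarily the exact BIPIA implementation.
    """
    if use_data_markers:
        context = f"<data>{context}</data>"

    prompt = f"{system_prompt} USER: {context} {question} ASSISTANT: "
    full = prompt + response + tokenizer.eos_token

    # Tokenize the prompt alone to count how many leading tokens to mask.
    # (Token boundaries at the prompt/response join can shift slightly; this
    # prefix-length approximation is the common practice in instruction tuning.)
    prompt_ids = tokenizer(prompt, add_special_tokens=True).input_ids
    input_ids = tokenizer(full, add_special_tokens=True).input_ids

    # Filter out sequences that exceed the model's context window.
    if len(input_ids) > model_max_length:
        return None

    # Labels mirror input_ids, with every prompt position replaced by -100.
    labels = list(input_ids)
    labels[:len(prompt_ids)] = [IGNORE_TOKEN_ID] * len(prompt_ids)

    return {
        "input_ids": torch.tensor(input_ids),
        "labels": torch.tensor(labels),
    }
```

In practice the tokenizer would typically be loaded with transformers (e.g. a LLaMA/Vicuna tokenizer) and the <data>/</data> markers registered as additional special tokens before tokenization.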
Usage
Use when preparing tokenized training batches for white-box defense finetuning. The label masking ensures the loss is computed only on the response, so the model learns to generate correct responses rather than being trained to reproduce the prompt text.
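A hypothetical batch-preparation loop, reusing the preprocess_example sketch above; raw_examples, its field names, and the tokenizer are assumptions for illustration.

```python
# Build the tokenized training set, dropping over-length conversations.
tokenized = []
for ex in raw_examples:
    item = preprocess_example(tokenizer, ex["system"], ex["context"],
                              ex["question"], ex["response"])
    if item is not None:  # None means the sequence exceeded model_max_length
        tokenized.append(item)
```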
Theoretical Basis
Causal language model training uses cross-entropy loss on next-token prediction. Label masking with IGNORE_TOKEN_ID (-100) excludes prompt tokens from the loss computation:
Loss = -sum(log P(t_i | t_{<i})) only for i in response_positions
This is standard practice in instruction tuning (Alpaca, Vicuna) to prevent the model from being trained to reproduce instructions.
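To illustrate how the -100 sentinel interacts with the loss, here is a minimal PyTorch snippet; it demonstrates only the ignore_index mechanism, not the one-position label shift that a causal language model applies internally, and the logits and labels are toy values.

```python
import torch
import torch.nn.functional as F

IGNORE_TOKEN_ID = -100

# Toy logits for a 6-token sequence over a 10-token vocabulary.
torch.manual_seed(0)
logits = torch.randn(6, 10)

# First four positions are prompt tokens (masked); last two are response tokens.
labels = torch.tensor([IGNORE_TOKEN_ID] * 4 + [3, 7])

# cross_entropy skips positions whose label equals ignore_index, so the
# loss is averaged over the two response positions only.
loss = F.cross_entropy(logits, labels, ignore_index=IGNORE_TOKEN_ID)
print(loss)
```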