Principle: Microsoft BIPIA Training Data Tokenization
| Field | Value |
|---|---|
| Sources | BIPIA paper |
| Domains | NLP, Tokenization, Defense |
| Last Updated | 2026-02-14 |
Overview
A tokenization methodology that converts supervised defense training data into model-ready token sequences with selective label masking, so the model is trained to predict only the response tokens while the prompt tokens are excluded from the loss.
Description
Training data tokenization converts text conversations into token ID sequences with a critical feature: label masking. The input consists of a Vicuna-format conversation:
[BOS] system_prompt USER: <data>context</data> question ASSISTANT: response [EOS]
The labels tensor mirrors input_ids but replaces all tokens before the response with IGNORE_TOKEN_ID (-100), so the loss function only backpropagates through the response generation tokens. The <data>/</data> special tokens are optionally inserted around the context to mark content boundaries. Sequences exceeding model_max_length are filtered out.
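A minimal sketch of this step, assuming a Hugging Face tokenizer loaded for the Vicuna base model; the helper name preprocess_example, the exact prompt formatting, and the over-length filtering follow the description above rather than the BIPIA source code.

```python
import torch

IGNORE_TOKEN_ID = -100  # PyTorch's CrossEntropyLoss skips this label by default


def preprocess_example(tokenizer, system_prompt, context, question, response,
                       model_max_length=2048, use_data_markers=True):
    """Tokenize one Vicuna-style conversation and mask all non-response labels.

    Hypothetical helper: argument names and the <data>...</data> markers mirror
    the description above, not necessarily the exact BIPIA implementation.
    """
    if use_data_markers:
        context = f"<data>{context}</data>"

    prompt = f"{system_prompt} USER: {context} {question} ASSISTANT: "
    full = prompt + response + tokenizer.eos_token

    # Tokenize the prompt alone to count how many leading tokens to mask.
    # (Token boundaries at the prompt/response join can shift slightly; this
    # prefix-length approximation is the common practice in instruction tuning.)
    prompt_ids = tokenizer(prompt, add_special_tokens=True).input_ids
    input_ids = tokenizer(full, add_special_tokens=True).input_ids

    # Filter out sequences that exceed the model's context window.
    if len(input_ids) > model_max_length:
        return None

    # Labels mirror input_ids, with every prompt position replaced by -100.
    labels = list(input_ids)
    labels[:len(prompt_ids)] = [IGNORE_TOKEN_ID] * len(prompt_ids)

    return {
        "input_ids": torch.tensor(input_ids),
        "labels": torch.tensor(labels),
    }
```

In practice the tokenizer would typically be loaded with transformers (e.g. a LLaMA/Vicuna tokenizer) and the <data>/</data> markers registered as additional special tokens before tokenization.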
Usage
Use when preparing tokenized training batches for white-box defense finetuning. The label masking ensures the loss is computed only on the response, so the model learns to generate correct responses rather than being trained to reproduce the prompt text.
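A hypothetical batch-preparation loop, reusing the preprocess_example sketch above; raw_examples, its field names, and the tokenizer are assumptions for illustration.

```python
# Build the tokenized training set, dropping over-length conversations.
tokenized = []
for ex in raw_examples:
    item = preprocess_example(tokenizer, ex["system"], ex["context"],
                              ex["question"], ex["response"])
    if item is not None:  # None means the sequence exceeded model_max_length
        tokenized.append(item)
```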
Theoretical Basis
Causal language model training uses cross-entropy loss on next-token prediction. Label masking with IGNORE_TOKEN_ID (-100) excludes prompt tokens from the loss computation:
Loss = -sum(log P(t_i | t_{<i})) only for i in response_positions
This is standard practice in instruction tuning (Alpaca, Vicuna) to prevent the model from being trained to reproduce instructions.
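To illustrate how the -100 sentinel interacts with the loss, here is a minimal PyTorch snippet; it demonstrates only the ignore_index mechanism, not the one-position label shift that a causal language model applies internally, and the logits and labels are toy values.

```python
import torch
import torch.nn.functional as F

IGNORE_TOKEN_ID = -100

# Toy logits for a 6-token sequence over a 10-token vocabulary.
torch.manual_seed(0)
logits = torch.randn(6, 10)

# First four positions are prompt tokens (masked); last two are response tokens.
labels = torch.tensor([IGNORE_TOKEN_ID] * 4 + [3, 7])

# cross_entropy skips positions whose label equals ignore_index, so the
# loss is averaged over the two response positions only.
loss = F.cross_entropy(logits, labels, ignore_index=IGNORE_TOKEN_ID)
print(loss)
```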