Implementation:Microsoft BIPIA Load Bipia Supervised Data Module
Overview
Concrete tool for constructing supervised finetuning datasets from poisoned prompts and correct responses provided by the BIPIA defense module.
Description
load_bipia_supervised_data_module() iterates over the configured task names ("all" or specific tasks joined by "+"), builds PIA datasets using AutoPIABuilder, determines response targets based on response_strategy, constructs conversation tuples (user_prompt, response), and applies tokenization with label masking. It supports concatenating datasets from multiple task types and creates labels in which IGNORE_TOKEN_ID masks all tokens before the response start.
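The tokenize-and-mask step can be sketched as follows. This is a minimal illustration, not the repo's exact code: it assumes a tokenizer with the standard Hugging Face `__call__` interface, and uses -100 for IGNORE_TOKEN_ID (the default ignore index for cross-entropy loss in PyTorch/Transformers).

```python
# Illustrative sketch of prompt-masked supervised tokenization;
# the actual finetune.py logic differs in detail.
IGNORE_TOKEN_ID = -100  # default ignore index for cross-entropy loss


def tokenize_conversation(tokenizer, user_prompt, response, max_len=2048):
    """Tokenize a (user_prompt, response) tuple into one training example."""
    prompt_ids = tokenizer(user_prompt, add_special_tokens=False)["input_ids"]
    response_ids = tokenizer(response, add_special_tokens=False)["input_ids"]

    input_ids = (prompt_ids + response_ids)[:max_len]
    # Mask every token before the response start, so the loss is
    # computed only on the response tokens.
    labels = ([IGNORE_TOKEN_ID] * len(prompt_ids) + response_ids)[:max_len]
    attention_mask = [1] * len(input_ids)

    return {"input_ids": input_ids,
            "attention_mask": attention_mask,
            "labels": labels}
```

The key invariant is that `labels` and `input_ids` have identical length, with the prompt positions set to IGNORE_TOKEN_ID.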
Usage
Called internally during white-box defense finetuning with tokenizer and data_args.
Code Reference
| Field | Value |
|---|---|
| Source | BIPIA repo |
| File | defense/white_box/finetune.py |
| Lines | L261-474 |
| Signature | def load_bipia_supervised_data_module(tokenizer: transformers.PreTrainedTokenizer, data_args) -> Dict |
| Import | Internal function in finetune.py |
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| tokenizer | transformers.PreTrainedTokenizer | Yes | Tokenizer for the target model |
| data_args | DataArguments | Yes | Configuration dataclass containing all data parameters |
DataArguments fields:
| Field | Description |
|---|---|
| dataset_name | Task type(s): "all" or specific tasks joined by "+" (e.g., "qa+email+code") |
| response_strategy | One of "original", "self_clean", "gpt4_clean" |
| context_data_file | Path to context data |
| attack_data_file | Path to attack data |
| response_data_file | Path to clean response data (for self_clean/gpt4_clean strategies) |
| bipia_seed | Random seed for dataset construction |
| add_ign_guidance | Whether to prepend ignore-attack guidance text to prompts |
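How dataset_name and response_strategy might be interpreted can be sketched as below. This is an assumption-laden illustration, not the repo's code: the full task list and the sample field names (`ideal`, `id`) are hypothetical, though "qa", "email", and "code" appear in the example above.

```python
import json

# Assumed full BIPIA task list; only "qa", "email", "code" are
# confirmed by the dataset_name example in this document.
ALL_TASKS = ["qa", "email", "code", "table", "abstract"]


def resolve_tasks(dataset_name):
    """Expand data_args.dataset_name into concrete task names."""
    return list(ALL_TASKS) if dataset_name == "all" else dataset_name.split("+")


def select_response(strategy, sample, response_data_file=None):
    """Pick the supervision target for one sample (illustrative sketch)."""
    if strategy == "original":
        return sample["ideal"]  # response shipped with the task data
    if strategy in ("self_clean", "gpt4_clean"):
        # Clean responses generated beforehand (by the model itself
        # or by GPT-4), keyed by sample id in response_data_file.
        with open(response_data_file) as f:
            return json.load(f)[sample["id"]]
    raise ValueError(f"unknown response_strategy: {strategy!r}")
```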
Outputs
A Dict wrapping a datasets.Dataset (accessed as "train_dataset" in the usage example below) with the following columns:
| Column | Description |
|---|---|
| input_ids | Tokenized input sequence (prompt + response concatenated) |
| attention_mask | Standard attention mask (1 for real tokens, 0 for padding) |
| labels | Token IDs for loss computation, with IGNORE_TOKEN_ID (-100) masking all tokens before the response start |
Usage Examples
Function call during finetuning:
```python
from defense.white_box.finetune import load_bipia_supervised_data_module

data_module = load_bipia_supervised_data_module(
    tokenizer=tokenizer,
    data_args=data_args,
)

# data_module is a dict containing the Dataset,
# used directly with the HuggingFace Trainer:
trainer = Trainer(
    model=model,
    train_dataset=data_module["train_dataset"],
    ...
)
```
DataArguments configuration example:
```python
from dataclasses import dataclass

@dataclass
class DataArguments:
    dataset_name: str = "all"              # all 5 task types
    response_strategy: str = "self_clean"  # use model's own clean responses
    context_data_file: str = "data/context/"
    attack_data_file: str = "data/attacks/"
    response_data_file: str = "output/clean_responses/"
    bipia_seed: int = 42
    add_ign_guidance: bool = False
```