Implementation: Microsoft BIPIA HF Trainer for Defense
| Field | Value |
|---|---|
| Sources | Repo, Doc: HuggingFace Trainer |
| Domains | NLP, Distributed_Training, Defense |
| Last Updated | 2026-02-14 |
Overview
Concrete tool for distributed defense fine-tuning, provided by the BIPIA defense module: it wraps the HuggingFace transformers Trainer API and runs under DeepSpeed ZeRO Stage 3.
Description
The train() function in finetune.py creates a HuggingFace Trainer with the prepared model, tokenized dataset, and DataCollatorWithPaddingAndLabel (which extends DataCollatorWithPadding to handle label padding with IGNORE_TOKEN_ID). It validates model_structure == "special_token", initializes W&B logging, calls trainer.train(), then saves the model via safe_save_model_for_hf_trainer() which collects state_dict to CPU and saves. DeepSpeed config (ds_config.json) sets ZeRO Stage 3 with bf16, pin_memory, and gradient_accumulation_steps. This is a Wrapper Doc around HuggingFace's Trainer.
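The label-padding behavior can be sketched as follows. This is an illustrative reimplementation, not the BIPIA source; it assumes labels arrive as Python lists alongside the tokenized inputs (see utils.py L95-119 for the actual code):

```python
import torch

# IGNORE_TOKEN_ID (-100) is skipped by PyTorch's cross-entropy loss.
IGNORE_TOKEN_ID = -100

class DataCollatorWithPaddingAndLabel:
    """Sketch: pad input_ids/attention_mask with the tokenizer, then
    pad labels to the same length with IGNORE_TOKEN_ID."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def __call__(self, features):
        # Separate labels so tokenizer.pad() only sees model inputs.
        labels = [f.pop("labels") for f in features]
        batch = self.tokenizer.pad(features, return_tensors="pt")
        max_len = batch["input_ids"].shape[1]
        # Right-pad each label sequence with the ignore index.
        batch["labels"] = torch.tensor(
            [l + [IGNORE_TOKEN_ID] * (max_len - len(l)) for l in labels],
            dtype=torch.long,
        )
        return batch
```

Because padded label positions carry IGNORE_TOKEN_ID, only real target tokens contribute to the loss.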
Usage
Run via torchrun/deepspeed:

```shell
torchrun --nproc_per_node=8 defense/white_box/finetune.py \
    --model_structure special_token \
    --llm_config_file config/vicuna_13b.yaml \
    --deepspeed defense/white_box/ds_config.json \
    ...
```
Code Reference
- Source
- BIPIA repo
- Files
  - defense/white_box/finetune.py (L477-549, train function; L162-168, safe_save_model)
  - defense/white_box/utils.py (L95-119, DataCollatorWithPaddingAndLabel)
  - defense/white_box/ds_config.json (DeepSpeed config)
- Signatures
```python
def train() -> None

Trainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    train_dataset=dataset,
    data_collator=DataCollatorWithPaddingAndLabel(tokenizer),
)

def safe_save_model_for_hf_trainer(trainer: transformers.Trainer, output_dir: str) -> None
```
- Import
from transformers import Trainer
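The save path guards against ZeRO-sharded parameters by materializing the full state dict on CPU before writing. A minimal sketch of that pattern, with assumed internals (the real code lives at finetune.py L162-168; `trainer` is a transformers.Trainer, and `Trainer._save` is a private transformers method):

```python
def safe_save_model_for_hf_trainer(trainer, output_dir: str) -> None:
    """Sketch: collect the model state dict on CPU, then save once
    from the main process so the checkpoint is rank-independent."""
    state_dict = trainer.model.state_dict()
    if trainer.args.should_save:
        # Move every tensor off the accelerator before serialization.
        cpu_state_dict = {k: v.cpu() for k, v in state_dict.items()}
        del state_dict
        trainer._save(output_dir, state_dict=cpu_state_dict)
```

The `should_save` guard keeps non-main ranks from racing to write the same checkpoint.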
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | PreTrainedModel | Yes | Model with resized embeddings for special tokens |
| training_args | TrainingArguments | Yes | lr, epochs, batch_size, output_dir, deepspeed config path |
| train_dataset | Dataset | Yes | Tokenized dataset with input_ids, attention_mask, labels |
Outputs
| Output | Description |
|---|---|
| Saved model checkpoint | state_dict collected on CPU, saved at output_dir |
| trainer_state.json | Trainer state metadata and training history |
| W&B logs | Weights & Biases experiment tracking logs |
Usage Examples
CLI invocation with key arguments:
```shell
torchrun --nproc_per_node=8 defense/white_box/finetune.py \
    --model_structure special_token \
    --llm_config_file config/vicuna_13b.yaml \
    --deepspeed defense/white_box/ds_config.json \
    --output_dir output/vicuna_13b_defense \
    --num_train_epochs 3 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --learning_rate 2e-5 \
    --bf16 True
```
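With the example arguments above, the effective global batch size per optimizer step is per_device_train_batch_size × gradient_accumulation_steps × number of processes. A quick check of that arithmetic (helper name is illustrative, not from the repo):

```python
def effective_batch_size(per_device: int, grad_accum: int, world_size: int) -> int:
    """Global batch size seen by the optimizer per update step."""
    return per_device * grad_accum * world_size

# 2 per device x 8 accumulation steps x 8 GPUs = 128 examples per update
print(effective_batch_size(2, 8, 8))  # → 128
```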
DeepSpeed ZeRO Stage 3 config structure (ds_config.json):
```json
{
  "bf16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": {
      "device": "none"
    },
    "offload_param": {
      "device": "none"
    },
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "wall_clock_breakdown": false,
  "data_sampling": {
    "data_efficiency": {
      "enabled": false
    }
  }
}
```
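The "auto" placeholders (train_batch_size, gradient_clipping, etc.) are resolved by the HuggingFace Trainer from TrainingArguments at launch, so the file itself only pins the hard requirements. A small sanity check over an inline copy of those fields (the literal below mirrors the config; loading the real file from disk is up to the caller):

```python
import json

# Inline copy of the key ds_config.json fields, for illustration only.
DS_CONFIG = """
{
  "bf16": {"enabled": true},
  "zero_optimization": {"stage": 3},
  "train_batch_size": "auto"
}
"""

cfg = json.loads(DS_CONFIG)
assert cfg["zero_optimization"]["stage"] == 3, "expected ZeRO Stage 3"
assert cfg["bf16"]["enabled"], "expected bf16 training"
assert cfg["train_batch_size"] == "auto", "Trainer fills this in at launch"
print("ds_config sanity check passed")
```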