Implementation:Microsoft LoRA Legacy Run Squad
Overview
Fine-tuning script for extractive question answering on SQuAD 1.1 and SQuAD 2.0, supporting BERT, DistilBERT, XLM, XLNet, and other QA-capable transformer architectures.
Description
run_squad.py is a legacy HuggingFace Transformers example script included in the Microsoft LoRA NLU example directory. It provides a complete training and evaluation pipeline for extractive question answering on the Stanford Question Answering Dataset (SQuAD). The script uses AutoModelForQuestionAnswering with AutoConfig and AutoTokenizer for model-agnostic initialization, and relies on squad_convert_examples_to_features for data preprocessing with distributed-safe caching.
The training loop implements AdamW optimization with a linear warmup schedule, gradient accumulation, mixed-precision training through NVIDIA Apex, multi-GPU training via DataParallel, and distributed training via DistributedDataParallel. Evaluation follows one of two post-processing paths depending on the model's output format: models such as BERT and DistilBERT emit start/end logits and are handled by compute_predictions_logits, while XLNet and XLM emit top-k start/end indices together with cls_logits and are handled by compute_predictions_log_probs. SQuAD 2.0 support, including null answer detection, is enabled with --version_2_with_negative.
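The interaction between gradient accumulation and the linear warmup schedule can be sketched in plain Python. This is an illustrative sketch, not the script's code: the helper names `linear_warmup_decay` and `total_optimizer_steps` are hypothetical, and the batch counts and warmup length are made-up values chosen so the arithmetic is easy to follow.

```python
def linear_warmup_decay(step: int, warmup_steps: int, total_steps: int) -> float:
    """Multiplier applied to the base learning rate at a given optimizer step."""
    if step < warmup_steps:
        # Ramp linearly from 0 to 1 during warmup.
        return step / max(1, warmup_steps)
    # Afterwards decay linearly from 1 to 0 over the remaining steps.
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


def total_optimizer_steps(batches_per_epoch: int, accumulation_steps: int, epochs: int) -> int:
    """With gradient accumulation, one optimizer update happens every N batches."""
    return (batches_per_epoch // accumulation_steps) * epochs


# Hypothetical run: 1000 batches per epoch, accumulate over 4 batches, 2 epochs.
total = total_optimizer_steps(batches_per_epoch=1000, accumulation_steps=4, epochs=2)
base_lr = 3e-5
for step in (0, 25, 50, 275, 500):
    lr = base_lr * linear_warmup_decay(step, warmup_steps=50, total_steps=total)
    print(f"step {step:>3}: lr = {lr:.2e}")
```

Because the schedule is indexed by optimizer updates rather than by batches, raising the accumulation factor shortens the schedule even though the number of batches processed stays the same.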
This script is part of the HuggingFace Transformers library (legacy examples) bundled in the Microsoft LoRA repository.
⚠️ DEPRECATED: This file resides in the legacy/ directory and is not actively maintained. Prefer modern equivalents where available.
Usage
Use this script to fine-tune a pretrained transformer model for extractive question answering on SQuAD 1.1 or SQuAD 2.0 datasets, or to evaluate previously fine-tuned checkpoints. It supports resuming training from checkpoints, evaluation of all saved checkpoints, and TensorBoard logging.
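Resuming from a checkpoint requires working out how far training had progressed. One plausible way to do this, assuming the checkpoint directory follows the checkpoint-{step} naming listed under Outputs, is to parse the step from the path and convert it into completed epochs plus a remainder. The function name `resume_position` and the fallback to zero for non-checkpoint paths are assumptions made for this sketch.

```python
def resume_position(model_path: str, batches_per_epoch: int, accumulation_steps: int = 1):
    """Return (global_step, epochs_done, updates_into_epoch) for a checkpoint path."""
    # Text after the last '-' is expected to be the step count, e.g. 'checkpoint-1500'.
    suffix = model_path.rstrip("/").split("-")[-1]
    try:
        global_step = int(suffix)
    except ValueError:
        # Not a checkpoint directory (e.g. 'bert-base-uncased'): start from scratch.
        return 0, 0, 0
    updates_per_epoch = batches_per_epoch // accumulation_steps
    epochs_done = global_step // updates_per_epoch
    updates_into_epoch = global_step % updates_per_epoch
    return global_step, epochs_done, updates_into_epoch


print(resume_position("/output/squad_bert/checkpoint-1500", batches_per_epoch=1000))
print(resume_position("bert-base-uncased", batches_per_epoch=1000))
```

The remainder tells the training loop how many optimizer updates of the current epoch can be skipped before real training resumes.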
Code Reference
Source Location
| Property | Value |
|---|---|
| File path | examples/NLU/examples/legacy/question-answering/run_squad.py |
| Lines | 830 |
| Module | run_squad |
Key Functions
| Name | Signature | Description |
|---|---|---|
| train | train(args, train_dataset, model, tokenizer) | Full training loop with DDP, gradient accumulation, TensorBoard, checkpointing; returns (global_step, avg_loss) |
| evaluate | evaluate(args, model, tokenizer, prefix="") | Runs evaluation, computes predictions, returns F1/exact match dict |
| load_and_cache_examples | load_and_cache_examples(args, tokenizer, evaluate=False, output_examples=False) | Loads SQuAD data with caching; uses SquadV1Processor or SquadV2Processor |
| set_seed | set_seed(args) | Seeds random, numpy, and torch RNGs |
| main | main() | Entry point: parses args, initializes model, runs train/eval pipeline |
CLI Usage
    python run_squad.py \
      --model_type bert \
      --model_name_or_path bert-base-uncased \
      --do_train \
      --do_eval \
      --data_dir /path/to/squad \
      --train_file train-v1.1.json \
      --predict_file dev-v1.1.json \
      --output_dir /path/to/output \
      --per_gpu_train_batch_size 8 \
      --learning_rate 3e-5 \
      --num_train_epochs 2.0 \
      --max_seq_length 384 \
      --doc_stride 128
I/O Contract
Inputs
| Input | Type | Description |
|---|---|---|
| --model_type | str (required) | Model architecture type (e.g., bert, xlnet, distilbert, xlm) |
| --model_name_or_path | str (required) | Pretrained model name or path |
| --output_dir | str (required) | Directory for checkpoints and predictions |
| --data_dir | str | Directory containing SQuAD JSON files |
| --train_file | str | Training file name (e.g., train-v1.1.json) |
| --predict_file | str | Evaluation file name (e.g., dev-v1.1.json) |
| --version_2_with_negative | flag | Enable SQuAD 2.0 mode with unanswerable questions |
| --max_seq_length | int (default 384) | Maximum input sequence length after tokenization |
| --doc_stride | int (default 128) | Stride for splitting long documents into chunks |
| --max_query_length | int (default 64) | Maximum number of tokens for the question |
| --per_gpu_train_batch_size | int (default 8) | Batch size per GPU for training |
| --learning_rate | float (default 5e-5) | Initial learning rate for AdamW |
| --num_train_epochs | float (default 3.0) | Total training epochs |
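To make the relationship between --max_seq_length, --max_query_length, and --doc_stride concrete, the following sketch splits a long document into overlapping windows. It is a simplified illustration of the sliding-window idea only; the real preprocessing in squad_convert_examples_to_features goes through the tokenizer and handles details not modeled here. The function name `split_into_windows` and the `special_tokens=3` default (a BERT-style [CLS] plus two [SEP] tokens) are assumptions.

```python
def split_into_windows(num_doc_tokens, max_seq_length=384, doc_stride=128,
                       query_len=64, special_tokens=3):
    """Return (start, length) pairs covering a document of num_doc_tokens tokens."""
    # Room left for document tokens after the question and special tokens.
    max_doc_tokens = max_seq_length - query_len - special_tokens
    if max_doc_tokens <= 0:
        raise ValueError("max_seq_length is too small for the query and special tokens")
    windows = []
    start = 0
    while start < num_doc_tokens:
        length = min(num_doc_tokens - start, max_doc_tokens)
        windows.append((start, length))
        if start + length == num_doc_tokens:
            break  # the last window reaches the end of the document
        # Advance by the stride so consecutive windows overlap.
        start += min(length, doc_stride)
    return windows


# A 500-token document with a 10-token question under the default limits.
print(split_into_windows(500, query_len=10))
```

A smaller stride produces more windows with more overlap, which raises preprocessing and inference cost but makes it less likely that an answer is cut at a window boundary.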
Outputs
| Output | Type | Description |
|---|---|---|
| predictions_{prefix}.json | JSON | Best predicted answer span for each question |
| nbest_predictions_{prefix}.json | JSON | Top-N predicted answer spans with probabilities |
| null_odds_{prefix}.json | JSON | Null answer odds (SQuAD 2.0 only) |
| checkpoint-{step}/ | directory | Saved model, tokenizer, optimizer, scheduler states |
| training_args.bin | binary | Serialized training arguments |
| Return value | Dict[str, float] | Evaluation results with F1 and exact match scores |
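The prediction files above are derived from per-token start and end scores. The sketch below shows, in a heavily simplified form, how a ranked n-best list with probabilities and a null-answer score difference could be computed for a BERT-style model. It is not the code of compute_predictions_logits: the function names are hypothetical, position 0 is assumed to be the [CLS] token that stands in for "no answer", and the 0.0 decision threshold is an illustrative assumption.

```python
import math
from itertools import product


def top_indices(scores, n):
    """Indices of the n highest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]


def nbest_spans(start_logits, end_logits, n_best_size=3, max_answer_length=30):
    """Rank candidate answer spans; index 0 is reserved for the null answer."""
    candidates = []
    starts = top_indices(start_logits, n_best_size)
    ends = top_indices(end_logits, n_best_size)
    for s, e in product(starts, ends):
        if s == 0 or e == 0:
            continue  # position 0 is scored separately as the null answer
        if e < s or e - s + 1 > max_answer_length:
            continue  # end before start, or span too long
        candidates.append({"start": s, "end": e,
                           "score": start_logits[s] + end_logits[e]})
    candidates.sort(key=lambda c: c["score"], reverse=True)
    candidates = candidates[:n_best_size]
    if candidates:
        # Softmax over the kept candidates gives the reported probabilities.
        best = candidates[0]["score"]
        weights = [math.exp(c["score"] - best) for c in candidates]
        total = sum(weights)
        for c, w in zip(candidates, weights):
            c["probability"] = w / total
    return candidates


def null_score_diff(start_logits, end_logits, best_span_score):
    """Null score minus best span score; larger means 'no answer' is more likely."""
    return (start_logits[0] + end_logits[0]) - best_span_score


start_logits = [0.1, 0.2, 4.0, 0.3, 0.1]
end_logits = [0.2, 0.1, 0.5, 3.5, 0.4]
spans = nbest_spans(start_logits, end_logits)
diff = null_score_diff(start_logits, end_logits, spans[0]["score"])
predict_null = diff > 0.0  # assumed threshold of 0.0
print(spans[0], diff, predict_null)
```

In this toy input the best span covers tokens 2 through 3 and the null score is far lower than the span score, so an answer is predicted rather than an empty string.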
Usage Examples
Fine-tuning BERT on SQuAD 1.1
    python run_squad.py \
      --model_type bert \
      --model_name_or_path bert-base-uncased \
      --do_train \
      --do_eval \
      --data_dir /data/squad/ \
      --train_file train-v1.1.json \
      --predict_file dev-v1.1.json \
      --output_dir /output/squad_bert/ \
      --per_gpu_train_batch_size 12 \
      --learning_rate 3e-5 \
      --num_train_epochs 2.0 \
      --max_seq_length 384 \
      --doc_stride 128 \
      --overwrite_output_dir
Fine-tuning XLNet on SQuAD 2.0
    python run_squad.py \
      --model_type xlnet \
      --model_name_or_path xlnet-large-cased \
      --do_train \
      --do_eval \
      --version_2_with_negative \
      --data_dir /data/squad/ \
      --train_file train-v2.0.json \
      --predict_file dev-v2.0.json \
      --output_dir /output/squad2_xlnet/ \
      --per_gpu_train_batch_size 4 \
      --learning_rate 3e-5 \
      --num_train_epochs 4.0 \
      --max_seq_length 384 \
      --doc_stride 128 \
      --overwrite_output_dir
Evaluating All Checkpoints
    python run_squad.py \
      --model_type bert \
      --model_name_or_path /output/squad_bert/ \
      --do_eval \
      --eval_all_checkpoints \
      --data_dir /data/squad/ \
      --predict_file dev-v1.1.json \
      --output_dir /output/squad_bert/ \
      --max_seq_length 384
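Evaluating all checkpoints implies first discovering them under the output directory. A minimal sketch of one way to do that is shown below. It assumes each saved model directory contains a weights file named pytorch_model.bin and that checkpoint directories are named checkpoint-{step}; the helper `find_checkpoints` is hypothetical, and ordering by numeric step is a choice made for this sketch rather than a documented behavior of the script.

```python
import glob
import os
import re
import tempfile

WEIGHTS_FILE = "pytorch_model.bin"  # assumed weights file name


def find_checkpoints(output_dir):
    """Return directories under output_dir that contain a weights file."""
    pattern = os.path.join(output_dir, "**", WEIGHTS_FILE)
    dirs = {os.path.dirname(p) for p in glob.glob(pattern, recursive=True)}

    def step_of(path):
        match = re.search(r"checkpoint-(\d+)$", path)
        # Directories without a step number (the final model) sort last.
        return int(match.group(1)) if match else float("inf")

    return sorted(dirs, key=step_of)


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as out:
    for sub in ("checkpoint-1000", "checkpoint-500", ""):
        target = os.path.join(out, sub)
        os.makedirs(target, exist_ok=True)
        open(os.path.join(target, WEIGHTS_FILE), "w").close()
    names = [os.path.relpath(d, out) for d in find_checkpoints(out)]

print(names)
```

Sorting by the parsed step avoids the lexicographic ordering in which checkpoint-1000 would come before checkpoint-500; each discovered directory could then be loaded and passed to evaluate with the step as the prefix.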