Implementation:Microsoft LoRA Legacy Run Squad

From Leeroopedia


Overview

A fine-tuning script for extractive question answering on SQuAD 1.1 and SQuAD 2.0. It supports BERT, DistilBERT, XLM, XLNet, and other QA-capable transformer architectures.

Description

run_squad.py is a legacy HuggingFace Transformers example script included in the Microsoft LoRA NLU example directory. It provides a complete training and evaluation pipeline for extractive question answering on the Stanford Question Answering Dataset (SQuAD). The script uses AutoModelForQuestionAnswering with AutoConfig and AutoTokenizer for model-agnostic initialization. For preprocessing it relies on squad_convert_examples_to_features and caches the converted features on disk in a way that is safe for distributed runs.

The training loop implements AdamW optimization with linear warmup scheduling, gradient accumulation, mixed-precision training (NVIDIA Apex), multi-GPU via DataParallel, and distributed training via DistributedDataParallel. Evaluation handles both simple models (BERT/DistilBERT producing start/end logits) and complex models (XLNet/XLM producing start/end top-k indices with cls_logits) through separate post-processing paths using compute_predictions_logits and compute_predictions_log_probs. SQuAD 2.0 support includes null answer detection via --version_2_with_negative.
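
The interaction between gradient accumulation and the linear warmup schedule can be sketched in plain Python. This is a minimal illustration, not the script's code: the helper names total_optimization_steps and linear_warmup_factor are hypothetical, and the sketch assumes the common convention that the learning rate ramps linearly from 0 to its peak over the warmup steps and then decays linearly to 0.

```python
def total_optimization_steps(num_batches, gradient_accumulation_steps, num_train_epochs):
    """Optimizer updates in a run: one update per `gradient_accumulation_steps` batches."""
    return int(num_batches // gradient_accumulation_steps * num_train_epochs)


def linear_warmup_factor(step, warmup_steps, total_steps):
    """Learning-rate multiplier: linear ramp 0 -> 1 during warmup, then linear decay 1 -> 0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Illustrative numbers (assumptions): 1000 batches per epoch, accumulate over 4 batches.
t_total = total_optimization_steps(num_batches=1000, gradient_accumulation_steps=4, num_train_epochs=2.0)

# Learning rate at every optimizer step for a peak rate of 3e-5 and 50 warmup steps.
schedule = [3e-5 * linear_warmup_factor(s, warmup_steps=50, total_steps=t_total)
            for s in range(t_total + 1)]
```

Because the schedule is indexed by optimizer updates rather than by batches, raising the accumulation factor shortens the schedule while leaving its shape unchanged.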

This script is part of the HuggingFace Transformers library (legacy examples) bundled in the Microsoft LoRA repository.

⚠️ DEPRECATED: This file resides in the legacy/ directory and is not actively maintained. Prefer modern equivalents where available.

Usage

Use this script to fine-tune a pretrained transformer model for extractive question answering on SQuAD 1.1 or SQuAD 2.0 datasets, or to evaluate previously fine-tuned checkpoints. It supports resuming training from checkpoints, evaluation of all saved checkpoints, and TensorBoard logging.
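
Resuming from a checkpoint requires knowing how far the previous run got. The sketch below shows one way to recover that position, assuming checkpoint directories follow the checkpoint-{step} naming listed under Outputs. The function name resume_position is hypothetical, and the script's own logic may differ in detail.

```python
def resume_position(checkpoint_path, num_batches, gradient_accumulation_steps):
    """Return (global_step, epochs_trained, steps_done_in_current_epoch).

    Falls back to (0, 0, 0) when the path does not end in a numeric step,
    for example when it is a hub model name such as "bert-base-uncased".
    """
    suffix = checkpoint_path.rstrip("/").split("-")[-1]
    try:
        global_step = int(suffix)
    except ValueError:
        return 0, 0, 0  # not a checkpoint directory: start from scratch
    updates_per_epoch = num_batches // gradient_accumulation_steps
    epochs_trained = global_step // updates_per_epoch
    steps_in_epoch = global_step % updates_per_epoch
    return global_step, epochs_trained, steps_in_epoch


# Illustrative call: 1000 batches per epoch, no accumulation, resuming at step 1500.
position = resume_position("/output/squad_bert/checkpoint-1500/", 1000, 1)
```

With the resume position known, a training loop can skip the completed epochs and the already-seen steps of the current epoch before continuing to update the model.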

Code Reference

Source Location

| Property | Value |
|---|---|
| File path | examples/NLU/examples/legacy/question-answering/run_squad.py |
| Lines | 830 |
| Module | run_squad |

Key Functions

| Name | Signature | Description |
|---|---|---|
| train | train(args, train_dataset, model, tokenizer) | Full training loop with DDP, gradient accumulation, TensorBoard, checkpointing; returns (global_step, avg_loss) |
| evaluate | evaluate(args, model, tokenizer, prefix="") | Runs evaluation, computes predictions, returns F1/exact match dict |
| load_and_cache_examples | load_and_cache_examples(args, tokenizer, evaluate=False, output_examples=False) | Loads SQuAD data with caching; uses SquadV1Processor or SquadV2Processor |
| set_seed | set_seed(args) | Seeds random, numpy, and torch RNGs |
| main | main() | Entry point: parses args, initializes model, runs train/eval pipeline |

CLI Usage

python run_squad.py \
  --model_type bert \
  --model_name_or_path bert-base-uncased \
  --do_train \
  --do_eval \
  --data_dir /path/to/squad \
  --train_file train-v1.1.json \
  --predict_file dev-v1.1.json \
  --output_dir /path/to/output \
  --per_gpu_train_batch_size 8 \
  --learning_rate 3e-5 \
  --num_train_epochs 2.0 \
  --max_seq_length 384 \
  --doc_stride 128

I/O Contract

Inputs

| Input | Type | Description |
|---|---|---|
| --model_type | str (required) | Model architecture type (e.g., bert, xlnet, distilbert, xlm) |
| --model_name_or_path | str (required) | Pretrained model name or path |
| --output_dir | str (required) | Directory for checkpoints and predictions |
| --data_dir | str | Directory containing SQuAD JSON files |
| --train_file | str | Training file name (e.g., train-v1.1.json) |
| --predict_file | str | Evaluation file name (e.g., dev-v1.1.json) |
| --version_2_with_negative | flag | Enable SQuAD 2.0 mode with unanswerable questions |
| --max_seq_length | int (default 384) | Maximum input sequence length after tokenization |
| --doc_stride | int (default 128) | Stride for splitting long documents into chunks |
| --max_query_length | int (default 64) | Maximum number of tokens for the question |
| --per_gpu_train_batch_size | int (default 8) | Batch size per GPU for training |
| --learning_rate | float (default 5e-5) | Initial learning rate for AdamW |
| --num_train_epochs | float (default 3.0) | Total training epochs |
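
To make the role of --doc_stride concrete, the following sketch splits a long tokenized document into overlapping windows. It is a simplified illustration under stated assumptions: split_into_windows is a hypothetical helper, max_tokens_for_doc stands for the room left in --max_seq_length after the question and special tokens, and the real preprocessing in squad_convert_examples_to_features may differ in detail.

```python
def split_into_windows(doc_tokens, max_tokens_for_doc, doc_stride):
    """Return (start, length) pairs covering `doc_tokens` with overlapping windows.

    Each window holds at most `max_tokens_for_doc` tokens, and consecutive
    windows start `doc_stride` tokens apart, so neighbours overlap whenever
    doc_stride < max_tokens_for_doc.
    """
    windows = []
    start = 0
    while start < len(doc_tokens):
        length = min(len(doc_tokens) - start, max_tokens_for_doc)
        windows.append((start, length))
        if start + length == len(doc_tokens):
            break  # this window reaches the end of the document
        start += min(length, doc_stride)
    return windows


# Illustrative call: a 10-token document, 4 tokens per window, stride of 2.
tokens = [f"tok{i}" for i in range(10)]
windows = split_into_windows(tokens, max_tokens_for_doc=4, doc_stride=2)
```

A smaller stride produces more windows and more overlap, which raises the chance that an answer span appears whole in at least one window, at the cost of more features to process.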

Outputs

| Output | Type | Description |
|---|---|---|
| predictions_{prefix}.json | JSON | Best predicted answer span for each question |
| nbest_predictions_{prefix}.json | JSON | Top-N predicted answer spans with probabilities |
| null_odds_{prefix}.json | JSON | Null answer odds (SQuAD 2.0 only) |
| checkpoint-{step}/ | directory | Saved model, tokenizer, optimizer, scheduler states |
| training_args.bin | binary | Serialized training arguments |
| Return value | Dict[str, float] | Evaluation results with F1 and exact match scores |
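
The exact match and F1 scores in the returned dictionary are aggregated from per-question comparisons. The sketch below shows how a single comparison is conventionally scored for SQuAD: answers are normalized (lowercased, with punctuation and English articles removed), then compared as strings for exact match and as token bags for F1. The function names are illustrative; the script obtains its scores from library helpers rather than from this code.

```python
import collections
import re
import string


def normalize_answer(text):
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(gold, pred):
    """1 if the normalized strings are identical, else 0."""
    return int(normalize_answer(gold) == normalize_answer(pred))


def f1_score(gold, pred):
    """Token-level F1 between the normalized gold and predicted answers."""
    gold_tokens = normalize_answer(gold).split()
    pred_tokens = normalize_answer(pred).split()
    if not gold_tokens or not pred_tokens:
        # Unanswerable case: score 1 only when both sides are empty.
        return float(gold_tokens == pred_tokens)
    common = collections.Counter(gold_tokens) & collections.Counter(pred_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


em = exact_match("The Eiffel Tower", "eiffel tower!")
f1 = f1_score("Eiffel Tower", "the tall Eiffel Tower")
```

The empty-answer branch matters for SQuAD 2.0, where predicting no answer for an unanswerable question should count as correct.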

Usage Examples

Fine-tuning BERT on SQuAD 1.1

python run_squad.py \
  --model_type bert \
  --model_name_or_path bert-base-uncased \
  --do_train \
  --do_eval \
  --data_dir /data/squad/ \
  --train_file train-v1.1.json \
  --predict_file dev-v1.1.json \
  --output_dir /output/squad_bert/ \
  --per_gpu_train_batch_size 12 \
  --learning_rate 3e-5 \
  --num_train_epochs 2.0 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --overwrite_output_dir

Fine-tuning XLNet on SQuAD 2.0

python run_squad.py \
  --model_type xlnet \
  --model_name_or_path xlnet-large-cased \
  --do_train \
  --do_eval \
  --version_2_with_negative \
  --data_dir /data/squad/ \
  --train_file train-v2.0.json \
  --predict_file dev-v2.0.json \
  --output_dir /output/squad2_xlnet/ \
  --per_gpu_train_batch_size 4 \
  --learning_rate 3e-5 \
  --num_train_epochs 4.0 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --overwrite_output_dir

Evaluating All Checkpoints

python run_squad.py \
  --model_type bert \
  --model_name_or_path /output/squad_bert/ \
  --do_eval \
  --eval_all_checkpoints \
  --data_dir /data/squad/ \
  --predict_file dev-v1.1.json \
  --output_dir /output/squad_bert/ \
  --max_seq_length 384
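
Evaluating all checkpoints requires discovering every saved model under the output directory. The sketch below shows one way to do that, under stated assumptions: the weights file is assumed to be named pytorch_model.bin, the helpers checkpoint_step and find_checkpoints are hypothetical, and ordering by numeric step is a choice made here for readability rather than a documented behavior of the script.

```python
import glob
import os
import tempfile

WEIGHTS_NAME = "pytorch_model.bin"  # assumed name of the saved weights file


def checkpoint_step(path):
    """Return the numeric step of a checkpoint-{step} directory, or None."""
    suffix = os.path.basename(path.rstrip(os.sep)).split("-")[-1]
    return int(suffix) if suffix.isdigit() else None


def find_checkpoints(output_dir):
    """List directories under `output_dir` that contain a weights file.

    Numbered checkpoints come first in step order; directories without
    a step (such as the final model in `output_dir` itself) come last.
    """
    pattern = os.path.join(output_dir, "**", WEIGHTS_NAME)
    dirs = {os.path.dirname(p) for p in glob.glob(pattern, recursive=True)}
    return sorted(dirs, key=lambda d: (checkpoint_step(d) is None, checkpoint_step(d) or 0))


# Demonstration on a throwaway directory tree with empty placeholder weight files.
with tempfile.TemporaryDirectory() as root:
    folders = [root, os.path.join(root, "checkpoint-1000"), os.path.join(root, "checkpoint-500")]
    for folder in folders:
        os.makedirs(folder, exist_ok=True)
        open(os.path.join(folder, WEIGHTS_NAME), "w").close()
    found = [os.path.relpath(d, root) for d in find_checkpoints(root)]
```

Sorting numerically avoids the lexicographic trap in which checkpoint-1000 would be listed before checkpoint-500.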
