Implementation:Microsoft LoRA Run QA

Overview

run_qa.py is an extractive question-answering fine-tuning script built on AutoModelForQuestionAnswering, a custom QuestionAnsweringTrainer, and the postprocess_qa_predictions utility, which turns model logits into answer text.

Description

This script fine-tunes transformer models on extractive QA datasets (e.g., SQuAD, SQuAD v2), in which each answer is a contiguous span of the context passage. It loads data with the Hugging Face datasets library and trains with a specialized QuestionAnsweringTrainer (imported from trainer_qa.py) that runs answer post-processing inside the evaluation loop.

Key implementation details:

Fast tokenizer requirement: The script requires a PreTrainedTokenizerFast because it relies on offset mappings (return_offsets_mapping=True) and the overflow_to_sample_mapping output to handle long contexts with a sliding window (doc stride).
Training preprocessing: Tokenizes question-context pairs with return_overflowing_tokens=True and stride=doc_stride, then uses the offset mapping to convert each answer's character span into start and end token positions. Features whose window does not fully contain the answer, as well as unanswerable questions, are labeled with the CLS token index.
Validation preprocessing: Uses the same tokenization but keeps example_id and offset_mapping for post-processing, setting the offsets of non-context tokens to None so they cannot be selected as answer boundaries.
Post-processing: Delegates to postprocess_qa_predictions() from utils_qa, which converts the start/end logits into answer strings taken from the context.
SQuAD v2 support: The version_2_with_negative flag enables unanswerable questions, and null_score_diff_threshold controls when the empty (null) answer is preferred over the best span.
Metrics: Uses the squad or squad_v2 metric from the datasets library.
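The training-label and validation-masking steps can be sketched in plain Python. This is a minimal, hypothetical re-implementation rather than the script's own code: `label_feature` and `mask_offsets` are illustrative names, and the sketch assumes the context is the second sequence (sequence id 1) and that offsets and sequence ids are already available as lists. The toy window's offsets are written by hand, not produced by a real tokenizer.

```python
from typing import List, Optional, Tuple

Offset = Tuple[int, int]


def label_feature(
    offsets: List[Offset],
    sequence_ids: List[Optional[int]],
    cls_index: int,
    answer_start: int,
    answer_text: str,
    context_id: int = 1,  # assumption: context is the second sequence
) -> Tuple[int, int]:
    """Return (start_token, end_token) training labels for one tokenized window."""
    if not answer_text:
        # Unanswerable question: point both labels at the CLS token.
        return cls_index, cls_index
    start_char = answer_start
    end_char = answer_start + len(answer_text)

    # First and last token indices that belong to the context sequence.
    ctx_start = sequence_ids.index(context_id)
    ctx_end = len(sequence_ids) - 1 - sequence_ids[::-1].index(context_id)

    # Answer not fully inside this window: label with CLS as well.
    if offsets[ctx_start][0] > start_char or offsets[ctx_end][1] < end_char:
        return cls_index, cls_index

    # Walk forward past every token that starts at or before start_char ...
    start_tok = ctx_start
    while start_tok <= ctx_end and offsets[start_tok][0] <= start_char:
        start_tok += 1
    # ... and backward past every token that ends at or after end_char.
    end_tok = ctx_end
    while end_tok >= ctx_start and offsets[end_tok][1] >= end_char:
        end_tok -= 1
    return start_tok - 1, end_tok + 1


def mask_offsets(
    offsets: List[Offset], sequence_ids: List[Optional[int]], context_id: int = 1
) -> List[Optional[Offset]]:
    """Validation-time masking: keep offsets only for context tokens."""
    return [o if s == context_id else None for o, s in zip(offsets, sequence_ids)]


# Toy window: [CLS] where ? [SEP] the cat sat on the mat . [SEP]
context = "The cat sat on the mat."
offsets = [(0, 0), (0, 5), (5, 6), (0, 0), (0, 3), (4, 7), (8, 11),
           (12, 14), (15, 18), (19, 22), (22, 23), (0, 0)]
sequence_ids = [None, 0, 0, None, 1, 1, 1, 1, 1, 1, 1, None]
print(label_feature(offsets, sequence_ids, 0, 15, "the mat"))  # (8, 9)
```

Masking question and special tokens to None is what lets post-processing reject spans that start or end outside the context.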

The script calls check_min_version("4.4.0") at startup, so it will not run with transformers versions older than 4.4.0.

Usage

Use this script when you need to:

Fine-tune any model supported by AutoModelForQuestionAnswering (with a fast tokenizer) on SQuAD or SQuAD v2
Train on custom extractive QA datasets in CSV/JSON format
Handle long documents with sliding window context splitting
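For a custom training or validation file, one plausible layout mirrors SQuAD records: an `id`, a `question`, a `context`, and an `answers` field holding parallel `text` and `answer_start` lists. The sketch below writes such a JSON file. The helper names are hypothetical, and wrapping the records in a top-level `"data"` key is an assumption about how the script's JSON loader reads them, so check the loading code in your copy of run_qa.py.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def make_record(example_id: str, question: str, context: str, answer_texts: List[str]) -> Dict:
    """Build one SQuAD-style record (hypothetical helper)."""
    starts = []
    for text in answer_texts:
        start = context.find(text)  # first occurrence only
        if start < 0:
            raise ValueError(f"answer {text!r} not found in context")
        starts.append(start)
    return {
        "id": example_id,
        "question": question,
        "context": context,
        # Empty lists mark an unanswerable (SQuAD v2 style) question.
        "answers": {"text": list(answer_texts), "answer_start": starts},
    }


def write_qa_json(path: Path, records: List[Dict]) -> None:
    # Assumption: the loader reads the records from a top-level "data" key.
    path.write_text(json.dumps({"data": records}, indent=2), encoding="utf-8")


records = [
    make_record("q1", "Where did the cat sit?", "The cat sat on the mat.", ["the mat"]),
    make_record("q2", "What color was the cat?", "The cat sat on the mat.", []),
]
with tempfile.TemporaryDirectory() as tmp:
    train_path = Path(tmp) / "train.json"
    write_qa_json(train_path, records)
    print(json.loads(train_path.read_text(encoding="utf-8"))["data"][0]["answers"])
    # {'text': ['the mat'], 'answer_start': [15]}
```

`str.find` locates only the first occurrence of an answer, so real data should record the intended `answer_start` explicitly.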

Code Reference

Source Location

Property	Value
File	`examples/NLU/examples/question-answering/run_qa.py`
Length	553 lines
Module	`run_qa`
Entry Point	`main()`
Dependencies	`trainer_qa.QuestionAnsweringTrainer`, `utils_qa.postprocess_qa_predictions`

Signature/CLI

python run_qa.py \
    --model_name_or_path MODEL_NAME \
    --dataset_name DATASET_NAME \
    --output_dir OUTPUT_DIR \
    --do_train \
    --do_eval \
    [--dataset_config_name CONFIG] \
    [--train_file TRAIN_FILE] \
    [--validation_file VALIDATION_FILE] \
    [--max_seq_length 384] \
    [--doc_stride 128] \
    [--pad_to_max_length] \
    [--n_best_size 20] \
    [--max_answer_length 30] \
    [--version_2_with_negative] \
    [--null_score_diff_threshold 0.0]
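run_qa.py itself parses its flags with HfArgumentParser over dataclasses. As an illustration only, the argparse parser below reproduces the flags and defaults listed on this page; `build_parser` and `parse` are hypothetical, `--pad_to_max_length` and the TrainingArguments flags are omitted, and the data-source check is an assumption that data must come from either `--dataset_name` or local files.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical argparse mirror of the run_qa.py flags listed above."""
    p = argparse.ArgumentParser(prog="run_qa.py (sketch)")
    p.add_argument("--model_name_or_path", required=True)
    p.add_argument("--output_dir", required=True)
    p.add_argument("--dataset_name", default=None)
    p.add_argument("--dataset_config_name", default=None)
    p.add_argument("--train_file", default=None)
    p.add_argument("--validation_file", default=None)
    p.add_argument("--do_train", action="store_true")
    p.add_argument("--do_eval", action="store_true")
    p.add_argument("--max_seq_length", type=int, default=384)
    p.add_argument("--doc_stride", type=int, default=128)
    p.add_argument("--n_best_size", type=int, default=20)
    p.add_argument("--max_answer_length", type=int, default=30)
    p.add_argument("--version_2_with_negative", action="store_true")
    p.add_argument("--null_score_diff_threshold", type=float, default=0.0)
    return p


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    # Assumed check: data must come from a hub dataset or from local files.
    if args.dataset_name is None and args.train_file is None and args.validation_file is None:
        raise ValueError("Need either --dataset_name or --train_file/--validation_file")
    return args


args = parse(["--model_name_or_path", "bert-base-uncased",
              "--dataset_name", "squad", "--output_dir", "/tmp/squad_output"])
print(args.max_seq_length, args.doc_stride, args.version_2_with_negative)  # 384 128 False
```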

Import

from transformers import (
    AutoConfig,
    AutoModelForQuestionAnswering,
    AutoTokenizer,
    DataCollatorWithPadding,
    EvalPrediction,
    HfArgumentParser,
    PreTrainedTokenizerFast,
    TrainingArguments,
    default_data_collator,
    set_seed,
)
from trainer_qa import QuestionAnsweringTrainer
from utils_qa import postprocess_qa_predictions
from datasets import load_dataset, load_metric

I/O Contract

Inputs

Parameter	Type	Required	Default	Description
`--model_name_or_path`	str	Yes	-	Pretrained model name or path
`--output_dir`	str	Yes	-	Directory for checkpoints and results
`--dataset_name`	str	No	None	HuggingFace dataset name (e.g., `squad`)
`--train_file`	str	No	None	Custom CSV/JSON training file
`--validation_file`	str	No	None	Custom CSV/JSON validation file
`--max_seq_length`	int	No	384	Max tokenized sequence length
`--doc_stride`	int	No	128	Number of context tokens shared (overlapping) between consecutive chunks of a long document
`--n_best_size`	int	No	20	Number of n-best predictions to generate
`--max_answer_length`	int	No	30	Maximum answer span length in tokens
`--version_2_with_negative`	flag	No	False	Enable SQuAD v2 unanswerable question support
`--null_score_diff_threshold`	float	No	0.0	With `--version_2_with_negative`, predict no answer when the null score minus the best span score exceeds this value
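`--max_seq_length` and `--doc_stride` together decide how many features a long context produces. The sketch below approximates the windowing a fast tokenizer performs with `return_overflowing_tokens=True`: every window holds the question plus as many context tokens as fit, and consecutive windows share `doc_stride` context tokens. `context_windows` is a hypothetical helper, and the count of three special tokens assumes a BERT-style `[CLS] question [SEP] context [SEP]` layout.

```python
from typing import List, Tuple


def context_windows(
    n_context_tokens: int,
    n_question_tokens: int,
    max_seq_length: int = 384,
    doc_stride: int = 128,
    n_special_tokens: int = 3,  # assumption: BERT-style [CLS] q [SEP] c [SEP]
) -> List[Tuple[int, int]]:
    """Return [start, end) context-token ranges, one per feature."""
    budget = max_seq_length - n_question_tokens - n_special_tokens
    if budget <= doc_stride:
        raise ValueError("doc_stride must be smaller than the per-window context budget")
    step = budget - doc_stride  # consecutive windows share doc_stride tokens
    windows = []
    start = 0
    while True:
        end = min(start + budget, n_context_tokens)
        windows.append((start, end))
        if end == n_context_tokens:
            break
        start += step
    return windows


print(context_windows(600, 10))  # [(0, 371), (243, 600)]
```

With the defaults and a 10-token question, a 600-token context yields two features whose context ranges overlap by 128 tokens. Each window becomes one feature, and overflow_to_sample_mapping records which example it came from.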

Outputs

Output	Location	Description
Trained model	`{output_dir}/`	Saved model, config, and tokenizer
Predictions	`{output_dir}/predictions.json`	Mapping of example IDs to predicted answer strings
N-best predictions	`{output_dir}/nbest_predictions.json`	Top-N predictions with scores and probabilities
Null odds	`{output_dir}/null_odds.json`	Per-example difference between the null score and the best span score (only with `version_2_with_negative`)
Evaluation metrics	`{output_dir}/eval_results.json`	Exact match and F1 scores
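The predictions, n-best, and null-odds files come from postprocess_qa_predictions in utils_qa. A simplified, hypothetical version of its per-example logic is sketched below: it keeps the top n_best_size start and end indices, discards spans that touch masked (None) offsets, run backwards, or exceed max_answer_length, scores each span as start logit plus end logit, and turns the n-best scores into probabilities with a softmax. For SQuAD v2 it compares the smallest CLS (null) score across windows with the best non-empty span. The sketch assumes CLS sits at index 0 and omits file writing and some edge cases of the real utility.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Feature = Tuple[Sequence[float], Sequence[float], Sequence[Optional[Tuple[int, int]]]]


def top_indices(logits: Sequence[float], k: int) -> List[int]:
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]


def postprocess_example(
    context: str,
    features: List[Feature],
    n_best_size: int = 20,
    max_answer_length: int = 30,
    version_2_with_negative: bool = False,
    null_score_diff_threshold: float = 0.0,
) -> Tuple[str, List[Dict]]:
    """features: (start_logits, end_logits, masked_offsets) for each window of one example."""
    if not features:
        return "", []
    candidates: List[Dict] = []
    min_null_score: Optional[float] = None
    for start_logits, end_logits, offsets in features:
        # Null score of this window, assuming CLS sits at index 0.
        null_score = start_logits[0] + end_logits[0]
        if min_null_score is None or null_score < min_null_score:
            min_null_score = null_score
        for s in top_indices(start_logits, n_best_size):
            for e in top_indices(end_logits, n_best_size):
                if offsets[s] is None or offsets[e] is None:
                    continue  # question/special token: not a valid span boundary
                if e < s or e - s + 1 > max_answer_length:
                    continue
                candidates.append({"text": context[offsets[s][0]:offsets[e][1]],
                                   "score": start_logits[s] + end_logits[e]})

    null_pred = {"text": "", "score": min_null_score}
    if version_2_with_negative:
        candidates.append(null_pred)
    nbest = sorted(candidates, key=lambda c: c["score"], reverse=True)[:n_best_size]
    if version_2_with_negative and all(c is not null_pred for c in nbest):
        nbest.append(null_pred)  # keep the null answer even if it fell out of the top n
    if not nbest:
        return "", nbest

    # Softmax over the n-best scores gives per-candidate probabilities.
    top = max(c["score"] for c in nbest)
    exps = [math.exp(c["score"] - top) for c in nbest]
    total = sum(exps)
    for c, x in zip(nbest, exps):
        c["probability"] = x / total

    if not version_2_with_negative:
        return nbest[0]["text"], nbest
    best = next((c for c in nbest if c["text"]), None)
    if best is None:
        return "", nbest
    # Positive diff means the null answer out-scores the best span.
    score_diff = min_null_score - best["score"]
    return ("" if score_diff > null_score_diff_threshold else best["text"]), nbest


context = "The cat sat on the mat."
offsets = [None, None, None, None, (0, 3), (4, 7), (8, 11),
           (12, 14), (15, 18), (19, 22), (22, 23), None]
start = [1.0, 0, 0, 0, 0, 2.0, 0, 0, 5.0, 0, 0, 0]
end = [1.0, 0, 0, 0, 0, 2.0, 0, 0, 0, 4.0, 0, 0]
answer, nbest = postprocess_example(context, [(start, end, offsets)])
print(answer)  # the mat
```

Taking the minimum null score across windows means that the most confident "no answer" window is compared with the best span found in any window.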

Usage Examples

Fine-tune on SQuAD

python examples/NLU/examples/question-answering/run_qa.py \
    --model_name_or_path bert-base-uncased \
    --dataset_name squad \
    --do_train \
    --do_eval \
    --per_device_train_batch_size 12 \
    --learning_rate 3e-5 \
    --num_train_epochs 2 \
    --max_seq_length 384 \
    --doc_stride 128 \
    --output_dir /tmp/squad_output

Fine-tune on SQuAD v2 (with unanswerable questions)

python examples/NLU/examples/question-answering/run_qa.py \
    --model_name_or_path bert-base-uncased \
    --dataset_name squad_v2 \
    --do_train \
    --do_eval \
    --version_2_with_negative \
    --per_device_train_batch_size 12 \
    --learning_rate 3e-5 \
    --num_train_epochs 2 \
    --output_dir /tmp/squad_v2_output
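Both runs report exact match and F1 through the squad or squad_v2 metric. The sketch below shows SQuAD-style scoring in the spirit of the official evaluation scripts: answers are normalized (lowercased, punctuation and English articles removed, whitespace collapsed), EM checks string equality, and F1 measures token overlap, taking the best score over all gold answers. The function names are hypothetical, and an unanswerable question is represented here by an empty gold list.

```python
import re
import string
from collections import Counter
from typing import List, Tuple

_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(pred: str, gold: str) -> float:
    return float(normalize_answer(pred) == normalize_answer(gold))


def f1(pred: str, gold: str) -> float:
    p, g = normalize_answer(pred).split(), normalize_answer(gold).split()
    if not p or not g:
        # No-answer case: full credit only if both are empty.
        return float(p == g)
    common = sum((Counter(p) & Counter(g)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)


def score(pred: str, golds: List[str]) -> Tuple[float, float]:
    """Best EM and F1 over all gold answers; [] means unanswerable."""
    golds = golds or [""]
    return max(exact_match(pred, g) for g in golds), max(f1(pred, g) for g in golds)


print(score("The mat!", ["the mat", "a rug"]))  # (1.0, 1.0)
print(score("", []))  # (1.0, 1.0)
```

The datasets metrics average these per-example scores and report them as percentages; squad_v2 also reports further breakdowns that this sketch does not reproduce.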

Related Pages

Environment:Microsoft_LoRA_NLU_Conda_Environment
Implementation:Microsoft_LoRA_Utils_QA - Post-processing utilities for QA predictions
Implementation:Microsoft_LoRA_Run_QA_Beam_Search - XLNet QA variant with beam search decoding
