Implementation:Microsoft LoRA Run QA
Overview
run_qa.py is a modern extractive question answering fine-tuning script that uses AutoModelForQuestionAnswering, a custom QuestionAnsweringTrainer, and the postprocess_qa_predictions utility for answer extraction.
Description
This script fine-tunes transformer models on extractive QA datasets (e.g., SQuAD, SQuAD v2) where the answer is a contiguous span within the context passage. It uses the modern HuggingFace datasets library and a specialized QuestionAnsweringTrainer (imported from trainer_qa.py) that integrates post-processing into the evaluation loop.
Key implementation details:
- Fast tokenizer requirement: The script requires a `PreTrainedTokenizerFast` because it relies on `return_offsets_mapping` and `overflow_to_sample_mapping` to handle long contexts with a sliding window (doc stride).
- Training preprocessing: Tokenizes question-context pairs with `return_overflowing_tokens=True` and `stride=doc_stride`. Maps answer character positions to token positions using the offset mappings. Impossible answers (no answer in the current window) are labeled with the CLS token index.
- Validation preprocessing: Uses similar tokenization, but retains `example_id` and `offset_mapping` for post-processing. Offset mappings of non-context tokens are set to `None`.
- Post-processing: Delegates to `postprocess_qa_predictions()` from `utils_qa` to convert start/end logits into answer substrings of the context.
- SQuAD v2 support: The configurable `version_2_with_negative` flag enables handling of unanswerable questions, with `null_score_diff_threshold` controlling when the null answer is chosen.
- Metrics: Uses the `squad` or `squad_v2` metric from the datasets library.
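The sliding-window labeling described in the training-preprocessing step can be sketched in plain Python. This is a simplified illustration, not the script's code: it assumes each context token's character span is already known (the `offsets` list stands in for a fast tokenizer's `offset_mapping`, with `None` for question and special tokens), treats `doc_stride` as the overlap between consecutive windows (how the `stride` argument of fast tokenizers behaves), and assumes [CLS] sits at index 0. The helper names `make_windows` and `label_window` and the whitespace "tokenizer" are hypothetical.

```python
from typing import List, Optional, Tuple

# (char_start, char_end) of a context token, or None for question/special tokens
Offset = Optional[Tuple[int, int]]

CLS_INDEX = 0  # assumption: BERT-style layout with [CLS] at position 0


def make_windows(n_context_tokens: int, window_len: int, doc_stride: int) -> List[Tuple[int, int]]:
    """Split context token positions into overlapping [start, end) windows.

    doc_stride is the number of tokens shared by consecutive windows.
    """
    if not 0 <= doc_stride < window_len:
        raise ValueError("doc_stride must be in [0, window_len)")
    windows = []
    start = 0
    while True:
        end = min(start + window_len, n_context_tokens)
        windows.append((start, end))
        if end == n_context_tokens:
            return windows
        start = end - doc_stride  # step back so consecutive windows overlap


def label_window(offsets: List[Offset], start_char: int, end_char: int) -> Tuple[int, int]:
    """Map an answer's character span [start_char, end_char) to token positions.

    Returns (CLS_INDEX, CLS_INDEX) when the answer is not fully inside this window.
    """
    ctx = [i for i, off in enumerate(offsets) if off is not None]
    first, last = ctx[0], ctx[-1]
    if not (offsets[first][0] <= start_char and offsets[last][1] >= end_char):
        return CLS_INDEX, CLS_INDEX
    # Last context token that starts at or before the answer start.
    start_tok = max(i for i in ctx if offsets[i][0] <= start_char)
    # First context token that ends at or after the answer end.
    end_tok = min(i for i in ctx if offsets[i][1] >= end_char)
    return start_tok, end_tok


# Toy example with a whitespace "tokenizer" (illustration only).
context = "the cat sat on the mat today"
spans, pos = [], 0
for word in context.split():
    begin = context.index(word, pos)
    spans.append((begin, begin + len(word)))
    pos = begin + len(word)

answer_text, answer_start = "the mat", 15
answer_end = answer_start + len(answer_text)

labels = []
for w_start, w_end in make_windows(len(spans), window_len=4, doc_stride=2):
    # Feature layout: [CLS] <question> [SEP] <context window> [SEP]
    offsets = [None, None, None] + spans[w_start:w_end] + [None]
    labels.append(label_window(offsets, answer_start, answer_end))

print(labels)  # [(0, 0), (5, 6), (3, 4)]
```

The first window ends before "mat", so it is labeled with the CLS index; the two later windows both contain the full answer, and each gets its own token positions relative to its feature.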
The script enforces check_min_version("4.4.0") for transformers compatibility.
Usage
Use this script when you need to:
- Fine-tune any model supported by AutoModelForQuestionAnswering (with a fast tokenizer) on SQuAD or SQuAD v2
- Train on custom extractive QA datasets in CSV/JSON format
- Handle long documents with sliding window context splitting
Code Reference
Source Location
| Property | Value |
|---|---|
| File | examples/NLU/examples/question-answering/run_qa.py |
| Lines | 553 |
| Module | run_qa |
| Entry Point | main() |
| Dependencies | trainer_qa.QuestionAnsweringTrainer, utils_qa.postprocess_qa_predictions |
Signature/CLI
```bash
python run_qa.py \
  --model_name_or_path MODEL_NAME \
  --dataset_name DATASET_NAME \
  --output_dir OUTPUT_DIR \
  --do_train \
  --do_eval \
  [--dataset_config_name CONFIG] \
  [--train_file TRAIN_FILE] \
  [--validation_file VALIDATION_FILE] \
  [--max_seq_length 384] \
  [--doc_stride 128] \
  [--pad_to_max_length] \
  [--n_best_size 20] \
  [--max_answer_length 30] \
  [--version_2_with_negative] \
  [--null_score_diff_threshold 0.0]
```
Import
```python
from transformers import (
    AutoConfig,
    AutoModelForQuestionAnswering,
    AutoTokenizer,
    DataCollatorWithPadding,
    EvalPrediction,
    HfArgumentParser,
    PreTrainedTokenizerFast,
    TrainingArguments,
    default_data_collator,
    set_seed,
)
from trainer_qa import QuestionAnsweringTrainer
from utils_qa import postprocess_qa_predictions
# load_metric is deprecated in recent datasets releases (removed in 3.0);
# newer code loads metrics through the separate evaluate library.
from datasets import load_dataset, load_metric
```
I/O Contract
Inputs
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| --model_name_or_path | str | Yes | - | Pretrained model name or path |
| --output_dir | str | Yes | - | Directory for checkpoints and results |
| --dataset_name | str | No | None | HuggingFace dataset name (e.g., squad) |
| --train_file | str | No | None | Custom CSV/JSON training file |
| --validation_file | str | No | None | Custom CSV/JSON validation file |
| --max_seq_length | int | No | 384 | Maximum length of the tokenized question + context sequence |
| --doc_stride | int | No | 128 | Number of overlapping tokens between consecutive chunks of a long context |
| --n_best_size | int | No | 20 | Number of n-best predictions to generate |
| --max_answer_length | int | No | 30 | Maximum answer span length in tokens |
| --version_2_with_negative | flag | No | False | Enable SQuAD v2 unanswerable question support |
| --null_score_diff_threshold | float | No | 0.0 | The null answer is predicted when the null score minus the best non-null score exceeds this threshold |
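How `n_best_size`, `max_answer_length`, `version_2_with_negative`, and `null_score_diff_threshold` interact can be illustrated with a simplified, single-feature sketch. It is not `utils_qa.postprocess_qa_predictions` itself: it assumes one feature per example (the real utility aggregates candidates and the null score across all features of an example), skips the n-best probability output, and assumes [CLS] is at index 0. The function name `best_answer` is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Offset = Optional[Tuple[int, int]]  # None marks question/special tokens


def best_answer(
    context: str,
    offsets: List[Offset],
    start_logits: Sequence[float],
    end_logits: Sequence[float],
    n_best_size: int = 20,
    max_answer_length: int = 30,
    version_2_with_negative: bool = False,
    null_score_diff_threshold: float = 0.0,
) -> str:
    """Pick an answer string for a single feature (simplified sketch)."""

    def top_k(logits: Sequence[float]) -> List[int]:
        # Indices of the n_best_size highest logits.
        return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:n_best_size]

    candidates = []  # (score, answer text)
    for s in top_k(start_logits):
        for e in top_k(end_logits):
            # Ignore spans that touch non-context tokens.
            if offsets[s] is None or offsets[e] is None:
                continue
            # Ignore reversed spans and spans longer than max_answer_length tokens.
            if e < s or e - s + 1 > max_answer_length:
                continue
            score = start_logits[s] + end_logits[e]
            candidates.append((score, context[offsets[s][0]:offsets[e][1]]))

    if not candidates:
        return ""  # simplification: no valid span found
    best_score, best_text = max(candidates, key=lambda c: c[0])

    if version_2_with_negative:
        # Null score is the score of the CLS position (assumed index 0).
        null_score = start_logits[0] + end_logits[0]
        if null_score - best_score > null_score_diff_threshold:
            return ""  # predict "no answer"
    return best_text


context = "the cat sat on the mat"
# [CLS] <question> [SEP] the cat sat on the mat [SEP]
offsets = [None, None, None, (0, 3), (4, 7), (8, 11), (12, 14), (15, 18), (19, 22), None]
start_logits = [1.0, 0.0, 0.0, 0.1, 0.2, 0.3, 0.1, 3.0, 0.5, 0.0]
end_logits = [1.0, 0.0, 0.0, 0.1, 0.1, 0.2, 0.1, 0.4, 2.5, 0.0]

print(best_answer(context, offsets, start_logits, end_logits))                       # the mat
print(best_answer(context, offsets, start_logits, end_logits, max_answer_length=1))  # the

# SQuAD v2: a strong CLS score (8.0 vs. 5.5) makes the null answer win at threshold 0.0.
v2_start = [4.0] + start_logits[1:]
v2_end = [4.0] + end_logits[1:]
print(repr(best_answer(context, offsets, v2_start, v2_end, version_2_with_negative=True)))  # ''
```

Raising `null_score_diff_threshold` above the score gap (here 2.5) makes the model answer instead of abstaining, which is why the threshold is usually tuned on the validation set for SQuAD v2.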
Outputs
| Output | Location | Description |
|---|---|---|
| Trained model | {output_dir}/ | Saved model, config, and tokenizer |
| Predictions | {output_dir}/predictions.json | Mapping of example IDs to predicted answer strings |
| N-best predictions | {output_dir}/nbest_predictions.json | Top-N predictions with scores and probabilities |
| Null odds | {output_dir}/null_odds.json | Score diffs for SQuAD v2 (only with --version_2_with_negative) |
| Evaluation metrics | {output_dir}/eval_results.json | Exact match and F1 scores |
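The exact-match and F1 numbers follow the scoring of the official SQuAD v1.1 evaluation script, which the `squad` metric wraps. The sketch below re-implements that per-answer scoring for illustration only: answers are normalized (lowercased, punctuation and the articles a/an/the removed, whitespace collapsed), each prediction takes its best score over all reference answers, and the averages are reported as percentages. The aggregation helper `squad_v1_scores` and its dict-based inputs are assumptions, not the datasets API, and a missing prediction is scored here as an empty string. The `squad_v2` no-answer handling is not covered.

```python
import re
import string
from collections import Counter
from typing import Dict, List

PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, truth: str) -> float:
    return float(normalize_answer(prediction) == normalize_answer(truth))


def f1(prediction: str, truth: str) -> float:
    """Token-overlap F1 between normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(truth).split()
    common = Counter(pred_tokens) & Counter(truth_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


def squad_v1_scores(predictions: Dict[str, str], references: Dict[str, List[str]]) -> Dict[str, float]:
    """Average best-over-references EM and F1, as percentages."""
    em_total = f1_total = 0.0
    for qid, truths in references.items():
        pred = predictions.get(qid, "")  # assumption: missing prediction counts as ""
        em_total += max(exact_match(pred, t) for t in truths)
        f1_total += max(f1(pred, t) for t in truths)
    n = len(references)
    return {"exact_match": 100.0 * em_total / n, "f1": 100.0 * f1_total / n}


scores = squad_v1_scores(
    {"q1": "The mat!", "q2": "cat"},
    {"q1": ["mat"], "q2": ["a dog", "the cat sat"]},
)
print(scores)  # exact_match 50.0, f1 about 83.33
```

The partial credit on `q2` ("cat" vs. "cat sat" after normalization) is what lets F1 exceed exact match.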
Usage Examples
Fine-tune on SQuAD
```bash
python examples/NLU/examples/question-answering/run_qa.py \
  --model_name_or_path bert-base-uncased \
  --dataset_name squad \
  --do_train \
  --do_eval \
  --per_device_train_batch_size 12 \
  --learning_rate 3e-5 \
  --num_train_epochs 2 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir /tmp/squad_output
```
Fine-tune on SQuAD v2 (with unanswerable questions)
```bash
python examples/NLU/examples/question-answering/run_qa.py \
  --model_name_or_path bert-base-uncased \
  --dataset_name squad_v2 \
  --do_train \
  --do_eval \
  --version_2_with_negative \
  --per_device_train_batch_size 12 \
  --learning_rate 3e-5 \
  --num_train_epochs 2 \
  --output_dir /tmp/squad_v2_output
```
Related Pages
- Environment:Microsoft_LoRA_NLU_Conda_Environment
- Implementation:Microsoft_LoRA_Utils_QA - Post-processing utilities for QA predictions
- Implementation:Microsoft_LoRA_Run_QA_Beam_Search - XLNet QA variant with beam search decoding