

Implementation:Microsoft LoRA Run QA

From Leeroopedia



Overview

run_qa.py is a fine-tuning script for extractive question answering. It combines AutoModelForQuestionAnswering, a custom QuestionAnsweringTrainer, and the postprocess_qa_predictions utility, which extracts answer text from the model's outputs.

Description

This script fine-tunes transformer models on extractive QA datasets (e.g., SQuAD, SQuAD v2), where the answer is a contiguous span within the context passage. It loads data with the Hugging Face datasets library and trains with a specialized QuestionAnsweringTrainer (imported from trainer_qa.py) that integrates post-processing into the evaluation loop.

Key implementation details:

  • Fast tokenizer requirement: The script requires a PreTrainedTokenizerFast because it relies on offset mappings (return_offsets_mapping) and on the overflow_to_sample_mapping output to split long contexts into overlapping windows (doc stride).
  • Training preprocessing: Tokenizes question-context pairs with return_overflowing_tokens=True and stride=doc_stride, then maps answer character positions to token positions using the offset mappings. When the answer does not fall inside the current window, both labels are set to the CLS token index.
  • Validation preprocessing: Uses the same tokenization but retains example_id and offset_mapping for post-processing, and sets the offset mappings of non-context tokens to None.
  • Post-processing: Delegates to postprocess_qa_predictions() from utils_qa to convert start/end logits into answer text substrings.
  • SQuAD v2 support: The version_2_with_negative flag enables handling of unanswerable questions; null_score_diff_threshold controls when the null answer is selected.
  • Metrics: Uses the squad or squad_v2 metric from the datasets library.
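The training-label step described above can be illustrated without a tokenizer. The following is a minimal sketch, not the script's own code: `label_window` is a hypothetical helper, the offsets are written by hand, and non-context tokens are marked with None for simplicity (an assumption of this sketch).

```python
from typing import List, Optional, Tuple

# (start_char, end_char) of a token within the context; None = not a context token.
Offset = Optional[Tuple[int, int]]


def label_window(
    offsets: List[Offset],
    answer_start: int,
    answer_end: int,
    cls_index: int = 0,
) -> Tuple[int, int]:
    """Return (start_token, end_token) for one window, or the CLS index twice."""
    # Token positions that belong to the context in this window.
    context = [i for i, off in enumerate(offsets) if off is not None]
    if not context:
        return cls_index, cls_index
    first, last = context[0], context[-1]
    # The answer is not fully covered by this window: label it with CLS.
    if offsets[first][0] > answer_start or offsets[last][1] < answer_end:
        return cls_index, cls_index
    # Last context token that starts at or before the answer start.
    start_token = max(i for i in context if offsets[i][0] <= answer_start)
    # First context token that ends at or after the answer end.
    end_token = min(i for i in context if offsets[i][1] >= answer_end)
    return start_token, end_token


# Hand-written offsets for: [CLS] where [SEP] Paris is the capital of France [SEP]
# Context string: "Paris is the capital of France"
offsets = [None, None, None, (0, 5), (6, 8), (9, 12), (13, 20), (21, 23), (24, 30), None]
print(label_window(offsets, 24, 30))  # answer "France"
print(label_window(offsets, 9, 20))   # answer "the capital"
```

A window that ends before the answer (for example, one that stops at "the") falls into the CLS branch, which is how the impossible-answer case for that window is represented.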

The script calls check_min_version("4.4.0"), so it expects transformers 4.4.0 or later.

Usage

Use this script when you need to:

  • Fine-tune any model supported by AutoModelForQuestionAnswering (with a fast tokenizer) on SQuAD or SQuAD v2
  • Train on custom extractive QA datasets in CSV/JSON format
  • Handle long documents with sliding window context splitting
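To make the sliding-window idea concrete, here is a minimal pure-Python sketch. It is not the tokenizer's implementation: `split_into_windows` is a hypothetical helper, and it assumes `overlap` is the number of tokens shared by consecutive windows, which is how the Hugging Face tokenizer's `stride` argument is defined. The real tokenizer also reserves room for the question and special tokens, which this sketch ignores.

```python
from typing import List


def split_into_windows(tokens: List[str], max_len: int, overlap: int) -> List[List[str]]:
    """Split tokens into windows of at most max_len with `overlap` shared tokens."""
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    step = max_len - overlap  # how far each new window advances
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # this window already reaches the end of the context
        start += step
    return windows


for window in split_into_windows(list("abcdefghij"), max_len=4, overlap=2):
    print("".join(window))
```

Because neighbouring windows overlap, an answer that is cut off at the edge of one window usually appears whole in the next one.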

Code Reference

Source Location

| Property | Value |
|---|---|
| File | examples/NLU/examples/question-answering/run_qa.py |
| Lines | 553 |
| Module | run_qa |
| Entry Point | main() |
| Dependencies | trainer_qa.QuestionAnsweringTrainer, utils_qa.postprocess_qa_predictions |

Signature/CLI

python run_qa.py \
    --model_name_or_path MODEL_NAME \
    --dataset_name DATASET_NAME \
    --output_dir OUTPUT_DIR \
    --do_train \
    --do_eval \
    [--dataset_config_name CONFIG] \
    [--train_file TRAIN_FILE] \
    [--validation_file VALIDATION_FILE] \
    [--max_seq_length 384] \
    [--doc_stride 128] \
    [--pad_to_max_length] \
    [--n_best_size 20] \
    [--max_answer_length 30] \
    [--version_2_with_negative] \
    [--null_score_diff_threshold 0.0]

Import

from transformers import (
    AutoConfig,
    AutoModelForQuestionAnswering,
    AutoTokenizer,
    DataCollatorWithPadding,
    EvalPrediction,
    HfArgumentParser,
    PreTrainedTokenizerFast,
    TrainingArguments,
    default_data_collator,
    set_seed,
)
from trainer_qa import QuestionAnsweringTrainer
from utils_qa import postprocess_qa_predictions
from datasets import load_dataset, load_metric

I/O Contract

Inputs

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| --model_name_or_path | str | Yes | - | Pretrained model name or path |
| --output_dir | str | Yes | - | Directory for checkpoints and results |
| --dataset_name | str | No | None | HuggingFace dataset name (e.g., squad) |
| --train_file | str | No | None | Custom CSV/JSON training file |
| --validation_file | str | No | None | Custom CSV/JSON validation file |
| --max_seq_length | int | No | 384 | Max tokenized sequence length |
| --doc_stride | int | No | 128 | Stride between document chunks for long contexts |
| --n_best_size | int | No | 20 | Number of n-best predictions to generate |
| --max_answer_length | int | No | 30 | Maximum answer span length in tokens |
| --version_2_with_negative | flag | No | False | Enable SQuAD v2 unanswerable question support |
| --null_score_diff_threshold | float | No | 0.0 | Threshold for null answer selection |
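The interaction of n_best_size, max_answer_length, and null_score_diff_threshold can be sketched for a single window as follows. This is a simplified illustration under stated assumptions, not the body of postprocess_qa_predictions: `select_answer` and `top_indices` are hypothetical helpers, only the CLS position is treated as non-context, and the result is a token span rather than answer text.

```python
from typing import List, Optional, Tuple


def top_indices(scores: List[float], k: int) -> List[int]:
    """Indices of the k largest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def select_answer(
    start_logits: List[float],
    end_logits: List[float],
    n_best_size: int = 20,
    max_answer_length: int = 30,
    version_2_with_negative: bool = False,
    null_score_diff_threshold: float = 0.0,
    cls_index: int = 0,
) -> Optional[Tuple[int, int]]:
    """Return the best (start, end) token span, or None for "no answer"."""
    best_span = None
    best_score = float("-inf")
    # Only the n_best_size highest-scoring start and end positions are paired.
    for s in top_indices(start_logits, n_best_size):
        for e in top_indices(end_logits, n_best_size):
            if s == cls_index or e == cls_index:
                continue  # CLS stands for the null answer, not a real span
            if e < s or e - s + 1 > max_answer_length:
                continue  # reversed or overly long span
            score = start_logits[s] + end_logits[e]
            if score > best_score:
                best_span, best_score = (s, e), score
    if best_span is None:
        return None
    if version_2_with_negative:
        # Prefer "no answer" when the null score beats the best span by more
        # than the threshold.
        null_score = start_logits[cls_index] + end_logits[cls_index]
        if null_score - best_score > null_score_diff_threshold:
            return None
    return best_span


start_logits = [1.0, 0.2, 3.0, 0.1, 0.5]
end_logits = [1.0, 0.1, 0.3, 2.5, 0.4]
print(select_answer(start_logits, end_logits))
print(select_answer(start_logits, end_logits,
                    version_2_with_negative=True, null_score_diff_threshold=-4.0))
```

Lowering the threshold makes the null answer easier to select; raising it makes the script more willing to commit to a span.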

Outputs

| Output | Location | Description |
|---|---|---|
| Trained model | {output_dir}/ | Saved model, config, and tokenizer |
| Predictions | {output_dir}/predictions.json | Mapping of example IDs to predicted answer strings |
| N-best predictions | {output_dir}/nbest_predictions.json | Top-N predictions with scores and probabilities |
| Null odds | {output_dir}/null_odds.json | Score diffs for SQuAD v2 (only with version_2_with_negative) |
| Evaluation metrics | {output_dir}/eval_results.json | Exact match and F1 scores |
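Once a run has finished, the predictions file can be inspected with the standard library alone. The sketch below assumes predictions.json has the shape described in the table (example ID mapped to answer string) and that unanswerable predictions are stored as empty strings; `load_predictions` and `count_empty_answers` are hypothetical helper names, not part of the script.

```python
import json
from pathlib import Path
from typing import Dict


def load_predictions(output_dir: str) -> Dict[str, str]:
    """Load {example_id: answer_text} from <output_dir>/predictions.json."""
    path = Path(output_dir) / "predictions.json"
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def count_empty_answers(predictions: Dict[str, str]) -> int:
    """Count predictions whose answer text is empty (assumed to mean "no answer")."""
    return sum(1 for text in predictions.values() if text == "")
```

For example, `count_empty_answers(load_predictions("/tmp/squad_v2_output"))` would report how many questions were predicted as unanswerable in the SQuAD v2 run shown in the usage examples.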

Usage Examples

Fine-tune on SQuAD

python examples/NLU/examples/question-answering/run_qa.py \
    --model_name_or_path bert-base-uncased \
    --dataset_name squad \
    --do_train \
    --do_eval \
    --per_device_train_batch_size 12 \
    --learning_rate 3e-5 \
    --num_train_epochs 2 \
    --max_seq_length 384 \
    --doc_stride 128 \
    --output_dir /tmp/squad_output

Fine-tune on SQuAD v2 (with unanswerable questions)

python examples/NLU/examples/question-answering/run_qa.py \
    --model_name_or_path bert-base-uncased \
    --dataset_name squad_v2 \
    --do_train \
    --do_eval \
    --version_2_with_negative \
    --per_device_train_batch_size 12 \
    --learning_rate 3e-5 \
    --num_train_epochs 2 \
    --output_dir /tmp/squad_v2_output
