Implementation:Microsoft LoRA Utils QA
Overview
utils_qa.py provides post-processing functions that convert raw model logits into human-readable answer substrings for extractive question answering tasks.
Description
This module contains two core post-processing functions used by the QA training scripts:
postprocess_qa_predictions() - The standard post-processor for models that output start and end logits (e.g., BERT, RoBERTa):
- Builds a mapping from each example to its corresponding features (a single example can produce multiple features due to sliding window chunking).
- For each example, iterates over all associated features and collects the top n_best_size start and end logit positions.
- Filters out invalid spans: out-of-bounds indices, non-context tokens (offset_mapping is None), negative-length spans, and spans exceeding max_answer_length.
- Sorts candidates by combined start+end logit score and keeps the top n_best_size.
- Extracts answer text using character offset mappings back to the original context string.
- Computes softmax probabilities over candidate scores in a numerically stable way, subtracting the maximum score before exponentiating (the LogSumExp trick; numpy-only, no torch/tf dependency).
- For SQuAD v2 (version_2_with_negative=True): tracks the minimum null prediction score across features and compares it against the best non-null prediction using null_score_diff_threshold.
- Optionally saves predictions.json, nbest_predictions.json, and null_odds.json to the output directory.
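The span collection, filtering, sorting, and text extraction steps above can be sketched for a single feature as follows. This is a minimal illustration and not the module's actual code: the helper name best_spans and its return format are hypothetical, and it assumes offset_mapping is a list whose entries are either None (non-context token) or a (start_char, end_char) pair.

```python
import numpy as np

def best_spans(start_logits, end_logits, offset_mapping, context,
               n_best_size=20, max_answer_length=30):
    """Hypothetical helper: scored answer candidates for one feature, best first."""
    # Indices of the n_best_size highest logits, highest first.
    start_indexes = np.argsort(start_logits)[-1 : -n_best_size - 1 : -1].tolist()
    end_indexes = np.argsort(end_logits)[-1 : -n_best_size - 1 : -1].tolist()

    candidates = []
    for s in start_indexes:
        for e in end_indexes:
            # Skip out-of-bounds positions.
            if s >= len(offset_mapping) or e >= len(offset_mapping):
                continue
            # Skip non-context tokens (their offset is None).
            if offset_mapping[s] is None or offset_mapping[e] is None:
                continue
            # Skip negative-length spans and spans that are too long.
            if e < s or e - s + 1 > max_answer_length:
                continue
            start_char = offset_mapping[s][0]
            end_char = offset_mapping[e][1]
            candidates.append({
                "score": float(start_logits[s] + end_logits[e]),
                "text": context[start_char:end_char],
            })

    # Rank by combined start+end logit score and keep the top n_best_size.
    candidates.sort(key=lambda c: c["score"], reverse=True)
    return candidates[:n_best_size]
```

In the real function this loop runs over every feature of an example, so candidates from overlapping sliding-window chunks compete in one ranked list.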
postprocess_qa_predictions_with_beam_search() - Specialized post-processor for XLNet-style models with beam search:
- Expects 5-element predictions: (start_top_log_probs, start_top_index, end_top_log_probs, end_top_index, cls_logits).
- Iterates over start_n_top * end_n_top combinations per feature.
- Uses CLS logits directly for null score tracking (instead of start[0]+end[0]).
- Returns both predictions and scores_diff_json for v2 format.
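The start_n_top * end_n_top enumeration can be sketched for one feature as below. This is an illustration under stated assumptions, not the module's code: the helper name beam_candidates is hypothetical, and it assumes the end arrays are stored flat, with the end_n_top end candidates for start candidate i occupying positions i * end_n_top through (i + 1) * end_n_top - 1.

```python
def beam_candidates(start_top_log_probs, start_top_index,
                    end_top_log_probs, end_top_index,
                    start_n_top=5, end_n_top=5, max_answer_length=30):
    """Hypothetical helper: (start, end, score) triples for one feature, best first."""
    candidates = []
    for i in range(start_n_top):
        for j in range(end_n_top):
            # Assumed flat layout: each start candidate owns a block of
            # end_n_top end candidates.
            j_index = i * end_n_top + j
            s = int(start_top_index[i])
            e = int(end_top_index[j_index])
            # Same span checks as the standard post-processor.
            if e < s or e - s + 1 > max_answer_length:
                continue
            score = start_top_log_probs[i] + end_top_log_probs[j_index]
            candidates.append((s, e, score))
    return sorted(candidates, key=lambda c: c[2], reverse=True)


def min_null_score(cls_logits_per_feature):
    """Null score for an example: the smallest CLS logit over its features."""
    return min(float(x) for x in cls_logits_per_feature)
```

Taking the null score from cls_logits, rather than from the logits at position 0, reflects that this model family emits a dedicated answerability score.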
Both functions are framework-agnostic (they use only numpy). For distributed training, the is_world_process_zero parameter controls logging so that informational messages come only from the main process.
Usage
Use this module when you need to:
- Convert raw start/end logits from QA models into answer text strings
- Handle multi-feature examples from sliding window document chunking
- Support both answerable and unanswerable question formats (SQuAD v1/v2)
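To make the multi-feature case concrete, the grouping of features by their source example can be sketched as follows. The helper name group_features_by_example is hypothetical; the sketch assumes only that each feature records the ID of the example it came from, as the example_id column does.

```python
import collections

def group_features_by_example(example_ids, feature_example_ids):
    """Hypothetical helper: map each example index to the indices of its features."""
    example_id_to_index = {ex_id: i for i, ex_id in enumerate(example_ids)}
    features_per_example = collections.defaultdict(list)
    for feature_index, ex_id in enumerate(feature_example_ids):
        features_per_example[example_id_to_index[ex_id]].append(feature_index)
    return features_per_example


# A long context for example "a" was split into two overlapping chunks.
groups = group_features_by_example(["a", "b"], ["a", "a", "b"])
```

With this mapping, the logits of every chunk belonging to one question can be scanned together before a single answer is chosen.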
Code Reference
Source Location
| Property | Value |
|---|---|
| File | examples/NLU/examples/question-answering/utils_qa.py |
| Lines | 427 |
| Module | utils_qa |
Signature/CLI
def postprocess_qa_predictions(
    examples,
    features,
    predictions: Tuple[np.ndarray, np.ndarray],
    version_2_with_negative: bool = False,
    n_best_size: int = 20,
    max_answer_length: int = 30,
    null_score_diff_threshold: float = 0.0,
    output_dir: Optional[str] = None,
    prefix: Optional[str] = None,
    is_world_process_zero: bool = True,
) -> collections.OrderedDict

def postprocess_qa_predictions_with_beam_search(
    examples,
    features,
    predictions: Tuple[np.ndarray, np.ndarray],
    version_2_with_negative: bool = False,
    n_best_size: int = 20,
    max_answer_length: int = 30,
    start_n_top: int = 5,
    end_n_top: int = 5,
    output_dir: Optional[str] = None,
    prefix: Optional[str] = None,
    is_world_process_zero: bool = True,
) -> Tuple[collections.OrderedDict, Optional[collections.OrderedDict]]
Import
from utils_qa import postprocess_qa_predictions
from utils_qa import postprocess_qa_predictions_with_beam_search
I/O Contract
Inputs (postprocess_qa_predictions)
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| examples | Dataset | Yes | - | Original non-preprocessed dataset with id and context columns |
| features | Dataset | Yes | - | Preprocessed dataset with example_id and offset_mapping |
| predictions | Tuple[ndarray, ndarray] | Yes | - | (start_logits, end_logits) arrays shaped [num_features, seq_len] |
| version_2_with_negative | bool | No | False | Enable SQuAD v2 null answer handling |
| n_best_size | int | No | 20 | Number of top predictions to consider |
| max_answer_length | int | No | 30 | Maximum allowed answer span length |
| null_score_diff_threshold | float | No | 0.0 | Threshold for selecting null answer over best answer |
| output_dir | str | No | None | Directory to save prediction JSON files |
Inputs (postprocess_qa_predictions_with_beam_search)
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| predictions | Tuple (5 elements) | Yes | - | (start_top_log_probs, start_top_index, end_top_log_probs, end_top_index, cls_logits) |
| start_n_top | int | No | 5 | Number of top start positions for beam search |
| end_n_top | int | No | 5 | Number of top end positions for beam search |
Outputs
| Output | Type | Description |
|---|---|---|
| all_predictions | OrderedDict | Mapping from example ID to predicted answer text |
| predictions.json | JSON file | Saved predictions (if output_dir provided) |
| nbest_predictions.json | JSON file | N-best predictions with scores, logits, and probabilities |
| null_odds.json | JSON file | Null vs. best answer score diffs (SQuAD v2 only) |
| scores_diff_json | OrderedDict | Null score diffs (beam search variant only) |
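The probabilities stored with the n-best predictions and the score differences stored in null_odds.json can be illustrated with the sketch below. The helper names candidate_probabilities and choose_answer are hypothetical, and the decision rule (predict the empty string when the null score exceeds the best span score by more than the threshold) is stated here as an assumption about how null_score_diff_threshold is applied.

```python
import numpy as np

def candidate_probabilities(scores):
    """Numerically stable softmax over candidate scores (numpy only)."""
    scores = np.asarray(scores, dtype=float)
    # Subtracting the maximum keeps np.exp from overflowing on large logits.
    exp_scores = np.exp(scores - scores.max())
    return exp_scores / exp_scores.sum()


def choose_answer(best_text, best_score, null_score, threshold=0.0):
    """Hypothetical helper: return (answer text, score diff) for one example."""
    score_diff = null_score - best_score
    # Assumed rule: a diff above the threshold means "no answer".
    answer = "" if score_diff > threshold else best_text
    return answer, score_diff
```

Raising the threshold makes the post-processor more willing to commit to a span; lowering it favors the empty answer.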
Usage Examples
Standard post-processing
import numpy as np
from utils_qa import postprocess_qa_predictions

# eval_examples and eval_features are the raw and preprocessed evaluation
# datasets. In practice the logits come from the model; random values stand in
# here, with one row per feature (384 is the assumed sequence length).
start_logits = np.random.randn(len(eval_features), 384)
end_logits = np.random.randn(len(eval_features), 384)

all_predictions = postprocess_qa_predictions(
    examples=eval_examples,
    features=eval_features,
    predictions=(start_logits, end_logits),
    n_best_size=20,
    max_answer_length=30,
    output_dir="/tmp/qa_output",
)
# Returns OrderedDict: {"example_id_1": "answer text", ...}
Beam search post-processing for XLNet
from utils_qa import postprocess_qa_predictions_with_beam_search

# The five prediction arrays are the outputs of the XLNet-style QA model
# for the features in eval_features.
predictions, scores_diff = postprocess_qa_predictions_with_beam_search(
    examples=eval_examples,
    features=eval_features,
    predictions=(start_top_log_probs, start_top_index,
                 end_top_log_probs, end_top_index, cls_logits),
    version_2_with_negative=True,
    start_n_top=5,
    end_n_top=5,
    output_dir="/tmp/xlnet_qa_output",
)
Related Pages
- Environment:Microsoft_LoRA_NLU_Conda_Environment
- Implementation:Microsoft_LoRA_Run_QA - Standard QA fine-tuning script
- Implementation:Microsoft_LoRA_Run_QA_Beam_Search - XLNet QA with beam search