Implementation:Microsoft LoRA Utils QA

Overview

utils_qa.py provides post-processing functions that convert raw model logits into human-readable answer substrings for extractive question answering tasks.

Description

This module contains two core post-processing functions used by the QA training scripts:

postprocess_qa_predictions() - The standard post-processor for models that output start and end logits (e.g., BERT, RoBERTa):

Builds a mapping from each example to its corresponding features (a single example can produce multiple features due to sliding window chunking).
For each example, iterates over all associated features and collects the top n_best_size start and end logit positions.
Filters out invalid spans: out-of-bounds indices, non-context tokens (offset_mapping is None), negative-length spans, and spans exceeding max_answer_length.
Sorts candidates by combined start+end logit score and keeps the top n_best_size.
Extracts answer text using character offset mappings back to the original context string.
Computes softmax probabilities over candidate scores using the LogSumExp trick (numpy-only, no torch/tf dependency).
For SQuAD v2 (version_2_with_negative=True): tracks the minimum null prediction score across an example's features and predicts an empty answer when that null score exceeds the best non-null score by more than null_score_diff_threshold.
Optionally saves predictions.json, nbest_predictions.json, and null_odds.json to the output directory.
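
The candidate selection, filtering, and probability steps above can be sketched for a single feature as follows. This is a simplified illustration, not the module's code: best_spans is a hypothetical helper, the toy logits and offsets are invented, and the real function additionally merges candidates across all features of an example.

```python
import numpy as np

def best_spans(start_logits, end_logits, offset_mapping, context,
               n_best_size=20, max_answer_length=30):
    """Hypothetical helper: score candidate answer spans for ONE feature."""
    # Indices of the n_best_size largest logits, highest first.
    start_indexes = np.argsort(start_logits)[-1:-n_best_size - 1:-1].tolist()
    end_indexes = np.argsort(end_logits)[-1:-n_best_size - 1:-1].tolist()

    candidates = []
    for s in start_indexes:
        for e in end_indexes:
            # Skip out-of-bounds indices and non-context tokens.
            if s >= len(offset_mapping) or e >= len(offset_mapping):
                continue
            if offset_mapping[s] is None or offset_mapping[e] is None:
                continue
            # Skip negative-length spans and spans that are too long.
            if e < s or e - s + 1 > max_answer_length:
                continue
            start_char, end_char = offset_mapping[s][0], offset_mapping[e][1]
            candidates.append({
                "score": float(start_logits[s] + end_logits[e]),
                "text": context[start_char:end_char],
            })

    # Keep the n_best_size highest combined scores.
    candidates.sort(key=lambda c: c["score"], reverse=True)
    candidates = candidates[:n_best_size]
    if not candidates:
        return candidates

    # Numerically stable softmax: subtract the max before exponentiating.
    scores = np.array([c["score"] for c in candidates])
    exp_scores = np.exp(scores - scores.max())
    for c, p in zip(candidates, exp_scores / exp_scores.sum()):
        c["probability"] = float(p)
    return candidates

# Toy feature: token 0 and token 4 are special tokens (offset None).
context = "Paris is big"
offsets = [None, (0, 5), (6, 8), (9, 12), None]
spans = best_spans(np.array([0.1, 5.0, 0.2, 0.3, 0.0]),
                   np.array([0.1, 4.0, 0.5, 1.0, 0.0]),
                   offsets, context)
print(spans[0]["text"])
```

Subtracting the maximum score before np.exp keeps the exponentials from overflowing when logits are large, without changing the resulting probabilities.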

postprocess_qa_predictions_with_beam_search() - Specialized post-processor for XLNet-style models with beam search:

Expects 5-element predictions: (start_top_log_probs, start_top_index, end_top_log_probs, end_top_index, cls_logits).
Iterates over start_n_top * end_n_top combinations per feature.
Uses CLS logits directly for null score tracking (instead of the sum of the start and end logits at index 0, as the standard post-processor does).
Returns both predictions and scores_diff_json for v2 format.
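
The start_n_top * end_n_top iteration can be sketched as below for one feature. The sketch assumes the end arrays are flattened so that the end candidates for start candidate i occupy positions i * end_n_top through (i + 1) * end_n_top - 1; beam_candidates is a hypothetical helper, and the offset-based filtering of non-context tokens is omitted for brevity.

```python
def beam_candidates(start_top_log_probs, start_top_index,
                    end_top_log_probs, end_top_index,
                    start_n_top=5, end_n_top=5, max_answer_length=30):
    """Hypothetical helper: enumerate (start, end, score) for ONE feature."""
    candidates = []
    for i in range(start_n_top):
        for j in range(end_n_top):
            # Assumed layout: end arrays are flattened per start candidate.
            flat = i * end_n_top + j
            s = int(start_top_index[i])
            e = int(end_top_index[flat])
            # Skip negative-length spans and spans that are too long.
            if e < s or e - s + 1 > max_answer_length:
                continue
            score = float(start_top_log_probs[i] + end_top_log_probs[flat])
            candidates.append((s, e, score))
    # Highest combined log-probability first.
    return sorted(candidates, key=lambda c: c[2], reverse=True)

# Toy beam with 2 start candidates and 2 end candidates per start.
cands = beam_candidates(
    start_top_log_probs=[-0.1, -2.0], start_top_index=[3, 7],
    end_top_log_probs=[-0.2, -1.0, -0.3, -0.5], end_top_index=[4, 2, 9, 8],
    start_n_top=2, end_n_top=2,
)
print([(s, e) for s, e, _ in cands])
```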

Both functions depend only on numpy, so they work regardless of the training framework. In distributed runs, the is_world_process_zero parameter restricts verbose logging to the main process.

Usage

Use this module when you need to:

Convert raw start/end logits from QA models into answer text strings
Handle multi-feature examples from sliding window document chunking
Support both answerable and unanswerable question formats (SQuAD v1/v2)
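
Handling multi-feature examples starts with grouping feature indices by the example they came from. A minimal sketch of that grouping, using plain lists in place of the dataset columns (group_features_by_example is a hypothetical name, not part of the module):

```python
import collections

def group_features_by_example(example_ids, feature_example_ids):
    """Hypothetical helper: map each example index to its feature indices."""
    example_id_to_index = {ex_id: i for i, ex_id in enumerate(example_ids)}
    features_per_example = collections.defaultdict(list)
    for feature_index, ex_id in enumerate(feature_example_ids):
        features_per_example[example_id_to_index[ex_id]].append(feature_index)
    return features_per_example

# "q1" was long enough to be split into two overlapping windows.
mapping = group_features_by_example(["q1", "q2"], ["q1", "q1", "q2"])
print(dict(mapping))
```

Here example_ids stands in for the `id` column of the examples and feature_example_ids for the `example_id` column of the features.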

Code Reference

Source Location

Property	Value
File	`examples/NLU/examples/question-answering/utils_qa.py`
Lines	427
Module	`utils_qa`

Signature/CLI

def postprocess_qa_predictions(
    examples,
    features,
    predictions: Tuple[np.ndarray, np.ndarray],
    version_2_with_negative: bool = False,
    n_best_size: int = 20,
    max_answer_length: int = 30,
    null_score_diff_threshold: float = 0.0,
    output_dir: Optional[str] = None,
    prefix: Optional[str] = None,
    is_world_process_zero: bool = True,
) -> collections.OrderedDict

def postprocess_qa_predictions_with_beam_search(
    examples,
    features,
    predictions: Tuple[np.ndarray, np.ndarray],
    version_2_with_negative: bool = False,
    n_best_size: int = 20,
    max_answer_length: int = 30,
    start_n_top: int = 5,
    end_n_top: int = 5,
    output_dir: Optional[str] = None,
    prefix: Optional[str] = None,
    is_world_process_zero: bool = True,
) -> Tuple[collections.OrderedDict, Optional[collections.OrderedDict]]

Import

from utils_qa import postprocess_qa_predictions
from utils_qa import postprocess_qa_predictions_with_beam_search

I/O Contract

Inputs (postprocess_qa_predictions)

Parameter	Type	Required	Default	Description
`examples`	Dataset	Yes	-	Original non-preprocessed dataset with `id` and `context` columns
`features`	Dataset	Yes	-	Preprocessed dataset with `example_id` and `offset_mapping`
`predictions`	Tuple[ndarray, ndarray]	Yes	-	`(start_logits, end_logits)` arrays shaped `[num_features, seq_len]`
`version_2_with_negative`	bool	No	False	Enable SQuAD v2 null answer handling
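`n_best_size`	int	No	20	Number of top start and end logits considered per feature, and number of candidate answers kept per example
`max_answer_length`	int	No	30	Maximum allowed answer span length, in tokens
`null_score_diff_threshold`	float	No	0.0	Threshold for selecting null answer over best answer (only used when `version_2_with_negative` is True)
`output_dir`	Optional[str]	No	None	Directory to save prediction JSON files
`prefix`	Optional[str]	No	None	Prefix added to the names of the saved files
`is_world_process_zero`	bool	No	True	Whether this process is the main process (controls logging)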
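
The effect of `null_score_diff_threshold` can be illustrated with a small sketch. It assumes the null score and the best non-null score are both combined start+end logit scores; choose_answer is a hypothetical helper, not a function exported by the module.

```python
def choose_answer(null_score, best_text, best_score,
                  null_score_diff_threshold=0.0):
    """Hypothetical helper: pick the empty answer or the best span."""
    # Positive diff means the model prefers "no answer".
    score_diff = null_score - best_score
    answer = "" if score_diff > null_score_diff_threshold else best_text
    return answer, score_diff

print(choose_answer(2.0, "Paris", 9.0))       # span clearly wins
print(choose_answer(8.0, "Paris", 5.0))       # null wins at threshold 0.0
print(choose_answer(8.0, "Paris", 5.0, 4.0))  # higher threshold keeps the span
```

Raising the threshold makes the post-processor more reluctant to return an empty answer; lowering it has the opposite effect.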

Inputs (postprocess_qa_predictions_with_beam_search)

Parameter	Type	Required	Default	Description
`predictions`	Tuple (5 elements)	Yes	-	`(start_top_log_probs, start_top_index, end_top_log_probs, end_top_index, cls_logits)`
`start_n_top`	int	No	5	Number of top start positions for beam search
`end_n_top`	int	No	5	Number of top end positions for beam search

Outputs

Output	Type	Description
all_predictions	`OrderedDict`	Mapping from example ID to predicted answer text
predictions.json	JSON file	Saved predictions (if `output_dir` provided)
nbest_predictions.json	JSON file	N-best predictions with scores, logits, and probabilities
null_odds.json	JSON file	Null vs. best answer score diffs (SQuAD v2 only)
scores_diff_json	`OrderedDict`	Null score diffs (beam search variant only)
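
As a rough illustration of the file output, the sketch below writes a predictions mapping to JSON and reads it back. save_predictions is a hypothetical helper, and the `<prefix>_predictions.json` naming scheme is an assumption about how `prefix` is applied; check the module source before relying on it.

```python
import json
import os
import tempfile

def save_predictions(all_predictions, output_dir, prefix=None):
    """Hypothetical helper: write predictions to a JSON file and return its path."""
    # Assumed naming scheme: the prefix, if given, is prepended with "_".
    name = "predictions.json" if prefix is None else f"{prefix}_predictions.json"
    path = os.path.join(output_dir, name)
    with open(path, "w") as writer:
        writer.write(json.dumps(all_predictions, indent=4) + "\n")
    return path

with tempfile.TemporaryDirectory() as tmp:
    path = save_predictions({"q1": "Paris"}, tmp, prefix="eval")
    file_name = os.path.basename(path)
    with open(path) as reader:
        loaded = json.load(reader)
print(file_name, loaded)
```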

Usage Examples

Standard post-processing

import numpy as np
from utils_qa import postprocess_qa_predictions

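# eval_examples and eval_features are assumed to come from the QA script's
# data preparation step (raw dataset and its tokenized counterpart).
# Random logits stand in for real model output; the arrays need one row per feature.
num_features, seq_len = len(eval_features), 384
start_logits = np.random.randn(num_features, seq_len)
end_logits = np.random.randn(num_features, seq_len)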

all_predictions = postprocess_qa_predictions(
    examples=eval_examples,
    features=eval_features,
    predictions=(start_logits, end_logits),
    n_best_size=20,
    max_answer_length=30,
    output_dir="/tmp/qa_output",
)
# Returns OrderedDict: {"example_id_1": "answer text", ...}

Beam search post-processing for XLNet

from utils_qa import postprocess_qa_predictions_with_beam_search

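# The five prediction arrays are assumed to be the XLNet QA head outputs,
# each with one row per feature in eval_features.
predictions, scores_diff = postprocess_qa_predictions_with_beam_search(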
    examples=eval_examples,
    features=eval_features,
    predictions=(start_top_log_probs, start_top_index,
                 end_top_log_probs, end_top_index, cls_logits),
    version_2_with_negative=True,
    start_n_top=5,
    end_n_top=5,
    output_dir="/tmp/xlnet_qa_output",
)

Related Pages

Environment:Microsoft_LoRA_NLU_Conda_Environment
Implementation:Microsoft_LoRA_Run_QA - Standard QA fine-tuning script
Implementation:Microsoft_LoRA_Run_QA_Beam_Search - XLNet QA with beam search
