Principle:Deepset ai Haystack Extractive Question Answering

Overview

Extractive question answering identifies answer spans within documents rather than generating text, using transformer models trained for question-answering tasks. Given a query and one or more context documents, the model predicts the start and end token positions that delimit the answer span within the original text.

Domains

NLP
Question_Answering

Theory

The extractive reader loads a model through AutoModelForQuestionAnswering from the Hugging Face Transformers library and uses it to predict start and end token positions within a given context passage. The model receives a tokenized (query, context) pair and outputs two vectors of logits, one for the start position and one for the end position of the answer span.
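
As a rough illustration of how two logit vectors turn into a span, the sketch below picks the (start, end) pair with the highest summed logit, subject to the end not preceding the start. The function name best_span and the logit values are made up for illustration; a real model would supply the logits.

```python
def best_span(start_logits, end_logits):
    """Return (start, end, score) maximizing start_logits[i] + end_logits[j] with i <= j."""
    best = None
    for i, start_score in enumerate(start_logits):
        # Only consider end positions at or after the start position.
        for j in range(i, len(end_logits)):
            score = start_score + end_logits[j]
            if best is None or score > best[2]:
                best = (i, j, score)
    return best


# Hypothetical logits for a 5-token context (not real model output).
start_logits = [0.1, 2.0, 0.3, -1.0, 0.5]
end_logits = [0.2, 0.1, 3.0, 0.4, -0.5]
span = best_span(start_logits, end_logits)
print(span)
```

Here the answer span would cover tokens 1 through 2, which are then mapped back to character offsets in the original text.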

Sliding Window Approach

Documents often exceed the model's maximum sequence length (commonly 384 or 512 tokens). To handle this, a sliding window approach is used:

The document is split into overlapping windows (sequences) that each fit within the model's max sequence length.
The stride parameter controls the number of tokens that overlap between consecutive windows. A stride of 128 means each new window shares 128 tokens with the previous one.
This overlap makes it likely that an answer span near a window boundary still falls entirely within at least one window, so it is not cut in two and missed.
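
The windowing described above can be sketched in plain Python. This is a simplified illustration, not the tokenizer's actual logic: it assumes stride means the number of shared tokens (as stated above), and it ignores the query tokens and special tokens that also occupy part of each real sequence. The name sliding_windows is hypothetical.

```python
def sliding_windows(tokens, max_len, stride):
    """Split tokens into windows of at most max_len that overlap by stride tokens.

    Returns a list of (start_index, window_tokens) pairs.
    """
    if not 0 <= stride < max_len:
        raise ValueError("stride must satisfy 0 <= stride < max_len")
    step = max_len - stride  # how far each new window advances
    windows = []
    start = 0
    while True:
        windows.append((start, tokens[start:start + max_len]))
        if start + max_len >= len(tokens):
            break  # this window already reaches the end of the document
        start += step
    return windows


# Toy document of 10 "tokens", windows of 4 tokens sharing 2 tokens each.
windows = sliding_windows(list(range(10)), max_len=4, stride=2)
for start, window in windows:
    print(start, window)
```

With the defaults listed under Key Parameters (384 and 128), each window would advance by 256 tokens under these simplifying assumptions.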

No-Answer Detection

Extractive readers can also detect when no answer is present in the provided documents:

A no-answer score is computed as the product of (1 - score) over all top-k answer candidates. If the candidate scores are treated as independent probabilities of being correct, this product is the probability that none of the extracted answers is correct.
If the no-answer score exceeds the scores of all extracted answers, the model effectively signals that no confident answer was found.
This approach differs from SQuAD 2.0-style no-answer logits; instead it derives the no-answer probability from the answer candidates themselves.
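
A minimal sketch of this computation follows. The candidate scores are invented for illustration, and no_answer_score is a hypothetical helper name.

```python
import math


def no_answer_score(scores):
    """Product of (1 - score) over the candidate answer scores."""
    return math.prod(1.0 - s for s in scores)


# Confident candidates: the no-answer score stays below the best answer.
confident = [0.6, 0.3, 0.1]
na_confident = no_answer_score(confident)   # 0.4 * 0.7 * 0.9
confident_no_answer_wins = na_confident > max(confident)

# Weak candidates: the no-answer score exceeds every extracted answer.
weak = [0.2, 0.1, 0.05]
na_weak = no_answer_score(weak)             # 0.8 * 0.9 * 0.95
weak_no_answer_wins = na_weak > max(weak)

print(na_confident, confident_no_answer_wins)
print(na_weak, weak_no_answer_wins)
```

Because every factor is at most 1, adding more candidates can only lower the no-answer score, so the value depends on the chosen top_k.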

Answer Deduplication

When using sliding windows, the same answer span may be extracted from overlapping windows. An overlap threshold mechanism removes duplicate answers:

For each pair of candidate answers from the same document, the character-level overlap is calculated.
If the overlap fraction exceeds the configured threshold, the lower-scoring duplicate is removed.
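
The deduplication step could look roughly like the sketch below. It rests on assumptions not stated above: answers are (start_char, end_char, score) tuples with an exclusive end offset, all from the same document, and the overlap fraction is measured against the length of each of the two spans. The helper names are hypothetical.

```python
def overlap_len(a, b):
    """Number of characters shared by two (start, end, score) spans."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def deduplicate(answers, overlap_threshold):
    """Keep the highest-scoring answers, dropping spans that overlap a kept one."""
    kept = []
    # Visit candidates from highest to lowest score so the better duplicate wins.
    for cand in sorted(answers, key=lambda a: a[2], reverse=True):
        keep = True
        for prev in kept:
            shared = overlap_len(cand, prev)
            if shared == 0:
                continue
            # Assumption: compare the overlap against each span's own length.
            frac_cand = shared / (cand[1] - cand[0])
            frac_prev = shared / (prev[1] - prev[0])
            if frac_cand > overlap_threshold or frac_prev > overlap_threshold:
                keep = False
                break
        if keep:
            kept.append(cand)
    return kept


answers = [(10, 20, 0.9), (12, 22, 0.7), (40, 50, 0.5)]
unique = deduplicate(answers, overlap_threshold=0.01)
print(unique)
```

With a threshold as low as 0.01, almost any shared character between two spans causes the lower-scoring one to be dropped.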

Mathematical Formulation

The model produces logits for start and end positions across the input sequence.

Score Computation

Given:
  start_logits: vector of length L (one logit per token for start position)
  end_logits:   vector of length L (one logit per token for end position)

Combined logit matrix:
  logits[i][j] = start_logits[i] + end_logits[j]   for all i <= j

Constraint: end position must not precede start position (enforced via upper-triangular mask).

Probability calibration:
  probability(i, j) = sigmoid((start_logits[i] + end_logits[j]) * calibration_factor)

No-answer score:
  no_answer_score = product(1 - score_k) for k in top_k_answers

The use of sigmoid rather than softmax means each answer span is scored independently, making scores comparable across different documents and sequences without per-document normalization.
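
To make the formulas above concrete, here is a small pure-Python sketch that scores every valid span with the calibrated sigmoid and returns the top candidates. The function name span_probabilities and the logit values are illustrative assumptions; an actual implementation would typically use batched tensor operations rather than nested loops.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def span_probabilities(start_logits, end_logits, calibration_factor=0.1, top_k=3):
    """Score each span (i, j) with i <= j and return the top_k as (i, j, probability)."""
    candidates = []
    for i, start_score in enumerate(start_logits):
        # Upper-triangular constraint: the end must not precede the start.
        for j in range(i, len(end_logits)):
            logit = start_score + end_logits[j]
            candidates.append((i, j, sigmoid(logit * calibration_factor)))
    candidates.sort(key=lambda c: c[2], reverse=True)
    return candidates[:top_k]


# Hypothetical logits for a 2-token sequence.
top = span_probabilities([0.0, 10.0], [0.0, 10.0], calibration_factor=0.1, top_k=3)
for i, j, p in top:
    print(i, j, round(p, 4))
```

Each probability depends only on its own pair of logits, so the scores do not sum to 1 across spans. That is the property that lets candidates from different windows and documents be ranked together.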

Key Parameters

max_seq_length: Maximum number of tokens per input sequence (default: 384).
stride: Number of overlapping tokens between consecutive windows (default: 128).
top_k: Number of answer candidates to return per query (default: 20).
calibration_factor: Scaling factor applied to logits before sigmoid (default: 0.1).
score_threshold: Minimum probability score for an answer to be returned (default: None, meaning no filtering by score).
overlap_threshold: Maximum allowed overlap fraction between answer spans before deduplication (default: 0.01).
