Implementation:Deepset ai Haystack ExtractiveReader
Overview
ExtractiveReader is a Haystack pipeline component that performs extractive question answering by locating and extracting answer spans from Documents. It scores every candidate answer span independently of all other spans, which makes scores directly comparable across documents, unlike implementations that normalize scores per document.
Source Location
- File: haystack/components/readers/extractive.py, lines 25-540+
- Class: ExtractiveReader
- Decorator: @component
Import
from haystack.components.readers import ExtractiveReader
Dependencies
- transformers (Hugging Face Transformers library)
- torch (PyTorch)
- accelerate (Hugging Face Accelerate for device management)
- tokenizers (Hugging Face Tokenizers for encoding handling)
- sentencepiece (tokenizer backend for some models)
Constructor
ExtractiveReader(
    model: str = "deepset/roberta-base-squad2-distilled",
    device: ComponentDevice | None = None,
    token: Secret | None = Secret.from_env_var(["HF_API_TOKEN", "HF_TOKEN"], strict=False),
    top_k: int = 20,
    score_threshold: float | None = None,
    max_seq_length: int = 384,
    stride: int = 128,
    max_batch_size: int | None = None,
    answers_per_seq: int | None = None,
    no_answer: bool = True,
    calibration_factor: float = 0.1,
    overlap_threshold: float | None = 0.01,
    model_kwargs: dict[str, Any] | None = None,
)
Constructor Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | "deepset/roberta-base-squad2-distilled" | Hugging Face QA model identifier or local path |
| device | ComponentDevice \| None | None | Device on which the model is loaded; auto-selected if None |
| token | Secret \| None | Secret.from_env_var(...) | API token for downloading private models from Hugging Face |
| top_k | int | 20 | Number of answers to return per query |
| score_threshold | float \| None | None | Minimum probability score for returned answers |
| max_seq_length | int | 384 | Maximum number of tokens per sequence; longer documents are split |
| stride | int | 128 | Number of overlapping tokens when splitting sequences |
| max_batch_size | int \| None | None | Maximum number of samples fed through the model at once |
| answers_per_seq | int \| None | None | Number of answer candidates per sequence (relevant for split documents) |
| no_answer | bool | True | Whether to include a "no answer" entry with its confidence score |
| calibration_factor | float | 0.1 | Factor for calibrating probability scores via sigmoid |
| overlap_threshold | float \| None | 0.01 | Maximum overlap fraction for answer deduplication; None keeps all overlapping answers |
| model_kwargs | dict[str, Any] \| None | None | Additional kwargs passed to AutoModelForQuestionAnswering.from_pretrained |
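To make the interaction between max_seq_length and stride concrete, here is an illustrative sketch (not Haystack's actual code) of how those two values split a long document into overlapping windows. Note that in practice the query tokens also occupy part of each sequence, so the usable document window is somewhat shorter than max_seq_length.

```python
# Illustrative sketch: window starts advance by (max_seq_length - stride)
# tokens, so consecutive windows share `stride` tokens of overlap.

def window_starts(n_tokens: int, max_seq_length: int, stride: int) -> list[int]:
    """Return the start offsets of each overlapping window over a token sequence."""
    step = max_seq_length - stride
    starts = [0]
    while starts[-1] + max_seq_length < n_tokens:
        starts.append(starts[-1] + step)
    return starts

# A 600-token document with the defaults (384 tokens per window, 128 overlap)
# is split into windows starting at tokens 0 and 256.
print(window_starts(600, 384, 128))  # [0, 256]
```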
Run Method
@component.output_types(answers=list[ExtractedAnswer])
def run(
    self,
    query: str,
    documents: list[Document],
    top_k: int | None = None,
    score_threshold: float | None = None,
    max_seq_length: int | None = None,
    stride: int | None = None,
    max_batch_size: int | None = None,
    answers_per_seq: int | None = None,
    no_answer: bool | None = None,
    overlap_threshold: float | None = None,
) -> dict[str, list[ExtractedAnswer]]:
Run Parameters
- query (str): The question to answer.
- documents (list[Document]): List of Documents to search for answers.
- top_k (int | None): Override instance-level top_k for this call.
- score_threshold (float | None): Override instance-level score_threshold.
- max_seq_length (int | None): Override instance-level max_seq_length.
- stride (int | None): Override instance-level stride.
- max_batch_size (int | None): Override instance-level max_batch_size.
- answers_per_seq (int | None): Override instance-level answers_per_seq.
- no_answer (bool | None): Override instance-level no_answer setting.
- overlap_threshold (float | None): Override instance-level overlap_threshold.
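The overrides above all follow the same pattern: a per-call value takes precedence, and None falls back to the value set in the constructor. A minimal sketch of that resolution logic (resolve is a hypothetical helper, not part of the Haystack API):

```python
# Hypothetical helper illustrating the per-call override pattern used by run().
def resolve(call_value, instance_value):
    """Use the per-call value when given, else fall back to the instance default."""
    return call_value if call_value is not None else instance_value

instance_top_k = 20
print(resolve(None, instance_top_k))  # 20 (instance default)
print(resolve(5, instance_top_k))     # 5 (per-call override)
```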
Output
Returns a dictionary with a single key:
"answers":list[ExtractedAnswer]-- Ranked list of extracted answers with scores, document references, and character offsets.
Key Methods
warm_up()
Loads the model and tokenizer from Hugging Face. Must be called before run(). Uses AutoModelForQuestionAnswering.from_pretrained() and AutoTokenizer.from_pretrained().
to_dict() / from_dict()
Serialization and deserialization for pipeline YAML/JSON export. Handles HF model kwargs and token serialization.
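The sketch below shows the general shape of the dictionary that Haystack components serialize to. The exact keys shown are an assumption based on Haystack's common component serialization format (a "type" import path plus the constructor's "init_parameters"), not copied from extractive.py.

```python
# Assumed serialized shape for a Haystack component; illustrative only.
serialized = {
    "type": "haystack.components.readers.extractive.ExtractiveReader",
    "init_parameters": {
        "model": "deepset/roberta-base-squad2-distilled",
        "top_k": 20,
        "no_answer": True,
    },
}

# from_dict() would rebuild the component from these constructor kwargs:
kwargs = serialized["init_parameters"]
print(kwargs["model"])  # deepset/roberta-base-squad2-distilled
```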
deduplicate_by_overlap(answers, overlap_threshold)
Removes overlapping answer spans from the same document. Calculates character-level overlap between answer pairs and removes lower-scoring duplicates that exceed the threshold.
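The deduplication step can be sketched as follows. This is assumed logic matching the description above, not the library's code: overlap is measured at the character level, expressed here as a fraction of the shorter span, and the lower-scoring member of any pair above the threshold is dropped.

```python
# Illustrative overlap-deduplication sketch (assumed logic, not Haystack's code).

def overlap_fraction(a: tuple[int, int], b: tuple[int, int]) -> float:
    """Overlap between two [start, end) character spans, relative to the shorter span."""
    overlap = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    shorter = min(a[1] - a[0], b[1] - b[0])
    return overlap / shorter if shorter else 0.0

def deduplicate(spans: list[tuple[int, int, float]], threshold: float) -> list[tuple[int, int, float]]:
    """Keep higher-scoring spans; drop any span overlapping a kept one above threshold."""
    kept: list[tuple[int, int, float]] = []
    for span in sorted(spans, key=lambda s: s[2], reverse=True):
        if all(overlap_fraction(span[:2], k[:2]) <= threshold for k in kept):
            kept.append(span)
    return kept

# Two heavily overlapping candidates: only the higher-scoring one survives.
candidates = [(10, 30, 0.9), (12, 28, 0.7), (50, 60, 0.8)]
print(deduplicate(candidates, 0.01))  # [(10, 30, 0.9), (50, 60, 0.8)]
```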
Internal Processing Pipeline
- _flatten_documents: Flattens query-document pairs into a single batch axis.
- _preprocess: Tokenizes queries and documents with sliding window support. Maps tokens back to query and document IDs.
- Model inference: Runs forward pass through AutoModelForQuestionAnswering to get start and end logits.
- _postprocess: Converts logits to probabilities using sigmoid with calibration factor. Extracts top-k answer candidates per sequence. Maps token positions back to character offsets.
- _nest_answers: Reconstructs nested answer structure. Applies deduplication, top-k filtering, score thresholding, and computes no-answer scores.
- _add_answer_page_number: Calculates page number for each answer based on form-feed characters in document content.
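Two of the post-processing steps above can be sketched in a few lines. The formulas follow the descriptions rather than the source code, so treat them as assumptions: logits are mapped to independent probabilities via a calibrated sigmoid, and page numbers come from counting form-feed characters before the answer's start offset.

```python
import math

# 1. Calibrated sigmoid: scores each span independently of other spans.
def calibrated_score(logit: float, calibration_factor: float = 0.1) -> float:
    """Map a raw logit to a probability, scaled by the calibration factor."""
    return 1.0 / (1.0 + math.exp(-calibration_factor * logit))

print(round(calibrated_score(0.0), 2))  # 0.5 (a zero logit maps to 0.5)

# 2. Page number: count form-feed characters ("\f") preceding the answer start.
def page_number(content: str, answer_start: int) -> int:
    """1-based page number of the character at answer_start."""
    return content[:answer_start].count("\f") + 1

doc = "page one\fpage two\fpage three"
print(page_number(doc, doc.index("three")))  # 3
```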
Usage Example
from haystack import Document
from haystack.components.readers import ExtractiveReader
docs = [
    Document(content="Python is a popular programming language"),
    Document(content="python ist eine beliebte Programmiersprache"),
]
reader = ExtractiveReader()
reader.warm_up()
question = "What is a popular programming language?"
result = reader.run(query=question, documents=docs)
for answer in result["answers"]:
    if answer.data is not None:
        print(f"Answer: {answer.data} (score: {answer.score:.4f})")
    else:
        print(f"No answer (score: {answer.score:.4f})")
Related Pages
Principle:Deepset_ai_Haystack_Extractive_Question_Answering