Implementation:Deepset ai Haystack ExtractiveReader
Overview
ExtractiveReader is a Haystack pipeline component that performs extractive question answering by locating and extracting answer spans from Documents. It scores every candidate answer span independently of all other spans, which makes scores directly comparable across documents, unlike implementations that normalize scores per document.
Source Location
- File: haystack/components/readers/extractive.py, lines 25-540+
- Class: ExtractiveReader
- Decorator: @component
Import
from haystack.components.readers import ExtractiveReader
Dependencies
- transformers (Hugging Face Transformers library)
- torch (PyTorch)
- accelerate (Hugging Face Accelerate for device management)
- tokenizers (Hugging Face Tokenizers for encoding handling)
- sentencepiece (tokenizer backend for some models)
Constructor
ExtractiveReader(
    model: str = "deepset/roberta-base-squad2-distilled",
    device: ComponentDevice | None = None,
    token: Secret | None = Secret.from_env_var(["HF_API_TOKEN", "HF_TOKEN"], strict=False),
    top_k: int = 20,
    score_threshold: float | None = None,
    max_seq_length: int = 384,
    stride: int = 128,
    max_batch_size: int | None = None,
    answers_per_seq: int | None = None,
    no_answer: bool = True,
    calibration_factor: float = 0.1,
    overlap_threshold: float | None = 0.01,
    model_kwargs: dict[str, Any] | None = None,
)
Constructor Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | "deepset/roberta-base-squad2-distilled" | Hugging Face QA model identifier or local path |
| device | ComponentDevice \| None | None | Device on which the model is loaded; auto-selected if None |
| token | Secret \| None | Secret.from_env_var(...) | API token for downloading private models from Hugging Face |
| top_k | int | 20 | Number of answers to return per query |
| score_threshold | float \| None | None | Minimum probability score for returned answers |
| max_seq_length | int | 384 | Maximum number of tokens per sequence; longer documents are split |
| stride | int | 128 | Number of overlapping tokens when splitting sequences |
| max_batch_size | int \| None | None | Maximum number of samples fed through the model at once |
| answers_per_seq | int \| None | None | Number of answer candidates per sequence (relevant for split documents) |
| no_answer | bool | True | Whether to include a "no answer" entry with its confidence score |
| calibration_factor | float | 0.1 | Factor for calibrating probability scores via sigmoid |
| overlap_threshold | float \| None | 0.01 | Maximum overlap fraction for answer deduplication; None keeps all overlapping answers |
| model_kwargs | dict[str, Any] \| None | None | Additional kwargs passed to AutoModelForQuestionAnswering.from_pretrained |
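To make the interaction between max_seq_length and stride concrete, here is an illustrative sketch (not Haystack's actual code) of how those two values split a long document into overlapping windows. Note that in practice the query tokens also occupy part of each sequence, so the usable document window is somewhat shorter than max_seq_length.

```python
# Illustrative sketch: window starts advance by (max_seq_length - stride)
# tokens, so consecutive windows share `stride` tokens of overlap.

def window_starts(n_tokens: int, max_seq_length: int, stride: int) -> list[int]:
    """Return the start offsets of each overlapping window over a token sequence."""
    step = max_seq_length - stride
    starts = [0]
    while starts[-1] + max_seq_length < n_tokens:
        starts.append(starts[-1] + step)
    return starts

# A 600-token document with the defaults (384 tokens per window, 128 overlap)
# is split into windows starting at tokens 0 and 256.
print(window_starts(600, 384, 128))  # [0, 256]
```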
Run Method
@component.output_types(answers=list[ExtractedAnswer])
def run(
    self,
    query: str,
    documents: list[Document],
    top_k: int | None = None,
    score_threshold: float | None = None,
    max_seq_length: int | None = None,
    stride: int | None = None,
    max_batch_size: int | None = None,
    answers_per_seq: int | None = None,
    no_answer: bool | None = None,
    overlap_threshold: float | None = None,
) -> dict[str, list[ExtractedAnswer]]:
Run Parameters
- query (str): The question to answer.
- documents (list[Document]): List of Documents to search for answers.
- top_k (int | None): Override instance-level top_k for this call.
- score_threshold (float | None): Override instance-level score_threshold.
- max_seq_length (int | None): Override instance-level max_seq_length.
- stride (int | None): Override instance-level stride.
- max_batch_size (int | None): Override instance-level max_batch_size.
- answers_per_seq (int | None): Override instance-level answers_per_seq.
- no_answer (bool | None): Override instance-level no_answer setting.
- overlap_threshold (float | None): Override instance-level overlap_threshold.
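The overrides above all follow the same pattern: a per-call value takes precedence, and None falls back to the value set in the constructor. A minimal sketch of that resolution logic (resolve is a hypothetical helper, not part of the Haystack API):

```python
# Hypothetical helper illustrating the per-call override pattern used by run().
def resolve(call_value, instance_value):
    """Use the per-call value when given, else fall back to the instance default."""
    return call_value if call_value is not None else instance_value

instance_top_k = 20
print(resolve(None, instance_top_k))  # 20 (instance default)
print(resolve(5, instance_top_k))     # 5 (per-call override)
```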
Output
Returns a dictionary with a single key:
"answers":list[ExtractedAnswer]-- Ranked list of extracted answers with scores, document references, and character offsets.
Key Methods
warm_up()
Loads the model and tokenizer from Hugging Face. Must be called before run(). Uses AutoModelForQuestionAnswering.from_pretrained() and AutoTokenizer.from_pretrained().
to_dict() / from_dict()
Serialization and deserialization for pipeline YAML/JSON export. Handles HF model kwargs and token serialization.
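The sketch below shows the general shape of the dictionary that Haystack components serialize to. The exact keys shown are an assumption based on Haystack's common component serialization format (a "type" import path plus the constructor's "init_parameters"), not copied from extractive.py.

```python
# Assumed serialized shape for a Haystack component; illustrative only.
serialized = {
    "type": "haystack.components.readers.extractive.ExtractiveReader",
    "init_parameters": {
        "model": "deepset/roberta-base-squad2-distilled",
        "top_k": 20,
        "no_answer": True,
    },
}

# from_dict() would rebuild the component from these constructor kwargs:
kwargs = serialized["init_parameters"]
print(kwargs["model"])  # deepset/roberta-base-squad2-distilled
```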
deduplicate_by_overlap(answers, overlap_threshold)
Removes overlapping answer spans from the same document. Calculates character-level overlap between answer pairs and removes lower-scoring duplicates that exceed the threshold.
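The deduplication step can be sketched as follows. This is assumed logic matching the description above, not the library's code: overlap is measured at the character level, expressed here as a fraction of the shorter span, and the lower-scoring member of any pair above the threshold is dropped.

```python
# Illustrative overlap-deduplication sketch (assumed logic, not Haystack's code).

def overlap_fraction(a: tuple[int, int], b: tuple[int, int]) -> float:
    """Overlap between two [start, end) character spans, relative to the shorter span."""
    overlap = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    shorter = min(a[1] - a[0], b[1] - b[0])
    return overlap / shorter if shorter else 0.0

def deduplicate(spans: list[tuple[int, int, float]], threshold: float) -> list[tuple[int, int, float]]:
    """Keep higher-scoring spans; drop any span overlapping a kept one above threshold."""
    kept: list[tuple[int, int, float]] = []
    for span in sorted(spans, key=lambda s: s[2], reverse=True):
        if all(overlap_fraction(span[:2], k[:2]) <= threshold for k in kept):
            kept.append(span)
    return kept

# Two heavily overlapping candidates: only the higher-scoring one survives.
candidates = [(10, 30, 0.9), (12, 28, 0.7), (50, 60, 0.8)]
print(deduplicate(candidates, 0.01))  # [(10, 30, 0.9), (50, 60, 0.8)]
```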
Internal Processing Pipeline
- _flatten_documents: Flattens query-document pairs into a single batch axis.
- _preprocess: Tokenizes queries and documents with sliding window support. Maps tokens back to query and document IDs.
- Model inference: Runs forward pass through AutoModelForQuestionAnswering to get start and end logits.
- _postprocess: Converts logits to probabilities using sigmoid with calibration factor. Extracts top-k answer candidates per sequence. Maps token positions back to character offsets.
- _nest_answers: Reconstructs nested answer structure. Applies deduplication, top-k filtering, score thresholding, and computes no-answer scores.
- _add_answer_page_number: Calculates page number for each answer based on form-feed characters in document content.
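Two of the post-processing steps above can be sketched in a few lines. The formulas follow the descriptions rather than the source code, so treat them as assumptions: logits are mapped to independent probabilities via a calibrated sigmoid, and page numbers come from counting form-feed characters before the answer's start offset.

```python
import math

# 1. Calibrated sigmoid: scores each span independently of other spans.
def calibrated_score(logit: float, calibration_factor: float = 0.1) -> float:
    """Map a raw logit to a probability, scaled by the calibration factor."""
    return 1.0 / (1.0 + math.exp(-calibration_factor * logit))

print(round(calibrated_score(0.0), 2))  # 0.5 (a zero logit maps to 0.5)

# 2. Page number: count form-feed characters ("\f") preceding the answer start.
def page_number(content: str, answer_start: int) -> int:
    """1-based page number of the character at answer_start."""
    return content[:answer_start].count("\f") + 1

doc = "page one\fpage two\fpage three"
print(page_number(doc, doc.index("three")))  # 3
```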
Usage Example
from haystack import Document
from haystack.components.readers import ExtractiveReader
docs = [
    Document(content="Python is a popular programming language"),
    Document(content="python ist eine beliebte Programmiersprache"),
]
reader = ExtractiveReader()
reader.warm_up()
question = "What is a popular programming language?"
result = reader.run(query=question, documents=docs)
for answer in result["answers"]:
    if answer.data is not None:
        print(f"Answer: {answer.data} (score: {answer.score:.4f})")
    else:
        print(f"No answer (score: {answer.score:.4f})")
Related Pages
Principle:Deepset_ai_Haystack_Extractive_Question_Answering