Implementation:Deepset ai Haystack ExtractedAnswer Dataclass

Overview

ExtractedAnswer is a Python dataclass that represents a span-based answer extracted from a document. It captures the answer text, its position within the source document, surrounding context, a confidence score, and a reference to the source Document. This is a pattern document describing a data structure interface.

Source Location

File: haystack/dataclasses/answer.py, lines 28-95
Dataclass: ExtractedAnswer
Decorator: @dataclass

Import

from haystack.dataclasses import ExtractedAnswer

Fields

Field	Type	Default	Description
query	`str`	(required)	The original question that produced this answer
score	`float`	(required)	Confidence score for this answer
data	`str | None`	`None`	The extracted answer text; None indicates a "no answer" prediction
document	`Document | None`	`None`	Reference to the source Document from which the answer was extracted
context	`str | None`	`None`	Surrounding text providing context for the extracted answer
document_offset	`Span | None`	`None`	Character positions (start, end) of the answer within the source document
context_offset	`Span | None`	`None`	Character positions (start, end) of the answer within the context string
meta	`dict[str, Any]`	`{}`	Additional metadata (e.g., computed page numbers)
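
Because data may be None, consumers usually need to separate real answers from "no answer" predictions. The following is a minimal sketch of that filtering, assuming a stand-in dataclass (ToyExtractedAnswer) with a subset of the fields above and a hypothetical helper (best_real_answer); neither name is part of Haystack.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyExtractedAnswer:
    """Stand-in (not the Haystack class) with a subset of the documented fields."""
    query: str
    score: float
    data: Optional[str] = None  # None marks a "no answer" prediction


def best_real_answer(answers: list[ToyExtractedAnswer]) -> Optional[ToyExtractedAnswer]:
    """Return the highest-scoring answer whose data is not None, or None if there is none."""
    real = [a for a in answers if a.data is not None]
    return max(real, key=lambda a: a.score, default=None)


candidates = [
    ToyExtractedAnswer("What is Python?", 0.05),  # "no answer" entry
    ToyExtractedAnswer("What is Python?", 0.95, "a popular programming language"),
]
best = best_real_answer(candidates)
```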

Nested Span Dataclass

@dataclass
class ExtractedAnswer:
    # ... fields ...

    @dataclass
    class Span:
        start: int
        end: int

The Span nested dataclass represents a character range with:

start (int): Inclusive start character position.
end (int): Exclusive end character position.

Span is the type of both document_offset and context_offset.
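
Since start is inclusive and end is exclusive, a Span maps directly onto Python slice notation. The sketch below illustrates this with a stand-in Span dataclass and a hypothetical helper (slice_span); it does not import Haystack.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Stand-in for ExtractedAnswer.Span: a half-open character range."""
    start: int  # inclusive
    end: int    # exclusive


def slice_span(text: str, span: Span) -> str:
    """Return the characters covered by span using half-open slicing."""
    return text[span.start:span.end]


context = "Python is a popular programming language used worldwide"
span = Span(start=10, end=40)
answer_text = slice_span(context, span)
print(answer_text)
```

With this convention, end - start equals the length of the answer text.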

Full Dataclass Definition

@dataclass
class ExtractedAnswer:
    query: str
    score: float
    data: str | None = None
    document: Document | None = None
    context: str | None = None
    document_offset: Optional["Span"] = None
    context_offset: Optional["Span"] = None
    meta: dict[str, Any] = field(default_factory=dict)

    @dataclass
    class Span:
        start: int
        end: int

Methods

to_dict()

def to_dict(self) -> dict[str, Any]:

Serializes the ExtractedAnswer to a dictionary:

Converts the document field using Document.to_dict(flatten=False).
Converts document_offset and context_offset Span objects using dataclasses.asdict().
Wraps all fields using default_to_dict() for Haystack-compatible serialization.
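
The steps above can be sketched without Haystack as follows. The Span class is a stand-in, serialize_offset and sketch_to_dict are hypothetical helpers, and the "type" / "init_parameters" envelope is an assumption about what default_to_dict() produces rather than something this page confirms.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class Span:
    """Stand-in for ExtractedAnswer.Span."""
    start: int
    end: int


def serialize_offset(offset: Optional[Span]) -> Optional[dict[str, int]]:
    """Convert a Span to a plain dict; pass None through unchanged."""
    return asdict(offset) if offset is not None else None


def sketch_to_dict(query: str, score: float, data: Optional[str],
                   document_offset: Optional[Span]) -> dict[str, Any]:
    """Hypothetical envelope; the real layout comes from default_to_dict()."""
    return {
        "type": "ExtractedAnswer",  # assumed key, for illustration only
        "init_parameters": {        # assumed key, for illustration only
            "query": query,
            "score": score,
            "data": data,
            "document_offset": serialize_offset(document_offset),
        },
    }


serialized = sketch_to_dict(
    "What is Python?", 0.95, "a popular programming language", Span(10, 40)
)
print(serialized["init_parameters"]["document_offset"])
```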

from_dict(data)

@classmethod
def from_dict(cls, data: dict[str, Any]) -> "ExtractedAnswer":

Deserializes an ExtractedAnswer from a dictionary:

Reconstructs the document field using Document.from_dict().
Reconstructs document_offset and context_offset as ExtractedAnswer.Span objects.
Uses default_from_dict() for Haystack-compatible deserialization.
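
The offset reconstruction step can be illustrated with a small sketch. Span is a stand-in dataclass, restore_offset is a hypothetical helper, and the payload is a hand-written dictionary rather than real to_dict() output.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Span:
    """Stand-in for ExtractedAnswer.Span."""
    start: int
    end: int


def restore_offset(raw: Optional[dict[str, Any]]) -> Optional[Span]:
    """Rebuild a Span from its dict form; keep None for missing offsets."""
    return Span(**raw) if raw is not None else None


# Hand-written payload for illustration only
payload = {"document_offset": {"start": 10, "end": 40}, "context_offset": None}
document_offset = restore_offset(payload["document_offset"])
context_offset = restore_offset(payload["context_offset"])
```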

Protocol Conformance

ExtractedAnswer conforms to the Answer protocol (defined in the same module), which requires:

@runtime_checkable
@dataclass
class Answer(Protocol):
    data: Any
    query: str
    meta: dict[str, Any]

    def to_dict(self) -> dict[str, Any]: ...

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "Answer": ...
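
Because the protocol is decorated with @runtime_checkable, conformance can be tested with isinstance(). The sketch below uses a stand-in protocol (AnswerLike) and toy classes (ToyAnswer, NotAnAnswer) instead of the Haystack definitions; all three names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class AnswerLike(Protocol):
    """Stand-in mirroring the members required by the Answer protocol."""
    data: Any
    query: str
    meta: dict[str, Any]

    def to_dict(self) -> dict[str, Any]: ...

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "AnswerLike": ...


@dataclass
class ToyAnswer:
    """Provides every member the protocol asks for."""
    query: str
    data: Any = None
    meta: dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> dict[str, Any]:
        return {"query": self.query, "data": self.data, "meta": self.meta}

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "ToyAnswer":
        return cls(**data)


@dataclass
class NotAnAnswer:
    """Lacks data, meta, to_dict and from_dict."""
    query: str


conforms = isinstance(ToyAnswer(query="What is Python?"), AnswerLike)
rejects = isinstance(NotAnAnswer(query="What is Python?"), AnswerLike)
```

A runtime isinstance() check only verifies that the members exist; it does not validate their types or signatures.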

Usage Example

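from haystack.dataclasses import Document, ExtractedAnswer

# Source document, defined here so the example is self-contained
some_document = Document(content="Python is a popular programming language used worldwide")

# A regular extracted answer
answer = ExtractedAnswer(
    query="What is Python?",
    score=0.95,
    data="a popular programming language",
    document=some_document,
    context="Python is a popular programming language used worldwide",
    # The answer starts at character 10 and ends before character 40 (end is exclusive)
    document_offset=ExtractedAnswer.Span(start=10, end=40),
    context_offset=ExtractedAnswer.Span(start=10, end=40),
)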

# A "no answer" entry
no_answer = ExtractedAnswer(
    query="What is Python?",
    score=0.05,
    data=None,
    document=None,
)

# Serialization round-trip
answer_dict = answer.to_dict()
restored = ExtractedAnswer.from_dict(answer_dict)
