Implementation:Deepset ai Haystack ExtractedAnswer Dataclass
Overview
ExtractedAnswer is a Python dataclass that represents a span-based answer extracted from a document. It captures the answer text, its position within the source document, surrounding context, a confidence score, and a reference to the source Document. This is a pattern document describing a data structure interface.
Source Location
- File: haystack/dataclasses/answer.py, lines 28-95
- Dataclass: ExtractedAnswer
- Decorator: @dataclass
Import
from haystack.dataclasses import ExtractedAnswer
Fields
| Field | Type | Default | Description |
|---|---|---|---|
| query | str | (required) | The original question that produced this answer |
| score | float | (required) | Confidence score for this answer |
| data | Optional[str] | None | The extracted answer text; None indicates a "no answer" prediction |
| document | Optional[Document] | None | Reference to the source Document from which the answer was extracted |
| context | Optional[str] | None | Surrounding text providing context for the extracted answer |
| document_offset | Optional[Span] | None | Character positions (start, end) of the answer within the source document |
| context_offset | Optional[Span] | None | Character positions (start, end) of the answer within the context string |
| meta | dict[str, Any] | {} (default_factory=dict) | Additional metadata (e.g., computed page numbers) |
Nested Span Dataclass
@dataclass
class ExtractedAnswer:
    # ... fields ...

    @dataclass
    class Span:
        start: int
        end: int
The Span nested dataclass represents a character range with:
- start (int): Inclusive start character position.
- end (int): Exclusive end character position.
Used for both document_offset and context_offset.
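Because start is inclusive and end is exclusive, a Span maps directly onto a Python slice. The following is a minimal sketch using only the standard library: the Span class is a stand-in with the same two fields as ExtractedAnswer.Span, and find_span and slice_span are hypothetical helpers that are not part of Haystack.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Stand-in with the same fields as ExtractedAnswer.Span (assumption)."""
    start: int  # inclusive
    end: int    # exclusive


def find_span(text: str, answer: str) -> Span:
    """Hypothetical helper: locate the first occurrence of answer in text."""
    start = text.index(answer)  # raises ValueError if answer is absent
    return Span(start=start, end=start + len(answer))


def slice_span(text: str, span: Span) -> str:
    """Recover the answer text from a half-open [start, end) span."""
    return text[span.start:span.end]


document_text = "Intro. Python is a popular programming language used worldwide."
context = "Python is a popular programming language"
answer_text = "a popular programming language"

# The same answer has different offsets in the document and in the context,
# which is why the dataclass carries both document_offset and context_offset.
document_offset = find_span(document_text, answer_text)
context_offset = find_span(context, answer_text)

print(document_offset, slice_span(document_text, document_offset))
print(context_offset, slice_span(context, context_offset))
```

With half-open ranges, end - start equals the length of the answer text, so no off-by-one adjustment is needed when slicing.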
Full Dataclass Definition
@dataclass
class ExtractedAnswer:
    query: str
    score: float
    data: str | None = None
    document: Document | None = None
    context: str | None = None
    document_offset: Optional["Span"] = None
    context_offset: Optional["Span"] = None
    meta: dict[str, Any] = field(default_factory=dict)

    @dataclass
    class Span:
        start: int
        end: int
Methods
to_dict()
def to_dict(self) -> dict[str, Any]:
Serializes the ExtractedAnswer to a dictionary:
- Converts the document field using Document.to_dict(flatten=False).
- Converts document_offset and context_offset Span objects using dataclasses.asdict().
- Wraps all fields using default_to_dict() for Haystack-compatible serialization.
from_dict(data)
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "ExtractedAnswer":
Deserializes an ExtractedAnswer from a dictionary:
- Reconstructs the document field using Document.from_dict().
- Reconstructs document_offset and context_offset as ExtractedAnswer.Span objects.
- Uses default_from_dict() for Haystack-compatible deserialization.
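The Span handling in to_dict() and from_dict() can be illustrated with a simplified, standard-library-only sketch. MiniAnswer and its Span are hypothetical stand-ins, not the Haystack classes: the document field is left out, and the default_to_dict() / default_from_dict() wrappers are omitted, so the dictionary layout here is an assumption and may differ from what Haystack actually produces.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class Span:
    """Stand-in for ExtractedAnswer.Span."""
    start: int
    end: int


@dataclass
class MiniAnswer:
    """Hypothetical, reduced stand-in for ExtractedAnswer (no document field)."""
    query: str
    score: float
    data: Optional[str] = None
    document_offset: Optional[Span] = None
    context_offset: Optional[Span] = None
    meta: dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> dict[str, Any]:
        def span_to_dict(span: Optional[Span]) -> Optional[dict[str, int]]:
            # Nested dataclasses become plain dicts so the result is JSON-safe.
            return asdict(span) if span is not None else None

        return {
            "query": self.query,
            "score": self.score,
            "data": self.data,
            "document_offset": span_to_dict(self.document_offset),
            "context_offset": span_to_dict(self.context_offset),
            "meta": self.meta,
        }

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "MiniAnswer":
        init = dict(data)  # shallow copy so the caller's dict is not mutated
        for key in ("document_offset", "context_offset"):
            if init.get(key) is not None:
                # Rebuild the nested Span from its {"start": ..., "end": ...} dict.
                init[key] = Span(**init[key])
        return cls(**init)


original = MiniAnswer(
    query="What is Python?",
    score=0.95,
    data="a popular programming language",
    document_offset=Span(start=10, end=40),
    context_offset=Span(start=10, end=40),
)
payload = original.to_dict()
restored = MiniAnswer.from_dict(json.loads(json.dumps(payload)))
```

Converting the spans to plain dictionaries is what lets the payload pass through json.dumps; from_dict then has to reverse that step explicitly, since a generic constructor call would otherwise leave the offsets as dictionaries.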
Protocol Conformance
ExtractedAnswer conforms to the Answer protocol (defined in the same module), which requires:
@runtime_checkable
@dataclass
class Answer(Protocol):
    data: Any
    query: str
    meta: dict[str, Any]

    def to_dict(self) -> dict[str, Any]: ...

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "Answer": ...
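Since the protocol is marked @runtime_checkable, conformance can be tested structurally with isinstance(). The sketch below uses only the standard library: AnswerLike mirrors the members listed above but drops the @dataclass decorator for simplicity, and SimpleAnswer is a hypothetical conforming class. Neither name exists in Haystack.

```python
from dataclasses import dataclass, field
from typing import Any, Optional, Protocol, runtime_checkable


@runtime_checkable
class AnswerLike(Protocol):
    """Simplified stand-in for the Answer protocol (assumption)."""
    data: Any
    query: str
    meta: dict[str, Any]

    def to_dict(self) -> dict[str, Any]: ...

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "AnswerLike": ...


@dataclass
class SimpleAnswer:
    """Hypothetical class that satisfies AnswerLike without inheriting from it."""
    query: str
    data: Optional[str] = None
    meta: dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> dict[str, Any]:
        return {"query": self.query, "data": self.data, "meta": self.meta}

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "SimpleAnswer":
        return cls(**data)


candidate = SimpleAnswer(query="What is Python?", data="a programming language")

# isinstance() only checks that every protocol member is present on the
# object; it does not validate attribute types or method signatures.
conforms = isinstance(candidate, AnswerLike)
plain_string_conforms = isinstance("just a string", AnswerLike)
```

Because the runtime check verifies only that the members exist, static type checking is still the more reliable way to confirm that a class matches the protocol's types.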
Usage Example
from haystack.dataclasses import Document, ExtractedAnswer

# Source document; here its content is identical to the context string
some_document = Document(content="Python is a popular programming language used worldwide")

# A regular extracted answer
answer = ExtractedAnswer(
    query="What is Python?",
    score=0.95,
    data="a popular programming language",
    document=some_document,
    context="Python is a popular programming language used worldwide",
    # The answer text occupies characters [10, 40) of both strings
    document_offset=ExtractedAnswer.Span(start=10, end=40),
    context_offset=ExtractedAnswer.Span(start=10, end=40),
)

# A "no answer" entry
no_answer = ExtractedAnswer(
    query="What is Python?",
    score=0.05,
    data=None,
    document=None,
)

# Serialization round-trip
answer_dict = answer.to_dict()
restored = ExtractedAnswer.from_dict(answer_dict)