Principle: deepset-ai Haystack Answer Construction
Overview
Answer construction transforms raw LLM output into structured answer objects with metadata, source documents, and optional pattern-based extraction. It is a post-processing pattern that bridges the gap between free-form generated text and structured, traceable answer objects suitable for downstream consumption.
Domains
- NLP
- Post_Processing
Theory
Answer construction is a post-processing pattern that pairs generated text with provenance information (source documents, query context, metadata). It supports regex-based answer extraction and document reference parsing, transforming unstructured generator output into structured, auditable answer objects.
Structured Answer Objects
Raw LLM output is a plain text string (or a ChatMessage). Answer construction wraps this output into a structured object that includes:
- Answer data: The extracted or full answer text.
- Query: The original query that produced the answer, providing traceability.
- Source documents: The documents that were used as context for generation, establishing provenance.
- Metadata: Additional information from the generator (model name, usage statistics, finish reason) and any custom metadata.
This structured format enables downstream systems to present answers with citations, verify sources, and audit the generation process.
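The shape of such a structured answer object can be sketched with plain dataclasses. This is an illustrative stand-in, not Haystack's actual classes; the names `StructuredAnswer` and `Document` and the exact fields are assumptions chosen to mirror the list above:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Document:
    # Minimal stand-in for a retrieved source document.
    content: str
    meta: dict[str, Any] = field(default_factory=dict)

@dataclass
class StructuredAnswer:
    # Wraps raw generator output with provenance information.
    data: str                  # extracted or full answer text
    query: str                 # original query, for traceability
    documents: list[Document]  # context documents (provenance)
    meta: dict[str, Any]       # generator metadata (model, usage, finish reason)

answer = StructuredAnswer(
    data="Paris",
    query="What is the capital of France?",
    documents=[Document(content="Paris is the capital of France.")],
    meta={"model": "example-model", "finish_reason": "stop"},
)
```

A downstream renderer can then show `answer.data` alongside citations drawn from `answer.documents` without re-parsing the generator's raw text.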
Regex-Based Answer Extraction
In many scenarios, the LLM's output contains more than just the answer -- it may include reasoning, preamble, or formatting. Answer construction uses regular expression patterns to extract the relevant answer portion from the raw output:
- A pattern with no capture group uses the entire regex match as the answer. For example, the pattern `[^\n]+$` extracts the last line of a multi-line response.
- A pattern with one capture group uses the captured group as the answer. For example, `Answer: (.*)` extracts everything after "Answer: ".
- Patterns with multiple capture groups are rejected, enforcing a single, unambiguous extraction.
If no pattern is specified, the entire generator output is used as the answer.
Document Reference Parsing
When generators produce output that references specific source documents (e.g., "The capital of France is Paris [2]."), answer construction can parse these references using a configurable reference pattern:
- References are expressed as 1-based indices into the input document list.
- The reference pattern (e.g., `\[(\d+)\]`) extracts these indices from the answer text.
- Referenced documents are tagged with metadata indicating whether they were cited.
- The system can optionally return only referenced documents or all documents with reference annotations.
This enables citation-style answer presentation where each claim can be traced to its source document.
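Reference parsing can be sketched as follows. This is a simplified stand-in for the behavior described above; the function name and the dict-based annotation format are illustrative assumptions:

```python
import re

def annotate_references(answer_text: str,
                        documents: list[str],
                        reference_pattern: str = r"\[(\d+)\]") -> list[dict]:
    """Tag each input document with whether the answer text cites it."""
    # References are 1-based indices into the input document list.
    cited = {int(idx) for idx in re.findall(reference_pattern, answer_text)}
    return [
        {"content": doc, "cited": position in cited}
        for position, doc in enumerate(documents, start=1)
    ]

docs = [
    "Berlin is the capital of Germany.",
    "Paris is the capital of France.",
    "Rome is the capital of Italy.",
]
annotated = annotate_references("The capital of France is Paris [2].", docs)
```

A caller wanting only the cited sources can then filter on the `cited` flag instead of returning all documents.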
Provenance and Auditability
Answer construction supports the principle of answer provenance -- the ability to trace every answer back to its source data and generation context. Each structured answer object carries:
- The original query that triggered the answer.
- The documents that provided context for generation.
- Metadata about the generation process (which model, how many tokens, etc.).
- Annotations about which documents were actually cited in the response.
This provenance chain is essential for building trustworthy AI systems where users need to verify and validate generated answers.
Chat and Non-Chat Compatibility
Answer construction works with both:
- Non-chat generators: Output is a list of plain strings.
- Chat generators: Output is a list of `ChatMessage` objects, from which text content and metadata are extracted.
This dual compatibility ensures that answer construction can be used as a universal post-processing step regardless of the upstream generator type.
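Normalizing the two reply types can be sketched as a single helper. The `ChatMessage` dataclass here is a minimal stand-in with assumed `text` and `meta` fields, not the library's actual class:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class ChatMessage:
    # Minimal stand-in for a chat generator's message object.
    text: str
    meta: dict[str, Any] = field(default_factory=dict)

def normalize_reply(reply: "str | ChatMessage") -> tuple[str, dict]:
    """Return (text, meta) for either a plain-string or chat-style reply."""
    if isinstance(reply, str):
        return reply, {}  # non-chat generator: text only, no per-message meta
    return reply.text, dict(reply.meta)  # chat generator: unpack the message
```

With this normalization in place, the rest of the answer-construction pipeline can treat both generator types identically.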