Implementation:Deepset ai Haystack AnswerBuilder

From Leeroopedia

Overview

AnswerBuilder is a Haystack component that converts a query and Generator replies into structured GeneratedAnswer objects. It supports regex-based answer extraction, document reference parsing with citation tracking, and works with both plain string replies and ChatMessage objects from chat generators.

Source Location

  • File: haystack/components/builders/answer_builder.py (Lines 16-257)
  • Class: AnswerBuilder
  • Component decorator: @component

Import

from haystack.components.builders import AnswerBuilder

Dependencies

  • haystack.dataclasses: Provides GeneratedAnswer, Document, and ChatMessage.
  • re (standard library): Used for regex-based answer extraction and reference parsing.

Constructor

def __init__(
    self,
    pattern: str | None = None,
    reference_pattern: str | None = None,
    last_message_only: bool = False,
    *,
    return_only_referenced_documents: bool = True,
)

Parameters

  • pattern (str | None): Regular expression pattern to extract the answer text from the generator output. If not specified, the entire response is used as the answer. The pattern may contain at most one capture group:
    • No capture group: The whole regex match is used as the answer. Example: [^\n]+$ extracts the last line.
    • One capture group: The captured group text is used as the answer. Example: Answer: (.*) extracts everything after "Answer: ".
    • Multiple capture groups: Rejected with a ValueError.
  • reference_pattern (str | None): Regular expression pattern for parsing document references from the generated text. References must be 1-based indices. Example: \[(\d+)\] extracts "1" from "answer[1]". When provided, documents receive a "referenced" metadata field.
  • last_message_only (bool): If True, only the last reply is processed. If False (default), all replies are processed.
  • return_only_referenced_documents (bool): If True (default), only the documents actually referenced in the reply are attached to the answer. If False, all documents are attached, each annotated with a "referenced" flag. Has no effect when reference_pattern is not provided.
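The example patterns listed above can be sanity-checked with Python's standard re module alone. This is not Haystack code. It assumes the answer pattern is applied with re.search and the reference pattern with re.findall, which is consistent with the documented examples.

```python
import re

reply = "Some reasoning.\nMore reasoning.\nParis"

# No capture group: the whole match is the answer.
# Without re.MULTILINE, `$` anchors at the end of the string, so this grabs the last line.
last_line = re.search(r"[^\n]+$", reply).group(0)

# One capture group: only the captured text is the answer.
m = re.search(r"Answer: (.*)", "This is an argument. Answer: This is the answer.")
captured = m.group(1)

# reference_pattern: findall returns the captured digits as (1-based) strings.
refs = re.findall(r"\[(\d+)\]", "answer[1]")

# A pattern like this has two capture groups and would be rejected.
num_groups = re.compile(r"(Answer): (.*)").groups

print(last_line)   # Paris
print(captured)    # This is the answer.
print(refs)        # ['1']
print(num_groups)  # 2
```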

Run Method

@component.output_types(answers=list[GeneratedAnswer])
def run(
    self,
    query: str,
    replies: list[str] | list[ChatMessage],
    meta: list[dict[str, Any]] | None = None,
    documents: list[Document] | None = None,
    pattern: str | None = None,
    reference_pattern: str | None = None,
) -> dict:  # Returns {"answers": list[GeneratedAnswer]}

Parameters

  • query (str): The input query that was used as the generator prompt.
  • replies (list[str] | list[ChatMessage]): The generator output. Can be plain strings (from non-chat generators) or ChatMessage objects (from chat generators).
  • meta (list[dict] | None): Optional metadata from the generator, one dictionary per reply. If provided, it must have the same length as replies; otherwise a ValueError is raised.
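  • documents (list[Document] | None): Optional source documents used as generator context. When provided, they are attached to the GeneratedAnswer objects (subject to return_only_referenced_documents), annotated with a "source_index" metadata field and, when reference parsing is active, a "referenced" field.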
  • pattern (str | None): Optional runtime override for the answer extraction pattern.
  • reference_pattern (str | None): Optional runtime override for the reference parsing pattern.

Returns

  • {"answers": list[GeneratedAnswer]}: A dictionary containing structured answer objects, one per processed reply.

Behavior

  1. Initializes default empty metadata if none is provided; validates that replies and meta lengths match.
  2. Checks that a runtime pattern, if given, contains at most one capture group.
  3. Resolves pattern and reference_pattern: values passed to run() take precedence over those set in the constructor.
  4. If last_message_only is True, restricts processing to the last reply and its metadata.
  5. For each reply:
    • Extracts text content: uses .text for ChatMessage objects, str() for strings.
    • Extracts metadata: merges ChatMessage.meta (if applicable) with the provided meta dictionary, and adds all_messages containing the full replies list.
    • Document reference processing (if documents are provided):
      • If reference_pattern is set, extracts 1-based document indices from the reply text.
      • Each document receives a "source_index" metadata field (1-based position in the input list).
      • Each document receives a "referenced" boolean metadata field when reference parsing is active.
      • Out-of-range indices are logged as warnings and skipped.
      • If return_only_referenced_documents is True, only referenced documents are included.
    • Answer text extraction: Applies the regex pattern to the reply text. Without a pattern, the full text is the answer; if the pattern does not match, the answer is an empty string.
    • Constructs a GeneratedAnswer with the extracted text, query, processed documents, and merged metadata.

Static Helper Methods

_extract_answer_string

@staticmethod
def _extract_answer_string(reply: str, pattern: str | None = None) -> str

Extracts the answer from the reply using the regex pattern. Returns the full reply if no pattern is specified; otherwise returns the captured group if the pattern has one, the whole match if it has none, or an empty string if nothing matches.

_extract_reference_idxs

@staticmethod
def _extract_reference_idxs(reply: str, reference_pattern: str) -> set[int]

Extracts all document reference indices from the reply text. Converts 1-based references to 0-based indices for internal processing.
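A minimal sketch of the 1-based to 0-based conversion, assuming the reference pattern has a single capture group around the digits (so re.findall returns just the digit strings). The function name is hypothetical; returning a set means a document cited several times is counted once.

```python
import re


def extract_reference_idxs(reply: str, reference_pattern: str) -> set[int]:
    """Hypothetical sketch: collect 1-based citations as 0-based indices."""
    # With one capture group, re.findall returns the captured digit strings.
    return {int(ref) - 1 for ref in re.findall(reference_pattern, reply)}


print(extract_reference_idxs("Paris [2], see also [2] and [3].", r"\[(\d+)\]"))  # {1, 2}
```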

_check_num_groups_in_regex

@staticmethod
def _check_num_groups_in_regex(pattern: str)

Validates that a regex pattern contains at most one capture group. Raises ValueError for patterns with multiple groups.
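The group count can be read from a compiled pattern's groups attribute, which counts capturing groups only, so non-capturing (?:...) groups remain allowed. The sketch below is a hypothetical stand-in for the helper, not its actual source; the error message wording is illustrative.

```python
import re


def check_num_groups_in_regex(pattern: str) -> None:
    """Hypothetical sketch: reject patterns with more than one capture group."""
    num_groups = re.compile(pattern).groups  # (?:...) groups are not counted
    if num_groups > 1:
        raise ValueError(
            f"Pattern '{pattern}' contains {num_groups} capture groups; "
            "use at most one (non-capturing groups (?:...) are fine)."
        )
```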

Usage Examples

Basic Answer Extraction

from haystack.components.builders import AnswerBuilder

builder = AnswerBuilder(pattern="Answer: (.*)")
result = builder.run(
    query="What's the answer?",
    replies=["This is an argument. Answer: This is the answer."],
)
# result["answers"][0].data == "This is the answer."

With Documents and Reference Parsing

from haystack import Document
from haystack.components.builders import AnswerBuilder

replies = ["The capital of France is Paris [2]."]
docs = [
    Document(content="Berlin is the capital of Germany."),
    Document(content="Paris is the capital of France."),
    Document(content="Rome is the capital of Italy."),
]

builder = AnswerBuilder(reference_pattern=r"\[(\d+)\]", return_only_referenced_documents=False)
result = builder.run(query="What is the capital of France?", replies=replies, documents=docs)

answer = result["answers"][0]
print(f"Answer: {answer.data}")
# Answer: The capital of France is Paris [2].

for doc in answer.documents:
    if doc.meta["referenced"]:
        print(f"[{doc.meta['source_index']}] {doc.content}")
# [2] Paris is the capital of France.
