
Workflow:Deepset-ai Haystack RAG Pipeline

From Leeroopedia
Knowledge Sources
Domains LLMs, RAG, NLP
Last Updated 2026-02-11 20:00 GMT

Overview

End-to-end process for building a Retrieval-Augmented Generation (RAG) pipeline that retrieves relevant documents and generates natural language answers using a large language model.

Description

This workflow implements the canonical RAG pattern in Haystack. It connects a document retriever (either BM25 keyword-based or embedding-based) to a prompt builder and LLM generator to produce grounded answers. The pipeline first retrieves relevant documents from a document store, injects them into a prompt template alongside the user question, sends the prompt to an LLM for answer generation, and finally structures the output through an answer builder. This approach reduces hallucination by grounding LLM responses in retrieved evidence.

Usage

Execute this workflow when you have a document store populated with domain-specific content and need to answer natural language questions with LLM-generated responses grounded in your data. Typical triggers include building a knowledge-base Q&A system, customer support chatbot, or any application requiring factual answers derived from a corpus of documents.

Execution Steps

Step 1: Initialize Document Store

Create and configure an in-memory (or persistent) document store that will hold the indexed documents. The document store serves as the retrieval backend for the pipeline.

Key considerations:

  • Choose between InMemoryDocumentStore for prototyping or a production store (Elasticsearch, Weaviate, etc.)
  • Configure duplicate handling policy if documents may be re-indexed

Step 2: Embed and Index Documents

If using embedding-based retrieval, generate vector embeddings for all documents using a document embedder (e.g., SentenceTransformers) and write them to the document store via a DocumentWriter. For BM25 retrieval, documents can be written directly.

Key considerations:

  • Select an embedding model appropriate for your domain and language
  • Use a separate indexing pipeline with DocumentEmbedder and DocumentWriter components
  • Process large document collections in batches to bound memory usage

Step 3: Configure Retriever

Instantiate the appropriate retriever component connected to the document store. BM25 retrieval uses keyword matching; embedding retrieval uses semantic similarity via vector search.

Key considerations:

  • InMemoryBM25Retriever for keyword-based retrieval
  • InMemoryEmbeddingRetriever for semantic retrieval (requires embedded documents)
  • Set top_k to control the number of retrieved documents

Step 4: Build Prompt Template

Define a Jinja2 prompt template that combines retrieved documents with the user question. The PromptBuilder (or ChatPromptBuilder for chat models) renders documents and the question into a formatted prompt for the LLM.

Key considerations:

  • Template must iterate over retrieved documents and include their content
  • Include the question variable in the template
  • For chat models, use ChatPromptBuilder with system and user message structure

Step 5: Generate Answer with LLM

Send the rendered prompt to a generator component (e.g., OpenAIGenerator or OpenAIChatGenerator) which calls the LLM API and returns the generated text response.

Key considerations:

  • Configure the model name and generation parameters (temperature, max tokens)
  • Handle API key management via environment variables
  • Enable streaming (via a streaming callback) for real-time response delivery

Step 6: Build Structured Answer

Pass the LLM replies, associated metadata, and retrieved documents through an AnswerBuilder component to produce structured GeneratedAnswer objects containing the answer text, source documents, and query.

Key considerations:

  • AnswerBuilder combines replies, metadata, and source documents
  • Output provides traceability from answer back to source documents

Step 7: Connect and Run Pipeline

Wire all components together using Pipeline.connect(), establishing the data flow from retriever through prompt builder and generator to answer builder. Execute the pipeline with the user query.

Example (Haystack 2.x; components instantiated as in the previous steps):

pipeline = Pipeline()
pipeline.add_component("retriever", retriever)
pipeline.add_component("prompt_builder", prompt_builder)
pipeline.add_component("generator", generator)
pipeline.add_component("answer_builder", answer_builder)
pipeline.connect("retriever.documents", "prompt_builder.documents")
pipeline.connect("prompt_builder.prompt", "generator.prompt")
pipeline.connect("generator.replies", "answer_builder.replies")
pipeline.connect("retriever.documents", "answer_builder.documents")
result = pipeline.run({"retriever": {"query": query},
                       "prompt_builder": {"question": query},
                       "answer_builder": {"query": query}})

Execution Diagram

GitHub URL

Workflow Repository