Workflow: InfiniFlow RAGFlow Chat Application Setup
| Knowledge Sources | |
|---|---|
| Domains | RAG, Conversational_AI, LLMs |
| Last Updated | 2026-02-12 06:00 GMT |
Overview
End-to-end process for creating a RAG-powered chat application in RAGFlow, linking it to knowledge bases, configuring the LLM and retrieval settings, and conducting conversations with grounded citations.
Description
This workflow covers the creation and configuration of a chat application (dialog) in RAGFlow. A chat application connects one or more knowledge bases to an LLM, enabling users to ask questions and receive answers grounded in the knowledge base content with traceable citations. The process involves creating the chat application, selecting knowledge bases for retrieval, configuring the LLM model and generation parameters (temperature, top-p, max tokens), setting up the retrieval strategy (similarity threshold, Top-N, keyword weight, reranking), customizing the system prompt, and then creating conversations to interact with the application. RAGFlow supports streaming responses via Server-Sent Events (SSE) and provides inline reference citations for answer verification.
Usage
Execute this workflow after you have created and populated at least one knowledge base with processed documents. Use it when you need to create a conversational interface that answers questions based on your document corpus. This is the primary way end-users interact with RAGFlow's RAG capabilities for question answering and information retrieval.
Execution Steps
Step 1: Create Chat Application
Create a new chat application (dialog) by providing a name and optional description. This creates a Dialog record in the database associated with your tenant. The dialog serves as the configuration container for all RAG and LLM settings.
Key considerations:
- Each chat application is isolated with its own configuration
- Multiple chat applications can share the same knowledge bases
- The application can be renamed or deleted later
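Creating a chat application can be sketched as a single API request. The endpoint path, header, and field names below follow RAGFlow's published HTTP API (`POST /api/v1/chats`), but verify them against your installed version; the server address, API key, and dataset ID are placeholders.

```python
import json

# Hypothetical values -- replace with your own server address and API key.
BASE_URL = "http://localhost:9380"
API_KEY = "ragflow-xxxxxx"

def build_create_chat_request(name, dataset_ids, description=""):
    """Build the pieces of the HTTP request that creates a chat
    application (dialog). Field names are assumptions based on
    RAGFlow's HTTP API reference; check your version's docs."""
    return {
        "url": f"{BASE_URL}/api/v1/chats",
        "headers": {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        "body": {
            "name": name,
            "description": description,
            "dataset_ids": dataset_ids,  # knowledge bases to link (Step 2)
        },
    }

req = build_create_chat_request("support-bot", ["kb_id_1"], "Docs Q&A")
print(json.dumps(req["body"], indent=2))
```

Send `req["body"]` with any HTTP client (e.g. `requests.post(req["url"], headers=req["headers"], json=req["body"])`); the response carries the new dialog's ID, which later steps reference.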
Step 2: Select Knowledge Bases
Link one or more knowledge bases to the chat application. These knowledge bases will be searched during retrieval when users ask questions. Multiple knowledge bases can be combined to create a broader knowledge scope. The retrieval system will search across all linked knowledge bases and merge results.
Key considerations:
- At least one knowledge base with processed documents should be linked
- Linked knowledge bases must use the same embedding model so that similarity scores remain comparable across them
- The order of knowledge bases does not affect retrieval priority
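The merge across linked knowledge bases happens server-side in RAGFlow, but the idea can be illustrated locally: hits from every knowledge base are pooled and ranked purely by score, which is why order of linking does not matter and why comparable (same-embedding-model) scores are required.

```python
def merge_results(*kb_results, top_n=5):
    """Merge retrieval hits from several knowledge bases into one ranked
    list. Each hit is (chunk_id, score); scores are assumed comparable
    because all linked knowledge bases use the same embedding model."""
    merged = [hit for hits in kb_results for hit in hits]
    merged.sort(key=lambda hit: hit[1], reverse=True)
    return merged[:top_n]

kb_a = [("a1", 0.91), ("a2", 0.42)]
kb_b = [("b1", 0.77)]
print(merge_results(kb_a, kb_b, top_n=2))  # → [('a1', 0.91), ('b1', 0.77)]
```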
Step 3: Configure LLM Settings
Select the LLM model for generating responses and configure its parameters. Settings include the model provider and model name, temperature (controls randomness), top-p (nucleus sampling), presence and frequency penalties, and maximum token limit for generated responses. The system prompt can be customized to control the assistant's behavior and persona.
Key considerations:
- The LLM must be configured with a valid API key in user settings or system configuration
- Temperature closer to 0 produces more deterministic answers, higher values increase creativity
- The system prompt template uses a variable for retrieved context that gets populated during retrieval
- Message history window size controls how much conversation context is included in each request
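The LLM settings and prompt templating above can be sketched as follows. The model name and parameter values are illustrative, and the `{knowledge}` placeholder is a generic stand-in for RAGFlow's retrieved-context variable; the exact placeholder name may differ in your version's default prompt.

```python
# Hypothetical LLM settings mirroring the parameters described above.
llm_config = {
    "model": "gpt-4o-mini",    # example model name, not a RAGFlow default
    "temperature": 0.1,        # near 0 -> more deterministic answers
    "top_p": 0.3,              # nucleus sampling cutoff
    "presence_penalty": 0.4,
    "frequency_penalty": 0.7,
    "max_tokens": 512,         # cap on generated response length
}

# The system prompt holds a placeholder that retrieval fills at query time.
SYSTEM_PROMPT = (
    "You are an assistant. Answer ONLY from the knowledge below and "
    "cite the chunks you use.\n\nKnowledge:\n{knowledge}"
)

def render_prompt(chunks):
    """Populate the retrieved-context placeholder with chunk text."""
    return SYSTEM_PROMPT.format(knowledge="\n".join(chunks))

print(render_prompt(["Chunk 1: refunds take 5 days.", "Chunk 2: ..."]))
```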
Step 4: Configure Retrieval Settings
Set up the retrieval strategy that determines how relevant chunks are found and ranked. Key parameters include similarity threshold (minimum score for inclusion), Top-N (number of chunks to retrieve), keyword weight vs. semantic weight balance, and optional reranking model. Enable knowledge graph usage if the linked knowledge base uses graph-based chunking.
Key considerations:
- Similarity threshold filters out low-relevance chunks (higher threshold = more precision, less recall)
- Top-N controls how many chunks are included in the LLM context
- Keyword weight allows blending BM25 keyword search with semantic similarity
- A reranking model can significantly improve result ordering quality
- Empty response configuration determines behavior when no relevant chunks are found
Step 5: Create Conversation and Chat
Start a new conversation within the chat application and send messages. Each conversation maintains its own message history. User messages are processed through the retrieval pipeline (query the knowledge bases, rank results, build context), then the LLM generates a response using the retrieved context and conversation history. Responses are streamed via SSE for real-time display.
What happens:
- User message is sent to the backend via the conversation API
- The retrieval system searches linked knowledge bases using hybrid search (semantic + keyword)
- Retrieved chunks are ranked and optionally reranked
- The system prompt is populated with retrieved context
- The LLM generates a response with inline reference citations
- Response is streamed back to the client via Server-Sent Events
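Client-side, consuming the stream means parsing `data:` lines from a `text/event-stream` response. The SSE framing below is standard; the JSON payload shape (a cumulative `answer` field and an end-of-stream sentinel) is illustrative rather than RAGFlow's exact schema.

```python
import json

# Simulated SSE stream: each event is a "data: <json>" line followed by a
# blank line. Payload shape is an assumption, not RAGFlow's exact schema.
raw_stream = (
    'data: {"answer": "Refunds take "}\n\n'
    'data: {"answer": "Refunds take 5 days."}\n\n'
    'data: true\n\n'  # sentinel marking the end of the stream
)

def read_sse(stream_text):
    """Yield decoded JSON payloads from an SSE text stream."""
    for line in stream_text.splitlines():
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):].strip())

answer = ""
for event in read_sse(raw_stream):
    if isinstance(event, dict):
        answer = event["answer"]  # each event carries the answer so far
print(answer)
```

With a real client you would iterate over the HTTP response's lines instead of a string, updating the UI as each event arrives.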
Step 6: Review Citations and Iterate
Each response includes inline citations referencing specific chunks from the knowledge base. Users can click on citations to view the original source content and verify answer accuracy. If answers are unsatisfactory, users can provide feedback (thumbs up/down), adjust retrieval or LLM settings, or refine the knowledge base content. The conversation can continue with follow-up questions that benefit from the accumulated context.
Key considerations:
- Citations link back to specific document chunks with page references
- The document viewer allows previewing the original source document
- Feedback is stored and can be used to improve system configuration
- Conversation history is maintained across messages within the same conversation
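Resolving inline citations back to their source chunks can be sketched as below. The `[n]` marker syntax, chunk fields, and file names here are hypothetical (RAGFlow's actual marker format varies by version); the point is that each marker indexes into the list of retrieved chunks, which carry document and page references.

```python
import re

# Hypothetical retrieved chunks with source metadata.
chunks = [
    {"id": "c1", "doc": "refund_policy.pdf", "page": 3},
    {"id": "c2", "doc": "faq.md", "page": 1},
]

def resolve_citations(answer, chunks):
    """Map inline [n] markers in the answer back to their source chunks
    so the UI can show document name and page for each citation."""
    refs = {int(m) for m in re.findall(r"\[(\d+)\]", answer)}
    return [chunks[i] for i in sorted(refs) if i < len(chunks)]

answer = "Refunds take 5 days [0] unless the FAQ says otherwise [1]."
for ref in resolve_citations(answer, chunks):
    print(f'{ref["doc"]} (page {ref["page"]})')
```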