Workflow: InfiniFlow RAGFlow Chat Application Setup
| Knowledge Sources | |
|---|---|
| Domains | RAG, Conversational_AI, LLMs |
| Last Updated | 2026-02-12 06:00 GMT |
Overview
End-to-end process for creating a RAG-powered chat application in RAGFlow, linking it to knowledge bases, configuring the LLM and retrieval settings, and conducting conversations with grounded citations.
Description
This workflow covers the creation and configuration of a chat application (dialog) in RAGFlow. A chat application connects one or more knowledge bases to an LLM, enabling users to ask questions and receive answers grounded in the knowledge base content with traceable citations. The process involves creating the chat application, selecting knowledge bases for retrieval, configuring the LLM model and generation parameters (temperature, top-p, max tokens), setting up the retrieval strategy (similarity threshold, Top-N, keyword weight, reranking), customizing the system prompt, and then creating conversations to interact with the application. RAGFlow supports streaming responses via Server-Sent Events (SSE) and provides inline reference citations for answer verification.
Usage
Execute this workflow after you have created and populated at least one knowledge base with processed documents. Use it when you need to create a conversational interface that answers questions based on your document corpus. This is the primary way end-users interact with RAGFlow's RAG capabilities for question answering and information retrieval.
Execution Steps
Step 1: Create Chat Application
Create a new chat application (dialog) by providing a name and optional description. This creates a Dialog record in the database associated with your tenant. The dialog serves as the configuration container for all RAG and LLM settings.
Key considerations:
- Each chat application is isolated with its own configuration
- Multiple chat applications can share the same knowledge bases
- The application can be renamed or deleted later
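Creating a chat application can be sketched as a single API request. The endpoint path, header, and field names below follow RAGFlow's published HTTP API (`POST /api/v1/chats`), but verify them against your installed version; the server address, API key, and dataset ID are placeholders.

```python
import json

# Hypothetical values -- replace with your own server address and API key.
BASE_URL = "http://localhost:9380"
API_KEY = "ragflow-xxxxxx"

def build_create_chat_request(name, dataset_ids, description=""):
    """Build the pieces of the HTTP request that creates a chat
    application (dialog). Field names are assumptions based on
    RAGFlow's HTTP API reference; check your version's docs."""
    return {
        "url": f"{BASE_URL}/api/v1/chats",
        "headers": {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        "body": {
            "name": name,
            "description": description,
            "dataset_ids": dataset_ids,  # knowledge bases to link (Step 2)
        },
    }

req = build_create_chat_request("support-bot", ["kb_id_1"], "Docs Q&A")
print(json.dumps(req["body"], indent=2))
```

Send `req["body"]` with any HTTP client (e.g. `requests.post(req["url"], headers=req["headers"], json=req["body"])`); the response carries the new dialog's ID, which later steps reference.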
Step 2: Select Knowledge Bases
Link one or more knowledge bases to the chat application. These knowledge bases will be searched during retrieval when users ask questions. Multiple knowledge bases can be combined to create a broader knowledge scope. The retrieval system will search across all linked knowledge bases and merge results.
Key considerations:
- At least one knowledge base with processed documents should be linked
- Linked knowledge bases must use the same embedding model so that similarity scores remain comparable across them
- The order of knowledge bases does not affect retrieval priority
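The merge across linked knowledge bases happens server-side in RAGFlow, but the idea can be illustrated locally: hits from every knowledge base are pooled and ranked purely by score, which is why order of linking does not matter and why comparable (same-embedding-model) scores are required.

```python
def merge_results(*kb_results, top_n=5):
    """Merge retrieval hits from several knowledge bases into one ranked
    list. Each hit is (chunk_id, score); scores are assumed comparable
    because all linked knowledge bases use the same embedding model."""
    merged = [hit for hits in kb_results for hit in hits]
    merged.sort(key=lambda hit: hit[1], reverse=True)
    return merged[:top_n]

kb_a = [("a1", 0.91), ("a2", 0.42)]
kb_b = [("b1", 0.77)]
print(merge_results(kb_a, kb_b, top_n=2))  # → [('a1', 0.91), ('b1', 0.77)]
```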
Step 3: Configure LLM Settings
Select the LLM model for generating responses and configure its parameters. Settings include the model provider and model name, temperature (controls randomness), top-p (nucleus sampling), presence and frequency penalties, and maximum token limit for generated responses. The system prompt can be customized to control the assistant's behavior and persona.
Key considerations:
- The LLM must be configured with a valid API key in user settings or system configuration
- Temperature closer to 0 produces more deterministic answers, higher values increase creativity
- The system prompt template uses a variable for retrieved context that gets populated during retrieval
- Message history window size controls how much conversation context is included in each request
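The LLM settings and prompt templating above can be sketched as follows. The model name and parameter values are illustrative, and the `{knowledge}` placeholder is a generic stand-in for RAGFlow's retrieved-context variable; the exact placeholder name may differ in your version's default prompt.

```python
# Hypothetical LLM settings mirroring the parameters described above.
llm_config = {
    "model": "gpt-4o-mini",    # example model name, not a RAGFlow default
    "temperature": 0.1,        # near 0 -> more deterministic answers
    "top_p": 0.3,              # nucleus sampling cutoff
    "presence_penalty": 0.4,
    "frequency_penalty": 0.7,
    "max_tokens": 512,         # cap on generated response length
}

# The system prompt holds a placeholder that retrieval fills at query time.
SYSTEM_PROMPT = (
    "You are an assistant. Answer ONLY from the knowledge below and "
    "cite the chunks you use.\n\nKnowledge:\n{knowledge}"
)

def render_prompt(chunks):
    """Populate the retrieved-context placeholder with chunk text."""
    return SYSTEM_PROMPT.format(knowledge="\n".join(chunks))

print(render_prompt(["Chunk 1: refunds take 5 days.", "Chunk 2: ..."]))
```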
Step 4: Configure Retrieval Settings
Set up the retrieval strategy that determines how relevant chunks are found and ranked. Key parameters include similarity threshold (minimum score for inclusion), Top-N (number of chunks to retrieve), keyword weight vs. semantic weight balance, and optional reranking model. Enable knowledge graph usage if the linked knowledge base uses graph-based chunking.
Key considerations:
- Similarity threshold filters out low-relevance chunks (higher threshold = more precision, less recall)
- Top-N controls how many chunks are included in the LLM context
- Keyword weight allows blending BM25 keyword search with semantic similarity
- A reranking model can significantly improve result ordering quality
- Empty response configuration determines behavior when no relevant chunks are found
Step 5: Create Conversation and Chat
Start a new conversation within the chat application and send messages. Each conversation maintains its own message history. User messages are processed through the retrieval pipeline (query the knowledge bases, rank results, build context), then the LLM generates a response using the retrieved context and conversation history. Responses are streamed via SSE for real-time display.
What happens:
- User message is sent to the backend via the conversation API
- The retrieval system searches linked knowledge bases using hybrid search (semantic + keyword)
- Retrieved chunks are ranked and optionally reranked
- The system prompt is populated with retrieved context
- The LLM generates a response with inline reference citations
- Response is streamed back to the client via Server-Sent Events
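Client-side, consuming the stream means parsing `data:` lines from a `text/event-stream` response. The SSE framing below is standard; the JSON payload shape (a cumulative `answer` field and an end-of-stream sentinel) is illustrative rather than RAGFlow's exact schema.

```python
import json

# Simulated SSE stream: each event is a "data: <json>" line followed by a
# blank line. Payload shape is an assumption, not RAGFlow's exact schema.
raw_stream = (
    'data: {"answer": "Refunds take "}\n\n'
    'data: {"answer": "Refunds take 5 days."}\n\n'
    'data: true\n\n'  # sentinel marking the end of the stream
)

def read_sse(stream_text):
    """Yield decoded JSON payloads from an SSE text stream."""
    for line in stream_text.splitlines():
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):].strip())

answer = ""
for event in read_sse(raw_stream):
    if isinstance(event, dict):
        answer = event["answer"]  # each event carries the answer so far
print(answer)
```

With a real client you would iterate over the HTTP response's lines instead of a string, updating the UI as each event arrives.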
Step 6: Review Citations and Iterate
Each response includes inline citations referencing specific chunks from the knowledge base. Users can click on citations to view the original source content and verify answer accuracy. If answers are unsatisfactory, users can provide feedback (thumbs up/down), adjust retrieval or LLM settings, or refine the knowledge base content. The conversation can continue with follow-up questions that benefit from the accumulated context.
Key considerations:
- Citations link back to specific document chunks with page references
- The document viewer allows previewing the original source document
- Feedback is stored and can be used to improve system configuration
- Conversation history is maintained across messages within the same conversation
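Resolving inline citations back to their source chunks can be sketched as below. The `[n]` marker syntax, chunk fields, and file names here are hypothetical (RAGFlow's actual marker format varies by version); the point is that each marker indexes into the list of retrieved chunks, which carry document and page references.

```python
import re

# Hypothetical retrieved chunks with source metadata.
chunks = [
    {"id": "c1", "doc": "refund_policy.pdf", "page": 3},
    {"id": "c2", "doc": "faq.md", "page": 1},
]

def resolve_citations(answer, chunks):
    """Map inline [n] markers in the answer back to their source chunks
    so the UI can show document name and page for each citation."""
    refs = {int(m) for m in re.findall(r"\[(\d+)\]", answer)}
    return [chunks[i] for i in sorted(refs) if i < len(chunks)]

answer = "Refunds take 5 days [0] unless the FAQ says otherwise [1]."
for ref in resolve_citations(answer, chunks):
    print(f'{ref["doc"]} (page {ref["page"]})')
```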