Principle:Infiniflow Ragflow Knowledge Base Creation
| Knowledge Sources | |
|---|---|
| Domains | RAG, Knowledge_Management, Data_Engineering |
| Last Updated | 2026-02-12 06:00 GMT |
Overview
Knowledge base creation is a data organization pattern that establishes isolated containers for document collections, each with its own parsing and retrieval configuration.
Description
Knowledge Base Creation is the foundational step in a Retrieval-Augmented Generation (RAG) system: a named container (dataset) is established to hold documents, their parsed chunks, and the associated embeddings. Each knowledge base encapsulates its own parser configuration, language settings, permissions, and search index. This encapsulation enables multi-tenant isolation and lets different document collections use different parsing strategies (e.g., academic papers vs. legal documents vs. general text).
In RAGFlow, a knowledge base maps to a Knowledgebase ORM model in MySQL and a dedicated partition in the document store (Elasticsearch/Infinity). The creation process validates the name, deduplicates within the tenant scope, assigns a UUID, and persists the record.
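The creation steps described above (validate, deduplicate, assign an ID, persist) can be sketched as follows. This is a minimal illustration, not RAGFlow's actual code: the `KnowledgeBase` dataclass, `InMemoryKBStore`, the 128-byte name limit, and the "append a numeric suffix" deduplication strategy are all assumptions made for the example, with an in-memory dictionary standing in for the MySQL table.

```python
import uuid
from dataclasses import dataclass

MAX_NAME_BYTES = 128  # assumed limit, chosen only for this sketch


@dataclass
class KnowledgeBase:
    """Hypothetical stand-in for the Knowledgebase ORM record."""
    id: str
    tenant_id: str
    name: str
    parser_id: str = "naive"


class InMemoryKBStore:
    """Hypothetical store; a dict replaces the relational table."""

    def __init__(self):
        self._rows = {}

    def names_for_tenant(self, tenant_id):
        # Only names inside the same tenant count as collisions.
        return {kb.name for kb in self._rows.values() if kb.tenant_id == tenant_id}

    def save(self, kb):
        self._rows[kb.id] = kb


def validate_name(name):
    """Reject non-string, empty, or overly long names; return the trimmed name."""
    if not isinstance(name, str):
        raise ValueError("dataset name must be a string")
    name = name.strip()
    if not name:
        raise ValueError("dataset name must not be empty")
    if len(name.encode("utf-8")) > MAX_NAME_BYTES:
        raise ValueError("dataset name is too long")
    return name


def dedupe_name(name, taken):
    """Assumed strategy: append (1), (2), ... until the name is unused."""
    if name not in taken:
        return name
    n = 1
    while f"{name}({n})" in taken:
        n += 1
    return f"{name}({n})"


def create_knowledge_base(store, tenant_id, name, parser_id="naive"):
    name = validate_name(name)
    name = dedupe_name(name, store.names_for_tenant(tenant_id))
    kb = KnowledgeBase(id=uuid.uuid4().hex, tenant_id=tenant_id,
                       name=name, parser_id=parser_id)
    store.save(kb)
    return kb
```

Because the collision check is scoped to one tenant, two tenants can both own a knowledge base called "Company Policies", while a second one inside the same tenant receives a distinct name under the assumed suffix strategy.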
Usage
Use this principle when initializing a new document collection for RAG. This is always the first step before uploading documents, configuring parsing, or performing retrieval. Each knowledge base should represent a logically coherent set of documents (e.g., "Company Policies", "Technical Documentation", "Legal Contracts").
Theoretical Basis
The knowledge base abstraction follows the namespace isolation pattern common in information retrieval systems:
- Tenant isolation: Each user's knowledge bases are scoped to their tenant ID, preventing cross-tenant data leakage
- Parser binding: Each knowledge base declares a default parser type (naive, paper, book, laws, etc.) that determines how uploaded documents will be chunked
- Index partitioning: Document store indices are partitioned by tenant (ragflow_{tenant_id}), with dataset_id as a sub-partition key
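The three properties above can be illustrated with a short sketch. It assumes an Elasticsearch-style query body; the `content` and `dataset_id` field names, the `DEFAULT_PARSERS` set, and the helper functions are hypothetical and chosen only to mirror the description, not taken from RAGFlow's source.

```python
# Parser types named in the text; the real list may be longer.
DEFAULT_PARSERS = {"naive", "paper", "book", "laws"}


def index_name(tenant_id: str) -> str:
    """Tenant-level partition: one index per tenant."""
    return f"ragflow_{tenant_id}"


def resolve_parser(kb_parser: str) -> str:
    """Parser binding: a knowledge base must declare a known parser type."""
    if kb_parser not in DEFAULT_PARSERS:
        raise ValueError(f"unknown parser type: {kb_parser}")
    return kb_parser


def build_search_request(tenant_id: str, dataset_ids, query_text: str) -> dict:
    """Scope a search to one tenant's index and a subset of its datasets."""
    return {
        "index": index_name(tenant_id),
        "body": {
            "query": {
                "bool": {
                    # Full-text match on an assumed chunk text field.
                    "must": [{"match": {"content": query_text}}],
                    # Sub-partition key: restrict hits to the given datasets.
                    "filter": [{"terms": {"dataset_id": list(dataset_ids)}}],
                }
            }
        },
    }


request = build_search_request("tenant42", ["kb_a", "kb_b"], "refund policy")
```

Here the index name enforces the tenant boundary, and the `terms` filter narrows results to the chosen knowledge bases inside that tenant, so a query can never reach another tenant's chunks by construction.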