Principle:Infiniflow Ragflow Knowledge Base Creation

From Leeroopedia
Knowledge Sources
Domains RAG, Knowledge_Management, Data_Engineering
Last Updated 2026-02-12 06:00 GMT

Overview

A data organization pattern that creates isolated containers for document collections with associated parsing and retrieval configurations.

Description

Knowledge Base Creation is the foundational step in a Retrieval-Augmented Generation (RAG) system: a named container (dataset) is established to hold documents, their parsed chunks, and the associated embeddings. Each knowledge base encapsulates its own parser configuration, language settings, permissions, and search index. This enables multi-tenant isolation and lets different document collections use different parsing strategies (e.g., academic papers, legal documents, or general text).

In RAGFlow, a knowledge base maps to a Knowledgebase ORM model in MySQL and a dedicated partition in the document store (Elasticsearch/Infinity). The creation process validates the name, deduplicates within the tenant scope, assigns a UUID, and persists the record.
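
The four creation steps above (validate, deduplicate, assign an ID, persist) can be sketched in a few lines of Python. This is a minimal illustration and not RAGFlow's actual code: the `KnowledgeBase` dataclass, the `InMemoryStore` stand-in for the MySQL table, the 128-character name limit, the field defaults, and the choice to resolve duplicate names by appending a numeric suffix are all assumptions made for the example.

```python
import uuid
from dataclasses import dataclass

MAX_NAME_LENGTH = 128  # assumed limit, for illustration only


@dataclass
class KnowledgeBase:
    """Hypothetical record; field names and defaults are illustrative."""
    id: str
    tenant_id: str
    name: str
    parser_id: str = "naive"
    language: str = "English"
    permission: str = "me"


class InMemoryStore:
    """Stand-in for the relational table that holds knowledge base records."""

    def __init__(self):
        self._rows = {}

    def names_for_tenant(self, tenant_id):
        return {kb.name for kb in self._rows.values() if kb.tenant_id == tenant_id}

    def save(self, kb):
        self._rows[kb.id] = kb


def validate_name(name):
    """Step 1: reject non-string, empty, or over-long names."""
    if not isinstance(name, str):
        raise ValueError("name must be a string")
    name = name.strip()
    if not name:
        raise ValueError("name must not be empty")
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError("name is too long")
    return name


def deduplicate_name(name, existing):
    """Step 2: make the name unique within one tenant (assumed strategy: suffix)."""
    if name not in existing:
        return name
    n = 1
    while f"{name}({n})" in existing:
        n += 1
    return f"{name}({n})"


def create_knowledge_base(store, tenant_id, name, parser_id="naive"):
    name = validate_name(name)
    name = deduplicate_name(name, store.names_for_tenant(tenant_id))
    kb = KnowledgeBase(id=uuid.uuid4().hex,  # step 3: assign a UUID
                       tenant_id=tenant_id, name=name, parser_id=parser_id)
    store.save(kb)                           # step 4: persist the record
    return kb
```

Because deduplication only consults names belonging to the same tenant, two tenants can each own a knowledge base called "Company Policies" without colliding, while a second request from the same tenant would receive a distinct name under the suffixing assumption.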

Usage

Use this principle when initializing a new document collection for RAG. Creating the knowledge base is the first step and must precede uploading documents, configuring parsing, and performing retrieval. Each knowledge base should represent a logically coherent set of documents (e.g., "Company Policies", "Technical Documentation", "Legal Contracts").

Theoretical Basis

The knowledge base abstraction follows the namespace isolation pattern common in information retrieval systems:

  • Tenant isolation: Each user's knowledge bases are scoped to their tenant ID, preventing cross-tenant data leakage
  • Parser binding: Each knowledge base declares a default parser type (naive, paper, book, laws, etc.) that determines how uploaded documents will be chunked
  • Index partitioning: Document store indices are partitioned by tenant (ragflow_{tenant_id}), with dataset_id as a sub-partition key
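
The three properties listed above can be illustrated with a short Python sketch. It derives the per-tenant index name from the `ragflow_{tenant_id}` pattern, restricts a search to specific datasets with a filter on `dataset_id`, and resolves which parser applies to a document. The helper names, the `content` field, the per-document override, and the shape of the request (written in the style of an Elasticsearch bool query) are assumptions for illustration; the parser set contains only the types named in the text.

```python
# Parser types named in the text above; the real list may be longer.
KNOWN_PARSERS = {"naive", "paper", "book", "laws"}


def index_name(tenant_id):
    """Index partitioning: one index per tenant."""
    return f"ragflow_{tenant_id}"


def resolve_parser(kb_default, override=None):
    """Parser binding: use the knowledge base default unless overridden.

    The per-document override is a hypothetical convenience for this sketch.
    """
    parser = override or kb_default
    if parser not in KNOWN_PARSERS:
        raise ValueError(f"unknown parser type: {parser}")
    return parser


def build_search_request(tenant_id, dataset_ids, query_text):
    """Tenant isolation plus dataset sub-partitioning in one request.

    The tenant selects the index; dataset_id narrows the search inside it.
    """
    return {
        "index": index_name(tenant_id),
        "query": {
            "bool": {
                # "content" is a hypothetical text field name
                "must": [{"match": {"content": query_text}}],
                "filter": [{"terms": {"dataset_id": list(dataset_ids)}}],
            }
        },
    }


request = build_search_request("tenant42", ["kb1", "kb2"], "vacation policy")
```

Since the index name is derived solely from the tenant ID, a query built this way cannot reach another tenant's chunks, and the `dataset_id` filter keeps collections within one tenant separate from each other.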

Related Pages

Implemented By
