Principle:Infiniflow Ragflow Document Store Indexing
| Knowledge Sources | |
|---|---|
| Domains | RAG, Information_Retrieval, Data_Engineering |
| Last Updated | 2026-02-12 06:00 GMT |
Overview
A data persistence pattern that bulk-inserts processed chunks with embeddings into a pluggable document store for search indexing.
Description
Document Store Indexing is the final step in the processing pipeline: fully processed chunks (text, embeddings, and tokenized keywords) are bulk-inserted into the document store. RAGFlow supports multiple backends: Elasticsearch, Infinity, OpenSearch, and OceanBase. Each tenant gets its own index, named ragflow_{tenant_id}, and chunks within that index are further partitioned by dataset_id. The abstract DocStoreConnection interface keeps the pipeline independent of the chosen backend.
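The relationship between the tenant-scoped index name and a backend-agnostic store can be sketched as follows. This is a minimal illustration, not RAGFlow's actual DocStoreConnection API: the class names `DocStore` and `InMemoryStore`, the method names, and their signatures are all hypothetical. Only the ragflow_{tenant_id} naming pattern and the dataset_id partitioning come from the description above.

```python
from abc import ABC, abstractmethod


def index_name(tenant_id: str) -> str:
    """Build the per-tenant index name using the ragflow_{tenant_id} pattern."""
    return f"ragflow_{tenant_id}"


class DocStore(ABC):
    """Hypothetical stand-in for a backend-agnostic document store interface."""

    @abstractmethod
    def index_exists(self, index: str) -> bool: ...

    @abstractmethod
    def create_index(self, index: str, vector_dim: int) -> None: ...

    @abstractmethod
    def insert(self, index: str, dataset_id: str, chunks: list) -> int: ...


class InMemoryStore(DocStore):
    """Toy backend used only for illustration; a real backend would call
    Elasticsearch, Infinity, OpenSearch, or OceanBase here."""

    def __init__(self):
        self.indices = {}  # index name -> {dataset_id: [chunk, ...]}
        self.dims = {}     # index name -> vector dimension

    def index_exists(self, index):
        return index in self.indices

    def create_index(self, index, vector_dim):
        self.indices.setdefault(index, {})
        self.dims[index] = vector_dim

    def insert(self, index, dataset_id, chunks):
        # Chunks are grouped under their dataset inside the tenant's index.
        self.indices[index].setdefault(dataset_id, []).extend(chunks)
        return len(chunks)
```

Because callers depend only on the abstract `DocStore` methods, any backend that implements them could be substituted without touching the calling code.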
Usage
Indexing runs automatically at the end of document processing; no manual step is required. The backend is selected via the DOC_ENGINE environment variable.
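Environment-driven backend selection might look like the sketch below. The accepted value strings, the fallback default of "elasticsearch", and the function name `select_engine` are assumptions made for illustration; this page only states that DOC_ENGINE chooses the backend.

```python
import os

# Assumed identifiers for the four backends named in the description.
# The exact strings RAGFlow accepts are not stated on this page.
SUPPORTED_ENGINES = ("elasticsearch", "infinity", "opensearch", "oceanbase")


def select_engine(env=None, default="elasticsearch"):
    """Return the normalized backend name read from DOC_ENGINE.

    `env` defaults to os.environ; passing a plain dict makes testing easy.
    The default value is an assumption, not a documented guarantee.
    """
    env = os.environ if env is None else env
    engine = env.get("DOC_ENGINE", default).strip().lower()
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"Unsupported DOC_ENGINE: {engine!r}")
    return engine
```

Failing fast on an unrecognized value surfaces a misconfigured deployment at startup instead of at the first indexing call.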
Theoretical Basis
Efficient indexing requires:
- Bulk operations: Inserting chunks in batches (configurable via DOC_BULK_SIZE, default 4) reduces network overhead
- Index auto-creation: If the index doesn't exist, it's created with the appropriate mapping (vector dimensions, text analyzers)
- Backend abstraction: The DocStoreConnection interface allows swapping between the supported backends (for example, from Elasticsearch to Infinity) without code changes
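The first two requirements above can be sketched together: create the index on first use, then insert chunks in fixed-size batches. This is an illustrative sketch under stated assumptions rather than RAGFlow's implementation. `RecordingStore`, `bulk_index`, and their signatures are hypothetical; only the DOC_BULK_SIZE variable and its default of 4 come from the list above.

```python
import os


class RecordingStore:
    """Hypothetical in-memory store that counts calls for demonstration."""

    def __init__(self):
        self.indices = {}      # index name -> list of stored chunks
        self.create_calls = 0
        self.insert_calls = 0

    def index_exists(self, index):
        return index in self.indices

    def create_index(self, index, vector_dim):
        # A real backend would set the vector dimension and text analyzers
        # in the index mapping here.
        self.indices[index] = []
        self.create_calls += 1

    def insert(self, index, chunks):
        self.indices[index].extend(chunks)
        self.insert_calls += 1


def bulk_index(store, index, chunks, vector_dim, bulk_size=None):
    """Insert chunks in batches, creating the index first if it is missing."""
    if bulk_size is None:
        bulk_size = int(os.environ.get("DOC_BULK_SIZE", 4))
    if bulk_size < 1:
        raise ValueError("bulk_size must be at least 1")

    if not store.index_exists(index):
        store.create_index(index, vector_dim)

    inserted = 0
    for start in range(0, len(chunks), bulk_size):
        batch = chunks[start:start + bulk_size]
        store.insert(index, batch)  # one request per batch
        inserted += len(batch)
    return inserted
```

With a batch size of 4, ten chunks are sent in three requests (4, 4, and 2 chunks) rather than ten, which is where the reduction in network overhead comes from.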