Principle:Infiniflow Ragflow Document Store Indexing

Knowledge Sources	RAGFlow
Domains	RAG, Information_Retrieval, Data_Engineering
Last Updated	2026-02-12 06:00 GMT

Overview

A data persistence pattern that bulk-inserts processed chunks with embeddings into a pluggable document store for search indexing.

Description

Document Store Indexing is the final step in the processing pipeline: fully processed chunks (text, embeddings, and tokenized keywords) are bulk-inserted into the document store. RAGFlow supports multiple backends: Elasticsearch, Infinity, OpenSearch, and OceanBase. Each tenant gets its own index, named ragflow_{tenant_id}, and chunks within that index are further partitioned by dataset_id. The abstract DocStoreConnection interface keeps the indexing code backend-agnostic.

Usage

Indexing runs automatically at the end of document processing. The backend is selected via the DOC_ENGINE environment variable.
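As an illustration of environment-driven backend selection, here is a minimal sketch. It is not RAGFlow's actual startup code: the function name select_doc_engine, the SUPPORTED_ENGINES set, the lowercase spellings of the backend names, and the "elasticsearch" fallback are all assumptions made for this example.

```python
import os

# Hypothetical set of accepted values. The names mirror the backends listed
# in the Description; the exact spellings RAGFlow accepts are an assumption.
SUPPORTED_ENGINES = {"elasticsearch", "infinity", "opensearch", "oceanbase"}


def select_doc_engine(env=None, default="elasticsearch"):
    """Return the normalized backend name read from DOC_ENGINE.

    `env` is any mapping (defaults to os.environ). The fallback value is an
    assumption for this sketch, not a documented RAGFlow default.
    """
    env = os.environ if env is None else env
    name = env.get("DOC_ENGINE", default).strip().lower()
    if name not in SUPPORTED_ENGINES:
        # Fail fast so a typo does not silently pick the wrong store.
        raise ValueError(f"Unsupported DOC_ENGINE: {name!r}")
    return name
```

Validating the value once at startup, rather than at the first insert, surfaces a misconfigured backend before any documents are processed.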

Theoretical Basis

Efficient indexing requires:

Bulk operations: Inserting chunks in batches (configurable via DOC_BULK_SIZE, default 4) reduces network overhead
Index auto-creation: If the index doesn't exist, it's created with the appropriate mapping (vector dimensions, text analyzers)
Backend abstraction: The DocStoreConnection interface allows swapping between supported backends (for example, Elasticsearch and Infinity) without code changes
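The three requirements above can be sketched together in a few lines of Python. This is a simplified illustration under stated assumptions, not the real DocStoreConnection API: the DocStore base class, its method names and signatures, the InMemoryDocStore stand-in, and the index_chunks helper are all hypothetical. Only the ragflow_{tenant_id} naming, the dataset_id partitioning, and the batch size of 4 come from the text.

```python
from abc import ABC, abstractmethod


class DocStore(ABC):
    """Hypothetical stand-in for a backend-agnostic store interface."""

    @abstractmethod
    def index_exists(self, index_name): ...

    @abstractmethod
    def create_index(self, index_name, vector_size): ...

    @abstractmethod
    def insert(self, chunks, index_name, dataset_id): ...


class InMemoryDocStore(DocStore):
    """Toy backend used only to make the sketch runnable."""

    def __init__(self):
        self.indices = {}
        self.insert_calls = 0  # counts round trips to the "backend"

    def index_exists(self, index_name):
        return index_name in self.indices

    def create_index(self, index_name, vector_size):
        self.indices[index_name] = {"vector_size": vector_size, "docs": []}

    def insert(self, chunks, index_name, dataset_id):
        self.insert_calls += 1
        docs = self.indices[index_name]["docs"]
        for chunk in chunks:
            # Tag each chunk with its dataset so it can be filtered later.
            docs.append({**chunk, "dataset_id": dataset_id})


def index_chunks(store, chunks, tenant_id, dataset_id, bulk_size=4):
    """Create the tenant index if needed, then insert chunks in batches."""
    if not chunks:
        return 0
    index_name = f"ragflow_{tenant_id}"
    if not store.index_exists(index_name):
        # The mapping depends on the embedding dimension of the chunks.
        store.create_index(index_name, len(chunks[0]["embedding"]))
    for start in range(0, len(chunks), bulk_size):
        store.insert(chunks[start:start + bulk_size], index_name, dataset_id)
    return len(chunks)


store = InMemoryDocStore()
chunks = [
    {"id": str(i), "text": f"chunk {i}", "embedding": [0.0, 1.0, 2.0]}
    for i in range(10)
]
indexed = index_chunks(store, chunks, "t1", "ds1", bulk_size=4)
```

With 10 chunks and a batch size of 4, the store receives three insert calls (4, 4, and 2 chunks) instead of ten. Because index_chunks depends only on the abstract DocStore methods, a different backend can be substituted by providing another subclass.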

Related Pages

Implemented By

Implementation:Infiniflow_Ragflow_DocStoreConn_Insert
