Principle:Infiniflow Ragflow Document Store Indexing

From Leeroopedia
Knowledge Sources
Domains RAG, Information_Retrieval, Data_Engineering
Last Updated 2026-02-12 06:00 GMT

Overview

A data persistence pattern that bulk-inserts processed chunks with embeddings into a pluggable document store for search indexing.

Description

Document Store Indexing is the final step of the processing pipeline. Fully processed chunks (with text, embeddings, and tokenized keywords) are bulk-inserted into the document store. RAGFlow supports multiple backends: Elasticsearch, Infinity, OpenSearch, and OceanBase. Each tenant has its own index, named ragflow_{tenant_id}, and dataset_id serves as a sub-partition within it. The abstract DocStoreConnection interface keeps the pipeline backend-agnostic: indexing code calls the same interface whichever backend is configured.
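The following is a minimal sketch of the partitioning and interface idea. Only the ragflow_{tenant_id} naming comes from the description above. The DocStore class, its methods (index_exists, create_index, insert), and the InMemoryStore backend are hypothetical illustrations, not RAGFlow's actual DocStoreConnection signatures. The toy backend keys storage on the (index, dataset_id) pair. It also rejects chunks whose vector length does not match the dimension fixed when the partition was created.

```python
from abc import ABC, abstractmethod


def index_name(tenant_id: str) -> str:
    """Tenant-level index name, following the ragflow_{tenant_id} pattern."""
    return f"ragflow_{tenant_id}"


class DocStore(ABC):
    """Hypothetical backend-agnostic interface (not RAGFlow's real DocStoreConnection API)."""

    @abstractmethod
    def index_exists(self, index: str, dataset_id: str) -> bool: ...

    @abstractmethod
    def create_index(self, index: str, dataset_id: str, vector_size: int) -> None: ...

    @abstractmethod
    def insert(self, chunks: list[dict], index: str, dataset_id: str) -> list[str]:
        """Store chunks and return error messages (an empty list means success)."""


class InMemoryStore(DocStore):
    """Toy backend: one bucket per (tenant index, dataset_id) pair."""

    def __init__(self) -> None:
        self._buckets: dict[tuple[str, str], list[dict]] = {}
        self._dims: dict[tuple[str, str], int] = {}

    def index_exists(self, index: str, dataset_id: str) -> bool:
        return (index, dataset_id) in self._buckets

    def create_index(self, index: str, dataset_id: str, vector_size: int) -> None:
        # Fix the embedding dimension for this partition, as a vector mapping would.
        self._buckets[(index, dataset_id)] = []
        self._dims[(index, dataset_id)] = vector_size

    def insert(self, chunks: list[dict], index: str, dataset_id: str) -> list[str]:
        # Assumes create_index was already called for this (index, dataset_id) pair.
        key = (index, dataset_id)
        errors = []
        for chunk in chunks:
            if len(chunk["vector"]) != self._dims[key]:
                errors.append(f"{chunk['id']}: expected {self._dims[key]} dims")
            else:
                self._buckets[key].append(chunk)
        return errors

    def count(self, index: str, dataset_id: str) -> int:
        return len(self._buckets.get((index, dataset_id), []))
```

Every call carries both the tenant index and the dataset_id, so chunks from different tenants or datasets never share a bucket. Replacing InMemoryStore with another DocStore subclass leaves the calling code unchanged.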

Usage

Indexing runs automatically at the end of document processing. The backend is selected via the DOC_ENGINE environment variable.
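Backend selection can be sketched as a small factory keyed on DOC_ENGINE. Several details here are assumptions for illustration: the accepted values (elasticsearch, infinity, opensearch, oceanbase), the elasticsearch default, and the Connector stand-in class. Check your RAGFlow version's configuration for the exact values it accepts.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Connector:
    """Hypothetical stand-in for a backend connector; a real one wraps the backend's client."""
    backend: str


# Assumed mapping from DOC_ENGINE values to connector factories.
_BACKENDS = {
    "elasticsearch": lambda: Connector("elasticsearch"),
    "infinity": lambda: Connector("infinity"),
    "opensearch": lambda: Connector("opensearch"),
    "oceanbase": lambda: Connector("oceanbase"),
}


def make_doc_store(env: Optional[Mapping[str, str]] = None) -> Connector:
    """Pick the document store backend from DOC_ENGINE (assumed default: elasticsearch)."""
    env = os.environ if env is None else env
    engine = env.get("DOC_ENGINE", "elasticsearch").strip().lower()
    try:
        return _BACKENDS[engine]()
    except KeyError:
        raise ValueError(f"Unsupported DOC_ENGINE: {engine!r}") from None
```

Failing fast on an unknown value surfaces a misconfigured deployment at startup rather than at the first insert.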

Theoretical Basis

Efficient indexing requires:

  • Bulk operations: Inserting chunks in batches (batch size configurable via DOC_BULK_SIZE, default 4) needs fewer network round trips than writing chunks one at a time
  • Index auto-creation: If the index doesn't exist, it's created with the appropriate mapping (vector dimensions, text analyzers)
  • Backend abstraction: The DocStoreConnection interface allows swapping between the supported backends (Elasticsearch, Infinity, OpenSearch, OceanBase) without code changes
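The three requirements above can be combined in one sketch. Chunks are split into batches of DOC_BULK_SIZE (default 4). The tenant index is created on first use, with its vector dimension inferred from the first embedding. Every write goes through a small backend object. FakeBackend, index_chunks, batched, and the dataset_id field are hypothetical names. A real backend would also configure text analyzers when creating the index.

```python
import os
from typing import Iterator, Optional


def batched(items: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


class FakeBackend:
    """Hypothetical backend that records each bulk request (one call ~ one round trip)."""

    def __init__(self) -> None:
        self.indices: dict[str, int] = {}      # index name -> vector dimension
        self.requests: list[list[dict]] = []   # one entry per bulk call

    def index_exists(self, index: str) -> bool:
        return index in self.indices

    def create_index(self, index: str, vector_size: int) -> None:
        # A real backend would also set text analyzers and other mappings here.
        self.indices[index] = vector_size

    def bulk_insert(self, index: str, docs: list[dict]) -> None:
        self.requests.append(docs)


def index_chunks(store: FakeBackend, tenant_id: str, dataset_id: str,
                 chunks: list[dict], bulk_size: Optional[int] = None) -> int:
    """Insert chunks in batches, creating the tenant index on first use; return the batch count."""
    if not chunks:
        return 0
    if bulk_size is None:
        # Read DOC_BULK_SIZE, falling back to the documented default of 4.
        bulk_size = int(os.environ.get("DOC_BULK_SIZE", 4))
    index = f"ragflow_{tenant_id}"
    if not store.index_exists(index):
        # Auto-create the index, inferring the vector dimension from the first embedding.
        store.create_index(index, vector_size=len(chunks[0]["vector"]))
    batches = 0
    for batch in batched(chunks, bulk_size):
        # Tag every chunk with its dataset so searches can filter on dataset_id.
        store.bulk_insert(index, [{**c, "dataset_id": dataset_id} for c in batch])
        batches += 1
    return batches
```

With 10 chunks and a batch size of 4, index_chunks issues three bulk requests (4, 4, and 2 chunks) instead of ten single-document writes.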
