Heuristic:Infiniflow Ragflow Embedding Batch Size Constraint

Knowledge Sources	RAGFlow, RAGFlow Embedding Models
Domains	LLMs, Optimization
Last Updated	2026-02-12 06:00 GMT

Overview

Embedding batch size is capped at 16 inputs per request across all providers (a limit RAGFlow adopts from the OpenAI API). Provider-specific truncation limits range from 500 to 30,000 tokens per input.

Description

RAGFlow embeds texts in batches of 16, a single constraint adopted from the OpenAI API's batch size limit. The same batch size is applied across all embedding providers (OpenAI, DashScope, LocalAI, Cohere, HuggingFace, etc.), whether or not a provider supports larger batches. Each provider also has its own maximum input length, and inputs are truncated to it: 8,191 tokens for OpenAI, 8,000 for BAAI/bge-m3, 2,048 for DashScope, 500 for BAAI/bge-small-en-v1.5, and 30,000 for Qwen3-Embedding-0.6B. The `EMBEDDING_BATCH_SIZE` setting (default 16) controls how many chunks are sent to the embedding model per call during document ingestion.
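The batching described above can be sketched as slicing the input list into consecutive groups of at most 16 and issuing one request per group. This is a minimal illustration, not RAGFlow's code: `iter_batches`, `embed_all`, and the `embed_batch` callable (standing in for a provider API call) are hypothetical names.

```python
from typing import Callable, Iterator, List, Sequence

BATCH_SIZE = 16  # the fixed per-request batch size described on this page


def iter_batches(texts: Sequence[str], batch_size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `batch_size` texts, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])


def embed_all(texts: Sequence[str],
              embed_batch: Callable[[List[str]], List[List[float]]]) -> List[List[float]]:
    """Call the hypothetical `embed_batch` once per batch and concatenate the vectors."""
    vectors: List[List[float]] = []
    for batch in iter_batches(texts):
        vectors.extend(embed_batch(batch))  # one API request per batch
    return vectors


chunks = [f"chunk {i}" for i in range(40)]
print([len(b) for b in iter_batches(chunks)])  # [16, 16, 8]
```

Because the last batch simply holds the remainder, 40 chunks become three requests rather than three full batches.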

Usage

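Apply this heuristic when tuning document-processing throughput. A batch size of 16 is a safe default; raising it may cause API errors with providers that accept fewer inputs per request. Per-provider truncation keeps every input within the model's maximum length, so requests are not rejected for oversized inputs; the cost is that text beyond the limit is silently dropped.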

The Insight (Rule of Thumb)

Action: Keep `EMBEDDING_BATCH_SIZE=16` (default). Do not exceed this for OpenAI-compatible APIs.
Value: batch_size=16 per API call; truncation limits in tokens: 8191 (OpenAI), 8000 (BGE-M3), 2048 (DashScope), 500 (BGE-small), 30000 (Qwen3).
Trade-off: Smaller batches mean more API calls but lower memory use and less risk of timeouts. Larger batches are faster but may exceed provider limits.
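To make the call-count side of this trade-off concrete, the number of requests is the chunk count divided by the batch size, rounded up. The 1,000-chunk document below is a hypothetical example, and `api_calls` is an illustrative helper, not part of RAGFlow.

```python
import math


def api_calls(num_chunks: int, batch_size: int) -> int:
    """Number of embedding requests needed to cover `num_chunks` inputs."""
    return math.ceil(num_chunks / batch_size)


for size in (8, 16, 32):
    print(size, api_calls(1000, size))
# 8 125
# 16 63
# 32 32
```

Halving the batch size roughly doubles the number of round trips, which is why the default sits at the largest value every supported provider is assumed to accept.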

Reasoning

RAGFlow's OpenAI embedding code assumes a limit of 16 inputs per request. Because RAGFlow lets users switch between many embedding providers, adopting the most restrictive limit as the universal default keeps every provider compatible. Token truncation limits, by contrast, are provider-specific: built-in models list theirs in the `MAX_TOKENS` dictionary of `BuiltinEmbed`, while other classes hardcode the value in `encode` (8191 for OpenAI, as the excerpts below show). For local models served through LocalAI, token counting may not work correctly, so RAGFlow falls back to reporting 1024 tokens as a conservative estimate.
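The two lookups in this reasoning, a per-model truncation limit and a fixed fallback when token counting is unavailable, can be sketched as follows. The `MAX_TOKENS` values are copied from the excerpt quoted on this page; the function names, the caller-supplied `default`, and the `count_fn` parameter are hypothetical, and modeling the LocalAI behavior as "counting failed, so report 1024" is an assumption about how the fallback is triggered.

```python
from typing import Callable, Optional, Sequence

# Values copied from the BuiltinEmbed.MAX_TOKENS excerpt on this page.
MAX_TOKENS = {
    "Qwen/Qwen3-Embedding-0.6B": 30000,
    "BAAI/bge-m3": 8000,
    "BAAI/bge-small-en-v1.5": 500,
}

FALLBACK_TOKEN_COUNT = 1024  # conservative figure reported when counting is unreliable


def max_tokens_for(model_name: str, default: int) -> int:
    """Look up a model's truncation limit; `default` is a hypothetical caller-chosen value."""
    return MAX_TOKENS.get(model_name, default)


def reported_tokens(texts: Sequence[str],
                    count_fn: Optional[Callable[[str], int]] = None) -> int:
    """Sum per-text token counts, or report the fixed fallback if counting is unavailable."""
    if count_fn is None:
        return FALLBACK_TOKEN_COUNT
    try:
        return sum(count_fn(t) for t in texts)
    except Exception:
        # Any counting failure degrades to the conservative constant.
        return FALLBACK_TOKEN_COUNT


print(max_tokens_for("BAAI/bge-m3", 512))       # 8000
print(reported_tokens(["hello world", "hi"]))   # 1024
```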

Code Evidence from `rag/llm/embedding_model.py:55-74`:

class BuiltinEmbed(Base):
    MAX_TOKENS = {
        "Qwen/Qwen3-Embedding-0.6B": 30000,
        "BAAI/bge-m3": 8000,
        "BAAI/bge-small-en-v1.5": 500
    }

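    def encode(self, texts: list):
        batch_size = 16  # fixed number of texts sent per embedding call
        ...  # remainder of encode omitted from this excerpt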

Global batch size setting from `common/settings.py:123`:

EMBEDDING_BATCH_SIZE: int = 16
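If, as the `EMBEDDING_BATCH_SIZE=16` form under The Insight suggests, the setting can be supplied as an environment variable, a defensive loader might look like the sketch below. The function name and the choice to fall back to 16 on missing, non-numeric, or non-positive values are assumptions for illustration, not RAGFlow's actual parsing logic.

```python
import os
from typing import Mapping, Optional

DEFAULT_EMBEDDING_BATCH_SIZE = 16


def load_embedding_batch_size(env: Optional[Mapping[str, str]] = None) -> int:
    """Read EMBEDDING_BATCH_SIZE from the environment, falling back to 16 (hypothetical helper)."""
    env = os.environ if env is None else env
    raw = env.get("EMBEDDING_BATCH_SIZE", "")
    try:
        value = int(raw)
    except ValueError:
        # Missing or non-numeric value: keep the safe default.
        return DEFAULT_EMBEDDING_BATCH_SIZE
    # Reject zero or negative sizes, which would break the batching loop.
    return value if value > 0 else DEFAULT_EMBEDDING_BATCH_SIZE


print(load_embedding_batch_size({"EMBEDDING_BATCH_SIZE": "8"}))  # 8
print(load_embedding_batch_size({}))                             # 16
```

Falling back instead of raising keeps ingestion running on a misconfigured value; a stricter deployment might prefer to fail fast.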

OpenAI truncation from `rag/llm/embedding_model.py:101-103`:

batch_size = 16                              # per-request cap used by the OpenAI embedder
texts = [truncate(t, 8191) for t in texts]   # cap each input at 8,191 tokens
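The effect of `truncate(t, 8191)` is to keep only the first 8,191 tokens of each input. The limits on this page are model-token counts, so the real helper presumably relies on a tokenizer; the stand-in below, `truncate_words` (a hypothetical name), splits on whitespace instead so that it runs without any tokenizer library. Word counts and model-token counts differ, so treat this only as an illustration of the truncation semantics.

```python
def truncate_words(text: str, max_len: int) -> str:
    """Keep at most `max_len` whitespace-delimited tokens (stand-in for a real tokenizer)."""
    tokens = text.split()
    if len(tokens) <= max_len:
        return text  # short inputs pass through unchanged
    return " ".join(tokens[:max_len])


OPENAI_MAX_TOKENS = 8191
texts = ["short input", "word " * 9000]
texts = [truncate_words(t, OPENAI_MAX_TOKENS) for t in texts]
print([len(t.split()) for t in texts])  # [2, 8191]
```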
