Heuristic:Infiniflow Ragflow Embedding Batch Size Constraint

From Leeroopedia
Knowledge Sources
Domains: LLMs, Optimization
Last Updated: 2026-02-12 06:00 GMT

Overview

RAGFlow caps embedding batches at 16 texts for every provider, a value chosen to fit what it treats as the OpenAI API's batch limit. Separately, each provider truncates inputs to its own maximum token length, ranging from 500 to 30,000 tokens per input.

Description

RAGFlow sends texts to the embedding model in batches of 16, a size chosen to fit what it treats as the OpenAI API's batch limit. The same batch size applies to every embedding provider (OpenAI, DashScope, LocalAI, Cohere, HuggingFace, etc.), whether or not the provider supports larger batches. Each provider also truncates inputs to its own maximum token length: 8,191 tokens for OpenAI, 2,048 for DashScope, 500 for BAAI/bge-small, 8,000 for BAAI/bge-m3, and 30,000 for Qwen3-Embedding. The `EMBEDDING_BATCH_SIZE` setting (default 16) controls how many chunks are sent to the embedding model per API call during document ingestion.
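The batching and truncation described above can be sketched as follows. This is a minimal illustration, not RAGFlow's implementation: `truncate_tokens`, `iter_batches`, and the provider labels in `MAX_INPUT_TOKENS` are hypothetical names, and whitespace splitting stands in for a real tokenizer.

```python
from typing import Dict, Iterator, List

# Truncation limits quoted in the text; the keys are illustrative labels,
# not identifiers taken from the RAGFlow code base.
MAX_INPUT_TOKENS: Dict[str, int] = {
    "openai": 8191,
    "dashscope": 2048,
    "bge-small": 500,
    "qwen3-embedding": 30000,
}

EMBEDDING_BATCH_SIZE = 16  # default named in the text


def truncate_tokens(text: str, max_tokens: int) -> str:
    """Keep at most max_tokens whitespace-separated tokens (assumed tokenizer)."""
    return " ".join(text.split()[:max_tokens])


def iter_batches(
    texts: List[str],
    provider: str,
    batch_size: int = EMBEDDING_BATCH_SIZE,
) -> Iterator[List[str]]:
    """Yield truncated texts in groups of at most batch_size."""
    limit = MAX_INPUT_TOKENS[provider]
    for start in range(0, len(texts), batch_size):
        chunk = texts[start:start + batch_size]
        yield [truncate_tokens(t, limit) for t in chunk]
```

With 40 input texts, `iter_batches` yields batches of 16, 16, and 8, each of which would correspond to one embedding API call.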

Usage

Use this heuristic when tuning document processing throughput. The batch size of 16 is a safe default; increasing it may cause API errors with some providers. The per-provider token truncation shortens over-long inputs before they are sent, so the provider does not reject them for exceeding its maximum length.

The Insight (Rule of Thumb)

  • Action: Keep `EMBEDDING_BATCH_SIZE=16` (default). Do not exceed this for OpenAI-compatible APIs.
  • Value: `batch_size=16` for API calls. Truncation varies by provider: 8,191 tokens (OpenAI), 2,048 (DashScope), 500 (BGE-small), 30,000 (Qwen3).
  • Trade-off: Smaller batch sizes increase API call count but reduce memory usage and risk of timeout. Larger batches are faster but may hit provider limits.
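The trade-off in call count is simple arithmetic: the number of API calls is the chunk count divided by the batch size, rounded up. The helper below is a hypothetical illustration (`api_calls_needed` is not a RAGFlow function).

```python
import math


def api_calls_needed(num_chunks: int, batch_size: int = 16) -> int:
    """Number of embedding API calls required to cover num_chunks chunks."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(num_chunks / batch_size)
```

For a document split into 1,000 chunks, this gives 63 calls at the default batch size of 16, 125 calls at a batch size of 8, and 16 calls at a batch size of 64 (if the provider accepts batches that large).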

Reasoning

RAGFlow treats 16 as the batch size limit of the OpenAI embedding API. Because RAGFlow supports switching between many embedding providers, using the most restrictive limit as the universal default ensures compatibility. The token truncation limits are provider-specific: built-in models list theirs in the `MAX_TOKENS` dictionary, while the OpenAI path truncates to 8,191 tokens inline. For local models (via LocalAI), token counting may not work correctly, so RAGFlow falls back to reporting 1024 tokens as a conservative estimate.
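The two ideas in this reasoning, taking the most restrictive limit as the shared default and falling back to a fixed token count, can be sketched as below. All names are hypothetical, and only the values 16 and 1024 come from the text; the other provider limits are made-up placeholders.

```python
from typing import Dict, Optional

# Only the "openai" value (16) comes from the text; the other entries are
# invented placeholders to show how the minimum is chosen.
PROVIDER_BATCH_LIMITS: Dict[str, int] = {
    "openai": 16,
    "hypothetical_provider_a": 25,
    "hypothetical_provider_b": 96,
}

FALLBACK_TOKEN_COUNT = 1024  # value the text says is reported for local models


def universal_batch_size(limits: Dict[str, int]) -> int:
    """Pick the most restrictive limit so one setting works for every provider."""
    return min(limits.values())


def reported_tokens(counted: Optional[int]) -> int:
    """Return the counted tokens, or the fallback when no count is available."""
    return counted if counted is not None else FALLBACK_TOKEN_COUNT
```

Choosing the minimum means no provider ever receives a batch larger than it accepts, at the cost of extra calls for providers that could take more.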

Code Evidence from `rag/llm/embedding_model.py:55-74`:

class BuiltinEmbed(Base):
    MAX_TOKENS = {
        "Qwen/Qwen3-Embedding-0.6B": 30000,
        "BAAI/bge-m3": 8000,
        "BAAI/bge-small-en-v1.5": 500
    }

    def encode(self, texts: list):
        batch_size = 16

Global batch size setting from `common/settings.py:123`:

EMBEDDING_BATCH_SIZE: int = 16

OpenAI truncation from `rag/llm/embedding_model.py:101-103`:

batch_size = 16
texts = [truncate(t, 8191) for t in texts]
