# Heuristic: explodinggradients/ragas Embedding Batch Size Tuning
| Knowledge Sources | |
|---|---|
| Domains | Optimization, LLM_Evaluation |
| Last Updated | 2026-02-10 12:00 GMT |
## Overview
Provider-specific embedding batch size tuning: OpenAI supports 100, Cohere 96, HuggingFace 32, Google/Vertex AI only 5, and unknown providers default to 10.
## Description
Ragas includes a `get_optimal_batch_size()` function that returns provider-tuned batch sizes for embedding API calls. These values are derived from each provider's documented API limits and empirical testing. Using the wrong batch size can cause API errors (too large) or unnecessary round-trips (too small). The function auto-detects the provider from the embedding model configuration.
## Usage
Apply this heuristic when using embedding-based metrics (SemanticSimilarity, AnswerRelevancy, etc.) or when processing large datasets where embedding throughput matters. The auto-detection handles most cases, but you can override `batch_size` on embedding providers if needed.
## The Insight (Rule of Thumb)
- Action: Use the default `get_optimal_batch_size()` auto-detection, or override `batch_size` on embedding provider constructors.
- Values:
  - OpenAI: 100 (supports large batches)
  - Cohere: 96 (documented limit)
  - HuggingFace: 32 (default for local models)
  - Google / Vertex AI: 5 (very conservative)
  - Unknown providers: 10 (safe default)
- Trade-off: Too large = API errors or timeouts. Too small = more round-trips and slower throughput.
- HHEM-specific: The `FaithfulnesswithHHEM` metric uses a separate `batch_size=10` for NLI classification pairs, unrelated to embedding batch sizes.
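To make the round-trip side of the trade-off concrete, here is a small sketch (the 1,000-text dataset size is illustrative, not from Ragas) that counts how many embedding API calls each provider's batch size implies:

```python
import math

# Illustrative batch sizes mirroring the provider values above.
BATCH_SIZES = {"openai": 100, "cohere": 96, "huggingface": 32, "google": 5, "unknown": 10}

def round_trips(num_texts: int, batch_size: int) -> int:
    """Embedding API calls needed to embed num_texts texts at a given batch size."""
    return math.ceil(num_texts / batch_size)

# Embedding a 1,000-text dataset:
calls = {provider: round_trips(1000, size) for provider, size in BATCH_SIZES.items()}
# openai: 10 calls, cohere: 11, huggingface: 32, google: 200, unknown: 100
```

At batch size 5, the same dataset costs 20x more round-trips than at batch size 100, which is why provider-tuned defaults matter for throughput.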
## Reasoning
Each embedding provider has different rate limits and payload size constraints. Google/Vertex AI is notably conservative (batch of 5) because its API imposes strict per-request character limits. OpenAI's embedding API is optimized for high throughput and handles batches of 100 efficiently. The conservative default of 10 for unknown providers prevents failures while still batching multiple texts per call.
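The batching itself is plain slicing. A minimal, hypothetical helper (not Ragas's actual internals) that splits texts into provider-sized chunks looks like this:

```python
from typing import Iterator

def batched(texts: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield successive batch_size-sized chunks of texts, order preserved."""
    for i in range(0, len(texts), batch_size):
        yield texts[i : i + batch_size]

texts = [f"doc-{i}" for i in range(12)]
batches = list(batched(texts, 5))  # Google/Vertex-style conservative batch
# 12 texts at batch size 5 -> chunks of 5, 5, and 2
```

Each chunk becomes one API request, so a smaller batch size trades per-request payload (and failure risk) for more requests.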
## Code Evidence
Provider-specific batch sizes from `src/ragas/embeddings/utils.py:108-130`:
```python
def get_optimal_batch_size(provider: str, model: str) -> int:
    provider_lower = provider.lower()
    if "openai" in provider_lower:
        return 100  # OpenAI supports large batches
    elif "cohere" in provider_lower:
        return 96  # Cohere's documented limit
    elif "google" in provider_lower or "vertex" in provider_lower:
        return 5  # Google/Vertex AI is more conservative
    elif "huggingface" in provider_lower:
        return 32  # HuggingFace default
    else:
        return 10  # Conservative default for unknown providers
```
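A quick sanity check of the dispatch (reimplemented here so the snippet is self-contained; note that the substring matching means any provider string containing "openai", e.g. "azure_openai", would also map to 100):

```python
def get_optimal_batch_size(provider: str, model: str) -> int:
    # Self-contained mirror of the Ragas heuristic shown above.
    provider_lower = provider.lower()
    if "openai" in provider_lower:
        return 100
    elif "cohere" in provider_lower:
        return 96
    elif "google" in provider_lower or "vertex" in provider_lower:
        return 5
    elif "huggingface" in provider_lower:
        return 32
    return 10

assert get_optimal_batch_size("openai", "text-embedding-3-small") == 100
assert get_optimal_batch_size("vertex_ai", "textembedding-gecko") == 5
assert get_optimal_batch_size("Cohere", "embed-english-v3.0") == 96
assert get_optimal_batch_size("my_custom_provider", "model-x") == 10
```

The `model` argument is accepted but unused in this version of the logic; only the provider string drives the result.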
HHEM batch size from `src/ragas/metrics/_faithfulness.py:221`:
```python
class FaithfulnesswithHHEM(Faithfulness):
    batch_size: int = 10
```
HuggingFace embedding batch_size from `src/ragas/embeddings/huggingface_provider.py:28`:
```python
batch_size: int = 32,
```