# Heuristic: explodinggradients/ragas Embedding Batch Size Tuning
| Knowledge Sources | |
|---|---|
| Domains | Optimization, LLM_Evaluation |
| Last Updated | 2026-02-10 12:00 GMT |
## Overview
Provider-specific embedding batch size tuning: OpenAI supports 100, Cohere 96, HuggingFace 32, Google/Vertex AI only 5, and unknown providers default to 10.
## Description
Ragas includes a `get_optimal_batch_size()` function that returns provider-tuned batch sizes for embedding API calls. These values are derived from each provider's documented API limits and empirical testing. Using the wrong batch size can cause API errors (too large) or unnecessary round-trips (too small). The function auto-detects the provider from the embedding model configuration.
## Usage
Apply this heuristic when using embedding-based metrics (SemanticSimilarity, AnswerRelevancy, etc.) or when processing large datasets where embedding throughput matters. The auto-detection handles most cases, but you can override `batch_size` on embedding providers if needed.
## The Insight (Rule of Thumb)
- Action: Use the default `get_optimal_batch_size()` auto-detection, or override `batch_size` on embedding provider constructors.
- Values:
  - OpenAI: 100 (supports large batches)
  - Cohere: 96 (documented limit)
  - HuggingFace: 32 (default for local models)
  - Google / Vertex AI: 5 (very conservative)
  - Unknown providers: 10 (safe default)
- Trade-off: Too large = API errors or timeouts. Too small = more round-trips and slower throughput.
- HHEM-specific: The `FaithfulnesswithHHEM` metric uses a separate `batch_size=10` for NLI classification pairs, unrelated to embedding batch sizes.
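To make the round-trip side of the trade-off concrete, here is a small sketch (the 1,000-text dataset size is illustrative, not from Ragas) that counts how many embedding API calls each provider's batch size implies:

```python
import math

# Illustrative batch sizes mirroring the provider values above.
BATCH_SIZES = {"openai": 100, "cohere": 96, "huggingface": 32, "google": 5, "unknown": 10}

def round_trips(num_texts: int, batch_size: int) -> int:
    """Embedding API calls needed to embed num_texts texts at a given batch size."""
    return math.ceil(num_texts / batch_size)

# Embedding a 1,000-text dataset:
calls = {provider: round_trips(1000, size) for provider, size in BATCH_SIZES.items()}
# openai: 10 calls, cohere: 11, huggingface: 32, google: 200, unknown: 100
```

At batch size 5, the same dataset costs 20x more round-trips than at batch size 100, which is why provider-tuned defaults matter for throughput.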
## Reasoning
Each embedding provider has different rate limits and payload size constraints. Google/Vertex AI is notably conservative (batch of 5) because its API imposes strict per-request character limits. OpenAI's embedding API is optimized for high throughput and handles batches of 100 efficiently. The conservative default of 10 for unknown providers prevents failures while still batching multiple texts per call.
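The batching itself is plain slicing. A minimal, hypothetical helper (not Ragas's actual internals) that splits texts into provider-sized chunks looks like this:

```python
from typing import Iterator

def batched(texts: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield successive batch_size-sized chunks of texts, order preserved."""
    for i in range(0, len(texts), batch_size):
        yield texts[i : i + batch_size]

texts = [f"doc-{i}" for i in range(12)]
batches = list(batched(texts, 5))  # Google/Vertex-style conservative batch
# 12 texts at batch size 5 -> chunks of 5, 5, and 2
```

Each chunk becomes one API request, so a smaller batch size trades per-request payload (and failure risk) for more requests.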
## Code Evidence
Provider-specific batch sizes from `src/ragas/embeddings/utils.py:108-130`:
```python
def get_optimal_batch_size(provider: str, model: str) -> int:
    provider_lower = provider.lower()
    if "openai" in provider_lower:
        return 100  # OpenAI supports large batches
    elif "cohere" in provider_lower:
        return 96  # Cohere's documented limit
    elif "google" in provider_lower or "vertex" in provider_lower:
        return 5  # Google/Vertex AI is more conservative
    elif "huggingface" in provider_lower:
        return 32  # HuggingFace default
    else:
        return 10  # Conservative default for unknown providers
```
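A quick sanity check of the dispatch (reimplemented here so the snippet is self-contained; note that the substring matching means any provider string containing "openai", e.g. "azure_openai", would also map to 100):

```python
def get_optimal_batch_size(provider: str, model: str) -> int:
    # Self-contained mirror of the Ragas heuristic shown above.
    provider_lower = provider.lower()
    if "openai" in provider_lower:
        return 100
    elif "cohere" in provider_lower:
        return 96
    elif "google" in provider_lower or "vertex" in provider_lower:
        return 5
    elif "huggingface" in provider_lower:
        return 32
    return 10

assert get_optimal_batch_size("openai", "text-embedding-3-small") == 100
assert get_optimal_batch_size("vertex_ai", "textembedding-gecko") == 5
assert get_optimal_batch_size("Cohere", "embed-english-v3.0") == 96
assert get_optimal_batch_size("my_custom_provider", "model-x") == 10
```

The `model` argument is accepted but unused in this version of the logic; only the provider string drives the result.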
HHEM batch size from `src/ragas/metrics/_faithfulness.py:221`:
```python
class FaithfulnesswithHHEM(Faithfulness):
    batch_size: int = 10
```
HuggingFace embedding batch_size from `src/ragas/embeddings/huggingface_provider.py:28`:
```python
batch_size: int = 32,
```