Heuristic:Vibrantlabsai Ragas Temperature Sampling Strategy
| Knowledge Sources | |
|---|---|
| Domains | LLM_Evaluation, Optimization |
| Last Updated | 2026-02-12 10:00 GMT |
Overview
Temperature selection heuristic that uses a low temperature (0.01) for near-deterministic single completions and a moderate temperature (0.3) for more diverse multi-sample generation.
Description
The Ragas LLM abstraction (`BaseRagasLLM`) picks a default temperature from the number of completions requested (the `n` parameter). For a single completion (`n=1`), it uses a near-zero temperature of 0.01 so that outputs are close to deterministic and reproducible across runs. For multiple completions (`n>1`), it uses a higher temperature of 0.3 so that the samples differ from one another. This balances reproducibility for evaluation against variety for metrics such as answer relevancy, which generate multiple question variants.
Usage
Use this heuristic when configuring LLM calls in Ragas metrics. The adaptive default is applied when `temperature=None` is passed to the `generate()` method; passing an explicit temperature value overrides it. This is particularly relevant for:
- Single-sample metrics (Faithfulness, ContextPrecision): use the default low temperature for consistency.
- Multi-sample metrics (AnswerRelevancy): benefit from the higher temperature for diversity.
The Insight (Rule of Thumb)
- Action: Pass `temperature=None` to use the adaptive strategy, or pass an explicit value to override it.
- Value: `n=1` → `temperature=0.01`, `n>1` → `temperature=0.3`.
- Trade-off: Low temperature gives reproducibility but samples almost only the most likely output, so it may miss edge cases; higher temperature gives diversity but reduces consistency between runs.
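The trade-off can be illustrated with a toy next-token distribution. The sketch below is not Ragas code and not any provider's sampler: it assumes the standard formulation in which logits are divided by the temperature before a softmax, and the token names, the logits, and the helpers `softmax_with_temperature` and `sample_tokens` are made up for illustration.

```python
import math
import random
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Temperature-scaled softmax (assumed standard formulation)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_tokens(
    tokens: List[str], logits: List[float], temperature: float, k: int, seed: int
) -> List[str]:
    """Draw k tokens from the temperature-scaled distribution."""
    rng = random.Random(seed)
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=k)


tokens = ["yes", "no", "maybe"]  # illustrative vocabulary
logits = [2.0, 1.5, 1.0]         # illustrative scores

for temperature in (0.01, 0.3):
    probs = softmax_with_temperature(logits, temperature)
    draws = sample_tokens(tokens, logits, temperature, k=20, seed=0)
    # Print the distribution and which distinct tokens were drawn.
    print(temperature, [round(p, 3) for p in probs], sorted(set(draws)))
```

In this toy setup, nearly all probability mass sits on the top-scoring token at 0.01, so repeated draws agree. At 0.3 the lower-scoring tokens keep enough mass to show up in a batch of samples.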
Reasoning
LLM-based evaluation metrics need reproducible outputs. A temperature of 0.01 (rather than exactly 0, which some providers reject) yields near-identical outputs across runs. Metrics like AnswerRelevancy generate multiple question variants from an answer and measure their cosine similarity to the original question, so diversity among the generated questions is desirable; hence the 0.3 temperature. The 0.3 value is a conservative choice intended to provide variety without producing incoherent outputs.
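One way to see why 0.01 behaves almost deterministically while 0.3 does not is the temperature-scaled softmax. This is the textbook formulation, stated here as an assumption; the source does not say how any particular provider implements sampling. The symbols `z_i` (logit of token `i`), `T` (temperature), and `\Delta` (logit gap between two tokens `a` and `b`) are introduced only for illustration.

```latex
\begin{aligned}
p_i &= \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}, \\
\frac{p_a}{p_b} &= \exp\!\left(\frac{\Delta}{T}\right), \qquad \Delta = z_a - z_b, \\
\Delta = 1,\ T = 0.01 &: \quad \frac{p_a}{p_b} = e^{100} \approx 2.7 \times 10^{43}, \\
\Delta = 1,\ T = 0.3 &: \quad \frac{p_a}{p_b} = e^{10/3} \approx 28.
\end{aligned}
```

Under this assumption, a one-unit logit gap makes the runner-up token effectively unreachable at 0.01, while at 0.3 it would still be chosen roughly once in 29 draws if only those two tokens competed.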
Code Evidence
Temperature selection from `src/ragas/llms/base.py:71-73`:
    def get_temperature(self, n: int) -> float:
        """Return the temperature to use for completion based on n."""
        return 0.3 if n > 1 else 0.01
Applied during generation from `src/ragas/llms/base.py:110-111`:
    if temperature is None:
        temperature = self.get_temperature(n)
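The two excerpts combine as sketched below. This is a minimal stand-alone illustration, not the Ragas class itself: `SketchLLM` and `resolve_temperature` are hypothetical names, and only the `get_temperature` body and the `is None` check are taken from the excerpts above.

```python
from typing import Optional


class SketchLLM:
    """Hypothetical stand-in that mirrors the two excerpts from base.py."""

    def get_temperature(self, n: int) -> float:
        """Return the temperature to use for completion based on n."""
        return 0.3 if n > 1 else 0.01

    def resolve_temperature(
        self, n: int = 1, temperature: Optional[float] = None
    ) -> float:
        # Hypothetical helper: isolates the check that the excerpt shows
        # being applied during generation.
        if temperature is None:
            temperature = self.get_temperature(n)
        return temperature


llm = SketchLLM()
print(llm.resolve_temperature(n=1))                   # adaptive default, single completion
print(llm.resolve_temperature(n=3))                   # adaptive default, multiple completions
print(llm.resolve_temperature(n=3, temperature=0.7))  # explicit value is kept
print(llm.resolve_temperature(n=1, temperature=0.0))  # 0.0 is not None, so it is kept too
```

Because the check is `is None` rather than a truthiness test, an explicit `temperature=0.0` passes through unchanged instead of being replaced by the default.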