Heuristic:Vibrantlabsai Ragas Temperature Sampling Strategy

From Leeroopedia
Knowledge Sources
Domains LLM_Evaluation, Optimization
Last Updated 2026-02-12 10:00 GMT

Overview

A temperature-selection heuristic: use a near-zero temperature (0.01) for near-deterministic single completions and a moderate temperature (0.3) for diverse multi-sample generation.

Description

The Ragas LLM abstraction (`BaseRagasLLM`) picks a temperature dynamically from the number of completions requested (the `n` parameter). For a single completion (`n=1`), it uses a near-zero temperature of 0.01 so that outputs are near-deterministic and reproducible. For multiple completions (`n>1`), it uses a higher temperature of 0.3 to introduce diversity across samples. This balances reproducibility for evaluation against the variety needed by techniques such as answer relevancy, which generate multiple question variants.

Usage

Use this heuristic when configuring LLM calls in Ragas metrics. The adaptive temperature is applied automatically when `temperature=None` is passed to the `generate()` method; passing an explicit temperature value overrides it. This is particularly relevant for:

  • Single-sample metrics (Faithfulness, ContextPrecision): Use default low temperature for consistency.
  • Multi-sample metrics (AnswerRelevancy): Benefit from higher temperature for diversity.
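The default-versus-override behaviour described above can be sketched with a small stand-in class. `ToyLLM` and its simplified `generate()` signature are hypothetical and exist only for illustration; they are not the real `BaseRagasLLM`. Only the temperature rule itself is taken from the snippets quoted under Code Evidence.

```python
from typing import Optional


class ToyLLM:
    """Hypothetical stand-in for a Ragas LLM wrapper (not the real BaseRagasLLM)."""

    def get_temperature(self, n: int) -> float:
        # Same rule as the quoted source: 0.3 for multi-sample, 0.01 otherwise.
        return 0.3 if n > 1 else 0.01

    def generate(self, prompt: str, n: int = 1,
                 temperature: Optional[float] = None) -> float:
        # Compare against None (not truthiness) so an explicit 0.0 is respected.
        if temperature is None:
            temperature = self.get_temperature(n)
        # A real wrapper would call the provider here; this sketch only
        # returns the temperature that would be sent with the request.
        return temperature


llm = ToyLLM()
default_single = llm.generate("prompt", n=1)               # adaptive, single sample
default_multi = llm.generate("prompt", n=3)                # adaptive, multi-sample
explicit = llm.generate("prompt", n=3, temperature=0.0)    # caller override
```

Because the check is `is None`, any explicit value, including `0.0`, bypasses the adaptive rule. Whether a given provider accepts that value is a separate question.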

The Insight (Rule of Thumb)

  • Action: Pass `temperature=None` to use the adaptive strategy, or pass an explicit value to override it.
  • Value: `n=1` → `temperature=0.01`, `n>1` → `temperature=0.3`.
  • Trade-off: Low temperature gives reproducibility but may miss edge cases that alternative samples would surface; higher temperature gives diversity but reduces consistency between runs.

Reasoning

LLM-based evaluation metrics need stable outputs to be reproducible. A temperature of 0.01 (rather than exactly 0, which some providers reject) yields near-identical outputs across runs. Metrics like AnswerRelevancy generate multiple question variants from an answer and measure their cosine similarity to the original question, so diversity in the generated questions is desirable, hence the 0.3 temperature. The 0.3 value is a conservative choice that provides variety without producing incoherent outputs.
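To illustrate why 0.01 behaves almost greedily while 0.3 still leaves room for variation, here is a minimal sketch of temperature-scaled softmax over a toy next-token distribution. The logits are made up for illustration, and real providers may apply further sampling controls (for example nucleus sampling), so treat this as an intuition aid, not a description of any provider's sampler.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate tokens; the top two are close.
logits = [2.0, 1.9, 1.0]

p_low = softmax_with_temperature(logits, 0.01)  # single-sample setting
p_mid = softmax_with_temperature(logits, 0.3)   # multi-sample setting
```

With these toy logits, the top token receives more than 99.9% of the probability mass at 0.01 but only roughly 57% at 0.3, so repeated samples at 0.3 can differ while still favouring the most likely continuations.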

Code Evidence

Temperature selection from `src/ragas/llms/base.py:71-73`:

def get_temperature(self, n: int) -> float:
    """Return the temperature to use for completion based on n."""
    return 0.3 if n > 1 else 0.01

Applied during generation from `src/ragas/llms/base.py:110-111`:

if temperature is None:
    temperature = self.get_temperature(n)
