
Heuristic:PacktPublishing LLM Engineers Handbook Temperature Selection By Task

From Leeroopedia




Knowledge Sources
Domains: LLMs, Prompt_Engineering, Optimization
Last Updated: 2026-02-08 08:00 GMT

Overview

Task-specific temperature selection: 0.0 for metadata extraction and query expansion, 0.01 for production inference, 0.7 for dataset generation, 0.8 for evaluation, and 0.9 for LLM-as-judge scoring.

Description

This heuristic documents the intentional temperature choices across different LLM call sites in the project. Each temperature value reflects the task's tolerance for variation: structured extraction tasks use zero temperature for deterministic outputs, production inference uses near-zero for consistency, and creative generation tasks use higher temperatures to encourage diversity. The LLM-as-judge uses the highest temperature (0.9) to avoid systematic bias in evaluations.

Usage

Use this heuristic when adding new LLM API calls to the project or when debugging unexpected LLM behavior. Temperature is often the single most impactful sampling parameter for controlling output consistency and diversity, and choosing the wrong value for a task is a common source of bugs.

The Insight (Rule of Thumb)

  • Action: Select the temperature based on the task type.
  • Value:

    Task                              Temperature  Rationale
    Self-query metadata extraction    0.0          Must produce consistent, parseable JSON
    Query expansion                   0.0          Must produce deterministic search reformulations
    Production RAG inference          0.01         Minimal variation for user-facing responses
    Dataset generation (instruction)  0.7          Needs creative but grounded responses
    vLLM batch evaluation             0.8          Allows diverse model outputs for fair comparison
    LLM-as-judge scoring              0.9          High variation prevents systematic scoring bias
  • Trade-off: Lower temperatures reduce output diversity (bad for generation tasks), while higher temperatures increase randomness (bad for extraction tasks). The 0.01 production setting is a compromise: deterministic enough for consistency but not exactly 0, which can cause degenerate repetition in some models.
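As a sketch, these defaults can be centralized in a single lookup so that every new call site picks a documented value instead of an ad-hoc one. The task keys below are illustrative labels, not identifiers from the project:

```python
# Illustrative mapping of the heuristic's defaults; the task keys
# are hypothetical labels, not names used in the project.
TEMPERATURE_BY_TASK = {
    "metadata_extraction": 0.0,    # deterministic, parseable JSON
    "query_expansion": 0.0,        # deterministic reformulations
    "production_inference": 0.01,  # near-deterministic user-facing output
    "dataset_generation": 0.7,     # creative but grounded
    "batch_evaluation": 0.8,       # diverse outputs for fair comparison
    "llm_as_judge": 0.9,           # variation avoids systematic bias
}

def temperature_for(task: str) -> float:
    """Fail loudly on unknown tasks instead of silently defaulting."""
    try:
        return TEMPERATURE_BY_TASK[task]
    except KeyError:
        raise ValueError(f"No documented temperature for task {task!r}")
```

Failing on unknown tasks forces new call sites to record their temperature choice here rather than inheriting an implicit default.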

Reasoning

Metadata extraction and query expansion produce structured outputs (JSON, keyword lists) that downstream components parse programmatically. Any randomness here causes parsing failures or inconsistent retrieval results. Dataset generation needs diversity to avoid repetitive training data, so 0.7 provides good variety while staying grounded in the source context. The LLM-as-judge temperature of 0.9 is deliberately high: at low temperatures, judges tend to converge on the same scores for similar-quality outputs, reducing the signal-to-noise ratio. The 0.01 production inference temperature avoids the "temperature 0 repetition trap" where some models get stuck in loops.
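The mechanics behind these choices can be seen directly in how temperature rescales logits before the softmax: low temperatures sharpen the distribution toward the single most likely token, while higher temperatures flatten it and spread probability across alternatives. A minimal stdlib illustration with toy logits (not real model outputs):

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide logits by the temperature before the softmax: low T
    # sharpens the distribution, high T flattens it toward uniform.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]          # toy logits for three tokens
sharp = softmax_with_temperature(logits, 0.01)   # extraction-like
diverse = softmax_with_temperature(logits, 0.9)  # judge-like
```

At 0.01 essentially all probability mass lands on the top token, matching the deterministic behavior the extraction tasks rely on; at 0.9 the runner-up tokens keep meaningful mass, which is what gives generation and judging their variety.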

Self-query temperature from `llm_engineering/application/rag/self_query.py:21`:

model = ChatOpenAI(model=settings.OPENAI_MODEL_ID, api_key=settings.OPENAI_API_KEY, temperature=0)

Query expansion temperature from `llm_engineering/application/rag/query_expanison.py:14-22`:

model = ChatOpenAI(model=settings.OPENAI_MODEL_ID, api_key=settings.OPENAI_API_KEY, temperature=0)

Production inference temperature from `llm_engineering/settings.py:58`:

TEMPERATURE_INFERENCE: float = 0.01

Dataset generation temperature from `llm_engineering/application/dataset/generation.py:117-122`:

llm = ChatOpenAI(
    model=settings.OPENAI_MODEL_ID,
    api_key=settings.OPENAI_API_KEY,
    max_tokens=2000 if cls.dataset_type == DatasetType.PREFERENCE else 1200,
    temperature=0.7,
)

vLLM evaluation temperature from `llm_engineering/model/evaluation/evaluate.py:42`:

sampling_params = SamplingParams(temperature=0.8, top_p=0.95, min_p=0.05, max_tokens=2048)
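Here `top_p` and `min_p` jointly constrain the sampling pool that the 0.8 temperature then draws from. A rough stdlib sketch of that filtering logic (the exact order in which vLLM applies these filters internally may differ; this only illustrates the semantics):

```python
def nucleus_filter(probs, top_p=0.95, min_p=0.05):
    # probs: list of (token, probability) pairs after temperature scaling.
    # min_p drops tokens whose probability is below min_p times the
    # top token's probability; top_p then keeps the smallest
    # highest-probability prefix whose cumulative mass reaches top_p.
    mx = max(p for _, p in probs)
    kept = [(t, p) for t, p in probs if p >= min_p * mx]
    kept.sort(key=lambda tp: tp[1], reverse=True)
    out, cum = [], 0.0
    for t, p in kept:
        out.append((t, p))
        cum += p
        if cum >= top_p:
            break
    return out
```

The `min_p=0.05` floor prunes long-tail noise tokens that `top_p=0.95` alone would occasionally admit, which matters at a relatively high temperature like 0.8.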

LLM-as-judge temperature from `llm_engineering/model/evaluation/evaluate.py:90-102`:

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[...],
    response_format={"type": "json_object"},
    max_tokens=1000,
    temperature=0.9,
)
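One way to turn the judge's run-to-run variation into signal rather than noise is to sample it several times and aggregate the scores. This is a hypothetical sketch, not the project's API: `sample_fn` and the `score` JSON field are assumptions standing in for one judge completion like the call above:

```python
import json
import statistics

def judge_score(sample_fn, answer, n=5):
    # Hypothetical helper: call a high-temperature judge n times and
    # average the scores so individual-run randomness cancels out.
    # sample_fn(answer) is assumed to return one judge completion as
    # a JSON string containing a numeric "score" field.
    scores = []
    for _ in range(n):
        raw = sample_fn(answer)
        scores.append(json.loads(raw)["score"])
    return statistics.mean(scores)
```

Averaging over several temperature-0.9 samples keeps the bias-breaking variation described above while stabilizing the final score.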

Related Pages

Page Connections: Principle, Implementation, Heuristic, Environment