Heuristic: Vibrantlabsai Ragas Reasoning Model Parameter Constraints
| Knowledge Sources | |
|---|---|
| Domains | LLM_Evaluation, Debugging |
| Last Updated | 2026-02-12 10:00 GMT |
Overview
Parameter constraint heuristic for OpenAI reasoning models (o-series, GPT-5+): force temperature to 1.0, remove top_p, and map max_tokens to max_completion_tokens.
Description
OpenAI reasoning models (o1, o3, etc.) and newer GPT-5+ models have strict API parameter constraints that differ from standard chat models. They require `temperature=1.0` (the only supported value), do not accept `top_p`, and use `max_completion_tokens` instead of `max_tokens`. Ragas auto-detects these models via pattern matching on the model name and transparently remaps parameters to avoid API errors.
Usage
Use this heuristic when:
- Using o-series models (o1, o1-mini, o3, etc.) for evaluation — parameters are auto-remapped.
- Using GPT-5+ models — the same auto-remapping applies.
- Debugging API errors like "temperature must be 1.0" or "top_p is not supported" — the model is a reasoning model that needs parameter constraints.
- Wrapping via LangchainLLMWrapper — set `bypass_temperature=True` and `bypass_n=True` manually for reasoning models.
The Insight (Rule of Thumb)
- Action: For reasoning models, Ragas automatically forces `temperature=1.0`, removes `top_p`, and maps `max_tokens` → `max_completion_tokens`.
- Value: Temperature forced to exactly `1.0`; no other value accepted by the API.
- Trade-off: Temperature can no longer be lowered for more deterministic output. In practice this matters little, since reasoning models produce varied outputs regardless due to their internal chain-of-thought.
Reasoning
OpenAI reasoning models (o1, o3-mini, etc.) use internal chain-of-thought reasoning that is incompatible with temperature sampling. The API enforces `temperature=1.0` and rejects `top_p`. Additionally, these models use a different token budget parameter (`max_completion_tokens`) that includes both reasoning tokens and output tokens. Sending the wrong parameters results in an API error, so Ragas auto-detects the model via string pattern matching and remaps before the API call.
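The practical consequence of the shared token budget: hidden reasoning tokens and visible output tokens draw from the same `max_completion_tokens` pool, so a response can be cut short even when its visible text is small. The numbers below are made up purely for illustration:

```python
# Hedged illustration: max_completion_tokens budgets BOTH the hidden
# chain-of-thought tokens and the visible output tokens.
max_completion_tokens = 1000
reasoning_tokens = 600            # hypothetical hidden reasoning usage
visible_output_budget = max_completion_tokens - reasoning_tokens
print(visible_output_budget)      # 400 tokens left for the visible answer
```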
The detection uses a pattern-based approach rather than a hardcoded list to be future-proof. It covers:
- O-series: `o1` through `o9` (with variants like `o1-mini`, `o3-2025-01-31`)
- GPT-5+: `gpt-5` through `gpt-19` (with variants)
- Special: `codex-mini`
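The coverage listed above can be sketched as a single anchored regular expression. This is an illustration of the pattern, not Ragas's actual implementation (Ragas uses string slicing, shown below under Code Evidence); the name `looks_like_reasoning_model` is made up:

```python
import re

# Sketch: regex equivalent of the pattern-based detection described above.
REASONING_MODEL_RE = re.compile(
    r"^(?:"
    r"o[1-9](?:[-_].*)?"                  # o-series: o1, o3-mini, o1_2024, ...
    r"|gpt-(?:[5-9]|1[0-9])(?:[-_].*)?"   # gpt-5 through gpt-19, with variants
    r"|codex-mini"                        # special case
    r")$"
)

def looks_like_reasoning_model(name: str) -> bool:
    return REASONING_MODEL_RE.match(name) is not None
```

Note that the anchors and the `[-_]` requirement reproduce the edge cases of the slicing approach: `o10` and `gpt-4o` do not match, while `o1-mini` and `gpt-19` do.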
Code Evidence
Reasoning model detection from `src/ragas/llms/base.py:872-904`:
```python
def is_reasoning_model(model_str: str) -> bool:
    """Check if model is a reasoning model requiring max_completion_tokens."""
    # O-series reasoning models (o1, o1-mini, o2, o3, ...)
    # TODO: Update to support o10+ when OpenAI releases models beyond o9
    if (
        len(model_str) >= 2
        and model_str[0] == "o"
        and model_str[1] in "123456789"
    ):
        if len(model_str) == 2 or model_str[2] in ("-", "_"):
            return True

    # GPT-5 and newer (gpt-5, gpt-6, ..., gpt-19)
    # TODO: Update to support gpt-20+ when OpenAI releases models beyond gpt-19
    if model_str.startswith("gpt-"):
        version_str = model_str[4:].split("-")[0].split("_")[0]
        try:
            version = int(version_str)
            if 5 <= version <= 19:
                return True
        except ValueError:
            pass

    if model_str == "codex-mini":
        return True

    return False
```
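As a quick sanity check (not part of the Ragas test suite), a lightly condensed copy of this function can be exercised standalone against sample model names:

```python
def is_reasoning_model(model_str: str) -> bool:
    """Condensed copy of the detection logic above, for demonstration."""
    # O-series reasoning models (o1, o1-mini, o3-2025-01-31, ...)
    if (
        len(model_str) >= 2
        and model_str[0] == "o"
        and model_str[1] in "123456789"
    ):
        if len(model_str) == 2 or model_str[2] in ("-", "_"):
            return True
    # GPT-5 through GPT-19, with date/variant suffixes
    if model_str.startswith("gpt-"):
        version_str = model_str[4:].split("-")[0].split("_")[0]
        try:
            if 5 <= int(version_str) <= 19:
                return True
        except ValueError:
            pass
    return model_str == "codex-mini"

# Reasoning models: parameter remapping applies
assert is_reasoning_model("o1-mini")
assert is_reasoning_model("o3-2025-01-31")
assert is_reasoning_model("gpt-5")
assert is_reasoning_model("codex-mini")

# Standard chat models: parameters pass through untouched
assert not is_reasoning_model("gpt-4o")
assert not is_reasoning_model("gpt-3.5-turbo")
```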
Parameter remapping from `src/ragas/llms/base.py:908-918`:
```python
# If max_tokens is provided and model requires max_completion_tokens, map it
if requires_max_completion_tokens and "max_tokens" in mapped_args:
    mapped_args["max_completion_tokens"] = mapped_args.pop("max_tokens")

# GPT-5 and o-series models have strict parameter requirements:
# 1. Temperature must be exactly 1.0 (only supported value)
# 2. top_p parameter is not supported and must be removed
if requires_max_completion_tokens:
    mapped_args["temperature"] = 1.0
    mapped_args.pop("top_p", None)
```
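Taken together, the fragment above amounts to the following standalone transformation. This is a sketch for illustration only; the helper name `remap_reasoning_params` is hypothetical and not part of Ragas's API:

```python
def remap_reasoning_params(kwargs: dict) -> dict:
    """Apply the reasoning-model constraints to a kwargs dict (sketch)."""
    out = dict(kwargs)
    # max_tokens -> max_completion_tokens
    if "max_tokens" in out:
        out["max_completion_tokens"] = out.pop("max_tokens")
    # temperature must be exactly 1.0; top_p is rejected by the API
    out["temperature"] = 1.0
    out.pop("top_p", None)
    return out

result = remap_reasoning_params(
    {"temperature": 0.0, "top_p": 0.9, "max_tokens": 512}
)
assert result == {"temperature": 1.0, "max_completion_tokens": 512}
```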
LangchainLLMWrapper bypass flags from `src/ragas/llms/base.py:172-175`:
```python
# Certain LLMs (e.g., OpenAI o1 series) do not support temperature
self.bypass_temperature = bypass_temperature
# Certain reasoning LLMs (e.g., OpenAI o1 series) do not support n parameter
self.bypass_n = bypass_n
```