Heuristic: OpenAI Whisper Temperature Fallback Strategy

Knowledge Sources	OpenAI Whisper
Domains	Decoding, Robustness
Last Updated	2025-06-25 00:00 GMT

Overview

Progressive temperature fallback strategy that retries decoding at increasing temperatures (0.0, 0.2, 0.4, 0.6, 0.8, 1.0) when quality thresholds are not met, improving transcription robustness.

Description

Whisper's transcription pipeline uses a multi-temperature fallback loop to handle difficult audio segments. It first decodes at temperature 0.0, which is deterministic (greedy decoding, or beam search when `beam_size` is set). If the result fails a quality check (compression ratio too high, or average log probability too low), it retries at progressively higher temperatures, which inject more randomness into sampling. Easy segments therefore keep the accuracy of deterministic decoding, while difficult ones get a chance at stochastic recovery.
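The "too repetitive" check is easiest to grasp with concrete strings. This sketch assumes the compression ratio is computed as in `whisper/utils.py` (UTF-8 byte length divided by zlib-compressed length) and uses `transcribe()`'s default `compression_ratio_threshold` of 2.4; the sample strings are invented for illustration.

```python
import zlib

COMPRESSION_RATIO_THRESHOLD = 2.4  # transcribe() default


def compression_ratio(text: str) -> float:
    """Raw UTF-8 size divided by zlib-compressed size; looping text compresses well."""
    text_bytes = text.encode("utf-8")
    return len(text_bytes) / len(zlib.compress(text_bytes))


def too_repetitive(text: str, threshold: float = COMPRESSION_RATIO_THRESHOLD) -> bool:
    """True when the text would trigger a fallback on compression ratio alone."""
    return compression_ratio(text) > threshold


looping = "thank you " * 30  # a typical degenerate decoding loop
normal = "The quick brown fox jumps over the lazy dog near the riverbank."
print(round(compression_ratio(looping), 2), too_repetitive(looping))
print(round(compression_ratio(normal), 2), too_repetitive(normal))
```

Short, non-repetitive text barely compresses (zlib's header overhead can even push the ratio below 1), whereas a hallucinated loop compresses many times over and crosses the threshold.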

Usage

Use this heuristic when transcription quality is inconsistent across segments. The default temperature tuple `(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)` works well for general use. For audio with many challenging segments (accented speech, background noise), no extra configuration is needed: the fallback activates automatically whenever a segment fails the quality checks. To disable it, pass a single temperature value (e.g., `temperature=0.0`), which limits decoding to one deterministic attempt per segment.
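The default schedule is a start temperature plus a fixed increment, capped at 1.0. The sketch below builds such a schedule; it assumes the behavior of the `whisper` command-line options `--temperature` and `--temperature_increment_on_fallback`, and `fallback_temperatures` is a hypothetical helper, not part of the library.

```python
from typing import List, Optional, Tuple


def fallback_temperatures(
    start: float = 0.0, increment: Optional[float] = 0.2
) -> Tuple[float, ...]:
    """Hypothetical helper: build a fallback schedule from `start` up to 1.0."""
    if increment is None:
        return (start,)  # a single attempt: fallback disabled
    if increment <= 0:
        raise ValueError("increment must be positive")
    temps: List[float] = []
    step = 0
    while start + step * increment <= 1.0 + 1e-6:
        # round() hides float drift such as 0.6000000000000001
        temps.append(round(start + step * increment, 6))
        step += 1
    return tuple(temps)


print(fallback_temperatures())
print(fallback_temperatures(increment=0.5))
print(fallback_temperatures(increment=None))
```

The resulting tuple can be passed as `temperature=` to `transcribe()`. A coarser increment means fewer retries (lower worst-case latency) but larger jumps in randomness between attempts.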

The Insight (Rule of Thumb)

Action: Pass a tuple of temperatures to `transcribe()` via the `temperature` parameter.
Value: Default `(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)`, i.e. six attempts, from deterministic decoding (0.0) to sampling from the model's unscaled distribution (1.0).
Trade-off: Higher temperatures increase diversity but can reduce accuracy. The fallback loop also adds latency on difficult segments: in the worst case a segment is decoded six times, and each sampled attempt may itself draw `best_of` candidates.
Key detail: At temperature 0, `beam_size` and `patience` are kept (beam search if `beam_size` is set, greedy otherwise) and `best_of` is dropped. At temperature > 0 (sampling), `beam_size` and `patience` are dropped and `best_of` is kept.
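The option-splitting rule, and what it implies for worst-case cost, can be sketched without loading a model. `options_for_attempt` and `worst_case_sequences` are hypothetical helpers. The second assumes that the number of sequences decoded in parallel is `beam_size` at temperature 0 and `best_of` when sampling (1 if unset); the option values are illustrative, with 5 and 5 matching the CLI defaults for `--beam_size` and `--best_of`.

```python
from typing import Any, Dict, Sequence


def options_for_attempt(decode_options: Dict[str, Any], t: float) -> Dict[str, Any]:
    """Hypothetical helper: the kwargs one fallback attempt would use at temperature t."""
    kwargs = dict(decode_options)  # copy so the caller's options are untouched
    if t > 0:
        # Sampling: beam-search settings do not apply.
        kwargs.pop("beam_size", None)
        kwargs.pop("patience", None)
    else:
        # Deterministic decoding: best_of only applies to sampling.
        kwargs.pop("best_of", None)
    kwargs["temperature"] = t
    return kwargs


def worst_case_sequences(temperatures: Sequence[float], decode_options: Dict[str, Any]) -> int:
    """Assumed cost proxy: sequences decoded per segment if every attempt falls back."""
    total = 0
    for t in temperatures:
        opts = options_for_attempt(decode_options, t)
        width = opts.get("best_of") if t > 0 else opts.get("beam_size")
        total += width or 1
    return total


opts = {"beam_size": 5, "patience": 1.0, "best_of": 5, "language": "en"}
print(options_for_attempt(opts, 0.0))
print(options_for_attempt(opts, 0.4))
print(worst_case_sequences((0.0, 0.2, 0.4, 0.6, 0.8, 1.0), opts))
```

Under these assumptions a segment that passes at temperature 0 costs 5 sequences, while one that exhausts the default schedule costs 5 + 5 * 5 = 30.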

Reasoning

Autoregressive sequence-to-sequence models can get stuck in repetitive loops (high compression ratio) or produce low-confidence gibberish (low average log probability) on difficult audio. Raising the temperature flattens the next-token distribution (the logits are divided by the temperature before sampling), which helps the decoder break out of these degenerate modes. The progressive schedule means easy segments are decoded once at temperature 0, while difficult segments get several chances with increasing randomness.
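A sketch of what raising the temperature does to a toy next-token distribution: the logits are divided by the temperature before the softmax, so larger values move probability onto lower-ranked tokens. Temperature 0 is modelled as a plain argmax (the greedy case, where Whisper picks the top token instead of sampling); the logits are invented.

```python
import math
from typing import List


def token_probs(logits: List[float], temperature: float) -> List[float]:
    """Next-token probabilities after temperature scaling (illustrative sketch)."""
    if temperature == 0:
        # Temperature 0: no sampling, all mass on the highest-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [4.0, 2.0, 1.0]  # made-up scores for three candidate tokens
for t in (0.0, 0.2, 0.6, 1.0):
    print(t, [round(p, 3) for p in token_probs(logits, t)])
```

The top token's share shrinks as the temperature rises, which is exactly what lets a retry escape a loop the deterministic pass locked into.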

Code evidence from `whisper/transcribe.py:184-223`:

```python
def decode_with_fallback(segment: torch.Tensor) -> DecodingResult:
    temperatures = (
        [temperature] if isinstance(temperature, (int, float)) else temperature
    )
    decode_result = None

    for t in temperatures:
        kwargs = {**decode_options}
        if t > 0:
            # disable beam_size and patience when t > 0
            kwargs.pop("beam_size", None)
            kwargs.pop("patience", None)
        else:
            # disable best_of when t == 0
            kwargs.pop("best_of", None)

        options = DecodingOptions(**kwargs, temperature=t)
        decode_result = model.decode(segment, options)

        needs_fallback = False
        if (
            compression_ratio_threshold is not None
            and decode_result.compression_ratio > compression_ratio_threshold
        ):
            needs_fallback = True  # too repetitive
        if (
            logprob_threshold is not None
            and decode_result.avg_logprob < logprob_threshold
        ):
            needs_fallback = True  # average log probability is too low
        if (
            no_speech_threshold is not None
            and decode_result.no_speech_prob > no_speech_threshold
            and logprob_threshold is not None
            and decode_result.avg_logprob < logprob_threshold
        ):
            needs_fallback = False  # silence
        if not needs_fallback:
            break

    return decode_result
```
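To see the acceptance logic in isolation, the same loop can be run against a scripted decoder instead of a model. `FakeResult` and `decode_with_fallback_sketch` are hypothetical: `FakeResult` reuses only the `DecodingResult` attribute names the loop reads, and the default thresholds (2.4, -1.0, 0.6) are assumed to be `transcribe()`'s defaults.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Union


@dataclass
class FakeResult:
    """Hypothetical stand-in for DecodingResult: only the fields the loop reads."""
    temperature: float
    compression_ratio: float
    avg_logprob: float
    no_speech_prob: float


def decode_with_fallback_sketch(
    decode: Callable[[float], FakeResult],
    temperature: Union[float, Sequence[float]] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    compression_ratio_threshold: Optional[float] = 2.4,
    logprob_threshold: Optional[float] = -1.0,
    no_speech_threshold: Optional[float] = 0.6,
) -> Optional[FakeResult]:
    """Same acceptance logic as decode_with_fallback, with the model call injected."""
    temperatures = [temperature] if isinstance(temperature, (int, float)) else temperature
    result = None
    for t in temperatures:
        result = decode(t)
        too_repetitive = (
            compression_ratio_threshold is not None
            and result.compression_ratio > compression_ratio_threshold
        )
        low_confidence = (
            logprob_threshold is not None and result.avg_logprob < logprob_threshold
        )
        # Likely silence: a high no-speech probability plus low confidence is
        # accepted as-is instead of triggering a retry.
        likely_silence = (
            no_speech_threshold is not None
            and result.no_speech_prob > no_speech_threshold
            and low_confidence
        )
        if likely_silence or not (too_repetitive or low_confidence):
            break
    return result


# Scripted decoder: repetitive output at t=0.0, a clean result afterwards.
calls: List[float] = []


def scripted_decode(t: float) -> FakeResult:
    calls.append(t)
    if t == 0.0:
        return FakeResult(t, compression_ratio=3.1, avg_logprob=-0.3, no_speech_prob=0.01)
    return FakeResult(t, compression_ratio=1.6, avg_logprob=-0.4, no_speech_prob=0.01)


accepted = decode_with_fallback_sketch(scripted_decode)
print(accepted.temperature, calls)
```

Note that the silence check overrides the other two: a result that is both repetitive and low-confidence is still accepted at the first temperature if the no-speech probability is high, since retrying silence would only invite hallucinations.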

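Additionally, the prompt is reset when the accepted result was decoded at a temperature above 0.5, so the next window is decoded without the preceding transcript as context (the same reset happens after every window when `condition_on_previous_text` is False). From `whisper/transcribe.py:503-505`: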

```python
if not condition_on_previous_text or result.temperature > 0.5:
    # do not feed the prompt tokens if a high temperature was used
    prompt_reset_since = len(all_tokens)
```
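The effect on later windows can be simulated with plain token lists. `prompts_per_window` is a hypothetical helper that assumes each window's prompt is `all_tokens[prompt_reset_since:]`; it ignores the initial prompt and any length truncation the decoder applies.

```python
from typing import List, Sequence, Tuple


def prompts_per_window(
    windows: Sequence[Tuple[List[int], float]],
    condition_on_previous_text: bool = True,
) -> List[List[int]]:
    """Return the prompt tokens each window would receive (hypothetical simulation)."""
    all_tokens: List[int] = []
    prompt_reset_since = 0
    prompts: List[List[int]] = []
    for tokens, temperature in windows:
        # The prompt is everything decoded since the last reset.
        prompts.append(all_tokens[prompt_reset_since:])
        all_tokens.extend(tokens)
        if not condition_on_previous_text or temperature > 0.5:
            # do not feed the prompt tokens if a high temperature was used
            prompt_reset_since = len(all_tokens)
    return prompts


# (tokens, temperature of the accepted result) for four consecutive windows
windows = [([1, 2], 0.0), ([3], 0.8), ([4, 5], 0.2), ([6], 0.0)]
print(prompts_per_window(windows))
```

Because the second window was accepted at 0.8, the third window starts with an empty prompt, and the fourth sees only the tokens decoded since the reset (`[4, 5]`). This keeps a possibly noisy high-temperature transcript from steering the windows that follow.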
