Heuristic:Openai Whisper Log Probability Threshold

Knowledge Sources	OpenAI Whisper
Domains	Decoding, Quality_Control
Last Updated	2025-06-25 00:00 GMT

Overview

An average log probability threshold of -1.0 is used to detect low-confidence decoder output. Segments that fall below it trigger temperature fallback, which re-decodes them at a higher temperature.

Description

Whisper computes the average log probability across all sampled tokens in a decoded segment. When this value falls below -1.0, the model has low confidence in its output: on average, it assigned a low probability to the tokens it generated. The segment is then handed to the temperature fallback mechanism, which retries decoding at a higher sampling temperature, i.e. with more randomness.
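
To make the quantity concrete, here is a minimal sketch of the check described above. The helper names and the sample values are hypothetical, and a plain mean is assumed; the library's exact normalization (for example, whether an end-of-text token is counted) may differ.

```python
import math


def average_logprob(token_logprobs):
    """Plain mean of per-token natural-log probabilities (an assumed normalization)."""
    if not token_logprobs:
        raise ValueError("need at least one token log probability")
    return sum(token_logprobs) / len(token_logprobs)


def is_low_confidence(token_logprobs, logprob_threshold=-1.0):
    """True when the segment's average log probability is below the threshold."""
    return average_logprob(token_logprobs) < logprob_threshold


# Made-up per-token log probabilities for two segments.
confident = [-0.1, -0.3, -0.2, -0.4]  # mean is about -0.25
uncertain = [-1.5, -0.9, -2.1, -1.1]  # mean is about -1.4

print(is_low_confidence(confident))  # False
print(is_low_confidence(uncertain))  # True

# An average of -1.0 corresponds to a geometric-mean token probability of e^-1.
print(round(math.exp(-1.0), 3))  # 0.368
```

Read this way, the default threshold flags segments whose tokens received, in the geometric-mean sense, less than roughly 37% probability each.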

Usage

Use this heuristic to reject low-confidence transcriptions. The threshold is configurable via the `logprob_threshold` parameter in `transcribe()`. Set to `None` to disable the check. More negative values are more permissive; less negative values are stricter.
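
The same threshold can also be applied after the fact to a finished transcription. The sketch below assumes the result is a dictionary with a `segments` list whose entries carry an `avg_logprob` field; the helper `split_by_confidence` and the sample data are hypothetical, and the field names should be checked against the installed version.

```python
def split_by_confidence(result, logprob_threshold=-1.0):
    """Partition segments of a transcribe()-style result by their avg_logprob."""
    kept, rejected = [], []
    for segment in result["segments"]:
        if segment["avg_logprob"] < logprob_threshold:
            rejected.append(segment)
        else:
            kept.append(segment)
    return kept, rejected


# Hand-written stand-in for a transcription result (not real model output).
example_result = {
    "segments": [
        {"text": " Hello there.", "avg_logprob": -0.21},
        {"text": " Thanks for watching!", "avg_logprob": -1.37},
        {"text": " See you tomorrow.", "avg_logprob": -0.58},
    ]
}

kept, rejected = split_by_confidence(example_result)
print([s["text"] for s in kept])
print([s["text"] for s in rejected])
```

With a loaded model, the threshold itself would be passed along the lines of `model.transcribe("audio.mp3", logprob_threshold=-0.8)`, where the file name is a placeholder.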

The Insight (Rule of Thumb)

Action: Set `logprob_threshold=-1.0` (default) in `transcribe()`.
Value: -1.0. An average log probability below this is treated as a failed decode.
Trade-off: A threshold too close to zero (e.g., -0.5) rejects many valid transcriptions; one that is too negative (e.g., -2.0) lets garbage through. The threshold also interacts with the no-speech detector.
Interaction: When `no_speech_prob > no_speech_threshold` AND `avg_logprob < logprob_threshold`, the segment is classified as silence rather than a failed decode, and no fallback is triggered.
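
The trade-off can be illustrated by counting how many segments each candidate threshold would flag. This is only a sketch: `fallback_rate` is a hypothetical helper and the per-segment averages are made up for illustration, not measured from any model.

```python
def fallback_rate(avg_logprobs, logprob_threshold):
    """Fraction of segments whose average log probability is below the threshold."""
    if not avg_logprobs:
        return 0.0
    flagged = sum(1 for lp in avg_logprobs if lp < logprob_threshold)
    return flagged / len(avg_logprobs)


# Made-up per-segment averages, ordered from confident to very uncertain.
sample = [-0.15, -0.32, -0.48, -0.61, -0.77, -0.95, -1.20, -1.85, -2.40, -3.10]

for threshold in (-0.5, -1.0, -2.0):
    print(threshold, fallback_rate(sample, threshold))
```

On this toy data the strict threshold of -0.5 flags 7 of 10 segments, the default flags 4, and -2.0 flags only 2, which mirrors the stricter-versus-more-permissive behaviour described above.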

Reasoning

Average log probability reflects the model's confidence in its output. For clean speech that the model handles well, average log probabilities are typically above -0.5. For difficult audio or hallucinated output, the probabilities drop significantly. The -1.0 threshold catches clear model failures while tolerating moderate difficulty.

Code evidence from `whisper/transcribe.py:45,209-213`:

logprob_threshold: Optional[float] = -1.0,

if (
    logprob_threshold is not None
    and decode_result.avg_logprob < logprob_threshold
):
    needs_fallback = True  # average log probability is too low

Silence detection interaction from `whisper/transcribe.py:214-220`:

if (
    no_speech_threshold is not None
    and decode_result.no_speech_prob > no_speech_threshold
    and logprob_threshold is not None
    and decode_result.avg_logprob < logprob_threshold
):
    needs_fallback = False  # silence
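
Put together, the two checks sit inside a retry loop over increasing temperatures. The following is a simplified, self-contained sketch of that loop and not the library's implementation: `retry_with_temperatures`, `FakeDecodeResult`, and `scripted_decoder` are hypothetical names, the temperature schedule and the `no_speech_threshold` value of 0.6 are assumptions, and other checks the real code may perform are omitted.

```python
from dataclasses import dataclass


@dataclass
class FakeDecodeResult:
    """Stand-in for a decode result; only the fields used by the checks."""
    avg_logprob: float
    no_speech_prob: float
    temperature: float


def retry_with_temperatures(
    decode,
    temperatures=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),  # assumed schedule
    logprob_threshold=-1.0,
    no_speech_threshold=0.6,  # assumed value
):
    """Call decode(temperature) until a result passes the confidence check."""
    if not temperatures:
        raise ValueError("need at least one temperature")
    result = None
    for temperature in temperatures:
        result = decode(temperature)
        low_confidence = (
            logprob_threshold is not None
            and result.avg_logprob < logprob_threshold
        )
        likely_silence = (
            no_speech_threshold is not None
            and result.no_speech_prob > no_speech_threshold
        )
        # Low confidence asks for a retry, unless the segment looks like silence.
        needs_fallback = low_confidence and not likely_silence
        if not needs_fallback:
            break
    # If every temperature fails, the last attempt is returned as-is.
    return result


def scripted_decoder(temperature):
    """Toy decoder whose confidence improves at higher temperatures."""
    scripted = {0.0: -1.6, 0.2: -1.2}
    return FakeDecodeResult(
        avg_logprob=scripted.get(temperature, -0.7),
        no_speech_prob=0.05,
        temperature=temperature,
    )


final = retry_with_temperatures(scripted_decoder)
print(final.temperature, final.avg_logprob)  # 0.4 -0.7
```

In this toy run the first two attempts fall below -1.0 and are retried, while the third passes and ends the loop. A result with a high `no_speech_prob` would stop at the first attempt even with a low average, matching the silence rule.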
