Heuristic:Openai Whisper Log Probability Threshold
| Knowledge Sources | |
|---|---|
| Domains | Decoding, Quality_Control |
| Last Updated | 2025-06-25 00:00 GMT |
Overview
An average log probability threshold of -1.0 flags low-confidence decoder output and triggers temperature fallback, which re-decodes the segment.
Description
Whisper computes the average log probability across all sampled tokens in a decoded segment. A value below -1.0 indicates that the model has low confidence in its output: the generated tokens are, on average, unlikely under the model's own distribution. This triggers the temperature fallback mechanism, which retries decoding at a higher sampling temperature.
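As a rough illustration of the quantity being thresholded, the sketch below averages per-token log probabilities and compares the result with -1.0. The helper names (`average_logprob`, `is_low_confidence`) and the token probabilities are invented for illustration, and the plain mean is a simplification; Whisper's own length normalization may differ in detail.

```python
import math
from typing import Sequence

LOGPROB_THRESHOLD = -1.0  # default value discussed in this page


def average_logprob(token_logprobs: Sequence[float]) -> float:
    """Plain mean of per-token log probabilities (illustrative simplification)."""
    if not token_logprobs:
        raise ValueError("need at least one token log probability")
    return sum(token_logprobs) / len(token_logprobs)


def is_low_confidence(
    token_logprobs: Sequence[float], threshold: float = LOGPROB_THRESHOLD
) -> bool:
    """True when the segment's average log probability is below the threshold."""
    return average_logprob(token_logprobs) < threshold


# Hypothetical per-token probabilities for two decoded segments.
confident = [math.log(p) for p in (0.9, 0.8, 0.95, 0.85)]
uncertain = [math.log(p) for p in (0.3, 0.2, 0.4, 0.25)]

print(is_low_confidence(confident))  # average is well above -1.0
print(is_low_confidence(uncertain))  # average is below -1.0
```

In this toy data the first segment stays far above the threshold, while the second falls below it and would be a candidate for re-decoding.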
Usage
Use this heuristic to reject low-confidence transcriptions. The threshold is configurable via the `logprob_threshold` parameter of `transcribe()`; set it to `None` to disable the check. More negative values are more permissive; less negative values are stricter.
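A minimal usage sketch follows. It assumes the `openai-whisper` package, a model named `"base"`, and that each segment in the result of `transcribe()` carries an `avg_logprob` field; the helper names and the sample result are hypothetical. The `whisper` import is kept inside the function so that the post-filtering helper can be tried without the package installed.

```python
from typing import Any, Dict, List, Optional, Tuple


def split_by_confidence(
    result: Dict[str, Any], threshold: float = -1.0
) -> Tuple[List[dict], List[dict]]:
    """Partition the segments of a transcribe() result by avg_logprob."""
    kept, flagged = [], []
    for seg in result.get("segments", []):
        (flagged if seg["avg_logprob"] < threshold else kept).append(seg)
    return kept, flagged


def transcribe_with_threshold(
    audio_path: str,
    logprob_threshold: Optional[float] = -1.0,  # None disables the check
    model_name: str = "base",  # assumed model size, not from this page
) -> Dict[str, Any]:
    import whisper  # lazy import: only needed when actually transcribing

    model = whisper.load_model(model_name)
    return model.transcribe(audio_path, logprob_threshold=logprob_threshold)


# Stand-in for a transcribe() result; texts and scores are invented.
fake_result = {
    "segments": [
        {"text": " Hello there.", "avg_logprob": -0.21},
        {"text": " zzz zzz zzz", "avg_logprob": -1.47},
    ]
}
kept, flagged = split_by_confidence(fake_result)
print([s["text"] for s in kept], [s["text"] for s in flagged])
```

Post-filtering like this is a design choice layered on top of the built-in check: segments can still end up below the threshold after all fallback attempts, so inspecting `avg_logprob` afterwards gives a second chance to discard them.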
The Insight (Rule of Thumb)
- Action: Set `logprob_threshold=-1.0` (default) in `transcribe()`.
- Value: -1.0. A segment whose average log probability falls below this is treated as a failed decode.
- Trade-off: A threshold that is too high (e.g., -0.5) rejects many valid transcriptions; one that is too low (e.g., -2.0) lets garbage through. The threshold also interacts with the no-speech detector.
- Interaction: When `no_speech_prob > no_speech_threshold` AND `avg_logprob < logprob_threshold`, the segment is classified as silence rather than a failed decode, and no fallback is triggered.
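The rule and its interaction with silence detection can be sketched as a single decision function. This is a simplified model rather than Whisper's actual code: the function name is hypothetical, the `no_speech_threshold` default of 0.6 is an assumption not stated on this page, and other fallback triggers are left out.

```python
from typing import Optional


def needs_fallback(
    avg_logprob: float,
    no_speech_prob: float,
    logprob_threshold: Optional[float] = -1.0,
    no_speech_threshold: Optional[float] = 0.6,  # assumed default
) -> bool:
    """Decide whether a decoded segment should be re-decoded."""
    too_low = logprob_threshold is not None and avg_logprob < logprob_threshold
    fallback = too_low  # a low average log probability requests a retry

    # Silence override: a likely-silent segment with low confidence
    # is not retried.
    if (
        no_speech_threshold is not None
        and no_speech_prob > no_speech_threshold
        and too_low
    ):
        fallback = False
    return fallback


print(needs_fallback(avg_logprob=-1.5, no_speech_prob=0.1))  # low confidence
print(needs_fallback(avg_logprob=-1.5, no_speech_prob=0.9))  # treated as silence
```

Passing `logprob_threshold=None` disables both branches, since the silence override also depends on the log probability check.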
Reasoning
Average log probability reflects the model's confidence in its output. For clean speech that the model handles well, average log probabilities are typically above -0.5. For difficult audio or hallucinated output, the average drops significantly. The -1.0 threshold catches clear model failures while tolerating moderate difficulty.
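To give these numbers some intuition, an average log probability can be converted to a geometric-mean per-token probability by exponentiating it. The short sketch below does this for the values mentioned in this page; it assumes natural logarithms, and the function name is hypothetical.

```python
import math


def per_token_probability(avg_logprob: float) -> float:
    """Geometric-mean token probability implied by an average log probability."""
    return math.exp(avg_logprob)


for value in (-0.5, -1.0, -2.0):
    prob = per_token_probability(value)
    print(f"avg_logprob {value:+.1f} -> about {prob:.3f} per token")
```

Under that assumption, -0.5 corresponds to roughly 0.607 per token, -1.0 to roughly 0.368, and -2.0 to roughly 0.135, which is one way to read the default as "about a one-in-three chance per token, on average".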
Code evidence from `whisper/transcribe.py:45,209-213`:

    logprob_threshold: Optional[float] = -1.0,

    if (
        logprob_threshold is not None
        and decode_result.avg_logprob < logprob_threshold
    ):
        needs_fallback = True  # average log probability is too low
Silence detection interaction from `whisper/transcribe.py:214-220`:

    if (
        no_speech_threshold is not None
        and decode_result.no_speech_prob > no_speech_threshold
        and logprob_threshold is not None
        and decode_result.avg_logprob < logprob_threshold
    ):
        needs_fallback = False  # silence
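To show how these two checks might sit inside a retry loop, here is a self-contained sketch of temperature fallback with a stub decoder. The temperature schedule, the `no_speech_threshold` default, the `DecodeResult` dataclass, and `fake_decode` are all assumptions made for illustration; this is not the code from `whisper/transcribe.py`.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class DecodeResult:
    avg_logprob: float
    no_speech_prob: float
    temperature: float


def decode_with_fallback(
    decode_once: Callable[[float], DecodeResult],
    temperatures: Sequence[float] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),  # assumed
    logprob_threshold: Optional[float] = -1.0,
    no_speech_threshold: Optional[float] = 0.6,  # assumed default
) -> DecodeResult:
    """Retry at increasing temperature until the result is acceptable."""
    if not temperatures:
        raise ValueError("need at least one temperature")
    for t in temperatures:
        result = decode_once(t)
        too_low = (
            logprob_threshold is not None
            and result.avg_logprob < logprob_threshold
        )
        silent = (
            no_speech_threshold is not None
            and result.no_speech_prob > no_speech_threshold
        )
        # Stop when confidence is acceptable, or when the segment looks silent.
        if not too_low or silent:
            break
    return result  # last attempt if every temperature failed


def fake_decode(temperature: float) -> DecodeResult:
    # Stub: pretend confidence improves once the temperature reaches 0.4.
    scores = {0.0: -1.8, 0.2: -1.3}
    return DecodeResult(scores.get(temperature, -0.7), 0.05, temperature)


best = decode_with_fallback(fake_decode)
print(best.temperature, best.avg_logprob)
```

With the stub above, the first two attempts fall below -1.0 and the loop settles on the third temperature. A stub that always reports a high `no_speech_prob` would stop at the first attempt regardless of its score.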