Heuristic:Openai Whisper Compression Ratio Threshold
| Knowledge Sources | |
|---|---|
| Domains | Decoding, Quality_Control |
| Last Updated | 2025-06-25 00:00 GMT |
Overview
Whisper treats decoded text whose gzip compression ratio exceeds 2.4 as repetitive or degenerate output, which triggers temperature fallback and re-decoding.
Description
Whisper computes the gzip compression ratio of decoded text as a proxy for detecting repetitive output. When an autoregressive decoder enters a repetition loop, the output text becomes highly compressible. A compression ratio above 2.4 indicates that the text contains significant repetition, and the decoding result is considered failed. This triggers the temperature fallback mechanism to retry with higher randomness.
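The retry behaviour described above can be sketched in plain Python. This is a minimal illustration, not Whisper's actual implementation: `decode_with_fallback` here is a simplified stand-in, the `decode` callable is a hypothetical placeholder for the real decoder, and the temperature schedule is assumed to match the default of `transcribe()`. Whisper's other fallback checks (such as the log-probability threshold) are deliberately left out.

```python
import zlib
from typing import Callable, Optional, Sequence, Tuple


def compression_ratio(text: str) -> float:
    """Ratio of raw UTF-8 size to zlib-compressed size."""
    text_bytes = text.encode("utf-8")
    return len(text_bytes) / len(zlib.compress(text_bytes))


def decode_with_fallback(
    decode: Callable[[float], str],  # hypothetical decoder: temperature -> text
    temperatures: Sequence[float] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),  # assumed schedule
    compression_ratio_threshold: Optional[float] = 2.4,
) -> Tuple[str, float]:
    """Retry decoding at increasing temperatures until the text is not too repetitive."""
    if not temperatures:
        raise ValueError("at least one temperature is required")
    for temperature in temperatures:
        text = decode(temperature)
        needs_fallback = (
            compression_ratio_threshold is not None
            and compression_ratio(text) > compression_ratio_threshold
        )
        if not needs_fallback:
            break  # accept this result
    # If every temperature failed the check, the last attempt is returned.
    return text, temperature


def fake_decode(temperature: float) -> str:
    """Stand-in decoder that loops at low temperature and recovers at 0.4."""
    if temperature < 0.4:
        return "the " * 100
    return "A quick brown fox jumps over the lazy dog near the riverbank."


text, used_temperature = decode_with_fallback(fake_decode)
print(used_temperature, text)
```

With the stand-in decoder, the first two attempts are rejected as too repetitive and the third is accepted. Passing `compression_ratio_threshold=None` skips the check, so the first attempt is returned regardless of its content.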
Usage
Use this heuristic to detect and reject repetitive transcriptions. The threshold is configurable via the `compression_ratio_threshold` parameter in `transcribe()`. Set to `None` to disable the check. Lower values are more aggressive at rejecting repetition; higher values are more permissive.
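A possible way to pass the threshold through the Python API is sketched below. It assumes the `openai-whisper` package is installed; the wrapper name `transcribe_with_threshold`, the `"base"` model choice, and the audio path in the usage note are illustrative assumptions rather than anything prescribed by the source.

```python
from typing import Optional


def transcribe_with_threshold(
    audio_path: str,
    threshold: Optional[float] = 2.4,  # None disables the repetition check
    model_name: str = "base",          # assumed model size, chosen for illustration
) -> dict:
    """Hypothetical helper: run Whisper with an explicit compression ratio threshold."""
    import whisper  # imported lazily; requires the openai-whisper package

    model = whisper.load_model(model_name)
    return model.transcribe(audio_path, compression_ratio_threshold=threshold)
```

Calling `transcribe_with_threshold("meeting.wav", threshold=2.0)` would reject repetition more aggressively than the default, while `threshold=None` disables the check. Each entry in the returned `result["segments"]` carries a `compression_ratio` field, which can help when choosing a value for a particular kind of audio.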
The Insight (Rule of Thumb)
- Action: Set `compression_ratio_threshold=2.4` (default) in `transcribe()`.
- Value: 2.4 — text with gzip compression ratio above this is considered degenerate.
- Trade-off: Too low triggers unnecessary fallbacks on legitimate repetitive speech; too high allows repetitive garbage through.
Reasoning
Repetitive text compresses very well because DEFLATE, the algorithm behind both gzip and the `zlib` module Whisper actually calls, replaces repeated patterns with short back-references. Ordinary speech transcriptions typically have compression ratios below 2.4. When the decoder loops on a pattern such as "the the the the...", the ratio rises well above this threshold once the loop is long enough. The value 2.4 was empirically tuned by OpenAI to balance sensitivity and specificity.
Code evidence from `whisper/transcribe.py:44,204-208`:
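```python
# Line 44: default value in the transcribe() signature
compression_ratio_threshold: Optional[float] = 2.4,

# Lines 204-208: fallback check on each decoding result
needs_fallback = False
if (
    compression_ratio_threshold is not None
    and decode_result.compression_ratio > compression_ratio_threshold
):
    needs_fallback = True  # too repetitive
```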
Compression ratio computation from `whisper/utils.py`:
```python
import zlib


def compression_ratio(text) -> float:
    # Raw UTF-8 byte length divided by zlib-compressed byte length
    text_bytes = text.encode("utf-8")
    return len(text_bytes) / len(zlib.compress(text_bytes))
```
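To check the reasoning above, the same computation can be run on an ordinary sentence and on a looping string. The two sample strings below are made up for illustration and are not real Whisper output; the function is repeated so the snippet runs on its own.

```python
import zlib


def compression_ratio(text: str) -> float:
    text_bytes = text.encode("utf-8")
    return len(text_bytes) / len(zlib.compress(text_bytes))


THRESHOLD = 2.4

# Illustrative inputs (assumptions, not taken from a real transcription).
normal = (
    "We met on Tuesday to review the quarterly budget, and afterwards "
    "Maria suggested postponing the product launch until early spring."
)
looping = "the " * 100

normal_ratio = compression_ratio(normal)
looping_ratio = compression_ratio(looping)

for label, ratio in (("normal", normal_ratio), ("looping", looping_ratio)):
    verdict = "fallback" if ratio > THRESHOLD else "accept"
    print(f"{label}: ratio={ratio:.2f} -> {verdict}")
```

The ordinary sentence stays below the threshold, while the looping string lands far above it. Very short strings can produce ratios below 1, because the fixed zlib header and checksum outweigh any savings.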