Heuristic:Openai Whisper Compression Ratio Threshold

Knowledge Sources	OpenAI Whisper
Domains	Decoding, Quality_Control
Last Updated	2025-06-25 00:00 GMT

Overview

A gzip compression ratio threshold of 2.4 is used to detect repetitive or degenerate decoder output; exceeding it triggers temperature fallback for re-decoding.

Description

Whisper computes a compression ratio for the decoded text (its UTF-8 byte length divided by its zlib-compressed byte length, commonly called the gzip compression ratio) as a proxy for repetitive output. When an autoregressive decoder enters a repetition loop, its output becomes highly compressible. A ratio above 2.4 indicates significant repetition, so the decoding result is treated as failed. This triggers the temperature fallback mechanism, which retries decoding at a higher temperature, that is, with more randomness.

Usage

Use this heuristic to detect and reject repetitive transcriptions. The threshold is configurable via the `compression_ratio_threshold` parameter in `transcribe()`. Set to `None` to disable the check. Lower values are more aggressive at rejecting repetition; higher values are more permissive.
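A minimal sketch of passing the threshold through to `transcribe()`, assuming the `openai-whisper` package is installed. The helper names `build_transcribe_options` and `transcribe_with_threshold` and the model name `"base"` are illustrative choices, not part of Whisper.

```python
from typing import Any, Dict, Optional

# Whisper's default temperature schedule; fallback retries walk through it in order.
DEFAULT_TEMPERATURES = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)


def build_transcribe_options(threshold: Optional[float] = 2.4) -> Dict[str, Any]:
    """Hypothetical helper: keyword arguments to pass to transcribe().

    threshold=None disables the compression-ratio check entirely.
    """
    return {
        "compression_ratio_threshold": threshold,
        "temperature": DEFAULT_TEMPERATURES,
    }


def transcribe_with_threshold(
    audio_path: str,
    threshold: Optional[float] = 2.4,
    model_name: str = "base",
) -> Dict[str, Any]:
    """Hypothetical wrapper: load a model and transcribe with the given threshold."""
    import whisper  # imported lazily; requires the openai-whisper package

    model = whisper.load_model(model_name)
    return model.transcribe(audio_path, **build_transcribe_options(threshold))
```

Note that a retry can only happen when `temperature` offers more than one value; with a single float there is nothing to fall back to.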

The Insight (Rule of Thumb)

Action: Set `compression_ratio_threshold=2.4` (default) in `transcribe()`.
Value: 2.4. Text with a gzip compression ratio above this is considered degenerate.
Trade-off: A threshold that is too low triggers unnecessary fallbacks on legitimately repetitive speech; one that is too high lets repetitive garbage through.
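The trade-off can be explored with a small standalone check. This is a sketch rather than Whisper's own code: `is_too_repetitive` is a hypothetical helper that mirrors the threshold comparison, and the sample strings are made up for illustration.

```python
import zlib
from typing import Optional


def compression_ratio(text: str) -> float:
    """UTF-8 byte length divided by zlib-compressed byte length."""
    text_bytes = text.encode("utf-8")
    return len(text_bytes) / len(zlib.compress(text_bytes))


def is_too_repetitive(text: str, threshold: Optional[float] = 2.4) -> bool:
    """Hypothetical helper: True if the text would fail the check."""
    return threshold is not None and compression_ratio(text) > threshold


# Stand-in for a decoder stuck in a loop.
looped = "la " * 200
# Stand-in for speech that really is repetitive, such as a sung chorus.
chorus = "Row, row, row your boat, gently down the stream. " * 4

for name, text in (("looped", looped), ("chorus", chorus)):
    ratio = compression_ratio(text)
    for threshold in (1.5, 2.4, 5.0, None):
        verdict = is_too_repetitive(text, threshold)
        print(f"{name}: ratio={ratio:.2f} threshold={threshold} rejected={verdict}")
```

Running it shows which thresholds would reject each sample, which is a quick way to judge whether a candidate value is too strict for material that repeats by design.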

Reasoning

Repetitive text compresses very well because the DEFLATE algorithm behind gzip and zlib replaces repeated patterns with short back-references. Normal speech transcriptions typically have compression ratios below 2.4. When the decoder loops on patterns like "the the the the...", the ratio rises well above this threshold. The value 2.4 was empirically tuned by OpenAI to balance sensitivity (catching loops) and specificity (not rejecting valid text).

Code evidence from `whisper/transcribe.py:44,204-208`:

compression_ratio_threshold: Optional[float] = 2.4,

needs_fallback = False
if (
    compression_ratio_threshold is not None
    and decode_result.compression_ratio > compression_ratio_threshold
):
    needs_fallback = True  # too repetitive
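The check above sits inside a loop over temperatures. The following is a simplified sketch of that retry pattern, not Whisper's implementation: `FakeResult` and `fake_decode` are hypothetical stand-ins for the real decoding result and model, and the other fallback criteria that Whisper applies (such as the log-probability threshold) are left out.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class FakeResult:
    """Hypothetical stand-in for Whisper's decoding result."""

    text: str
    temperature: float

    @property
    def compression_ratio(self) -> float:
        data = self.text.encode("utf-8")
        return len(data) / len(zlib.compress(data))


def decode_with_fallback(
    decode: Callable[[float], str],
    temperatures: Sequence[float] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    compression_ratio_threshold: Optional[float] = 2.4,
) -> Optional[FakeResult]:
    """Retry at each temperature in turn until the output passes the check."""
    result = None
    for t in temperatures:
        result = FakeResult(text=decode(t), temperature=t)
        needs_fallback = (
            compression_ratio_threshold is not None
            and result.compression_ratio > compression_ratio_threshold
        )
        if not needs_fallback:
            break
    # If every temperature fails the check, the last attempt is returned as-is.
    return result


def fake_decode(temperature: float) -> str:
    """Toy decoder: loops at low temperature, recovers at 0.4 and above."""
    if temperature < 0.4:
        return "the " * 100
    return "The quick brown fox jumps over the lazy dog near the river bank."


result = decode_with_fallback(fake_decode)
print(result.temperature, round(result.compression_ratio, 2))
```

With the toy decoder, the first two attempts are rejected as too repetitive and the attempt at temperature 0.4 is accepted. Passing `compression_ratio_threshold=None` accepts the first attempt unconditionally.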

Compression ratio computation from `whisper/utils.py`:

import zlib

def compression_ratio(text) -> float:
    text_bytes = text.encode("utf-8")
    # Original size over compressed size: higher means more repetitive.
    return len(text_bytes) / len(zlib.compress(text_bytes))
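To get a feel for the numbers, the function can be run on a varied passage and on a looped one. A small sketch: the two sample strings are invented for illustration, and exact ratios depend on the text and the zlib build, so none are quoted here.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Same computation as the excerpt from whisper/utils.py."""
    text_bytes = text.encode("utf-8")
    return len(text_bytes) / len(zlib.compress(text_bytes))


# Made-up sample of ordinary, varied speech.
varied = (
    "We left the harbor just after sunrise, and by noon the wind had shifted "
    "enough that the captain asked everyone to move below deck. Nobody argued, "
    "because the clouds on the horizon were already turning a heavy gray."
)
# Made-up sample of a decoder repetition loop.
looped = "the " * 100

varied_ratio = compression_ratio(varied)
looped_ratio = compression_ratio(looped)

print(f"varied: {varied_ratio:.2f}  exceeds 2.4: {varied_ratio > 2.4}")
print(f"looped: {looped_ratio:.2f}  exceeds 2.4: {looped_ratio > 2.4}")
```

Very short strings can produce ratios below 1.0, because the fixed zlib header and checksum outweigh any savings.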
