
Environment:Openai Whisper PyTorch CUDA

From Leeroopedia
Knowledge Sources
Domains: Infrastructure, Speech_Recognition
Last Updated: 2025-06-25 00:00 GMT

Overview

Python 3.8+ environment with PyTorch and optional CUDA GPU acceleration for running OpenAI Whisper speech recognition models.

Description

This environment provides the core runtime for Whisper inference. It requires Python 3.8 or newer with PyTorch installed. When a CUDA-capable GPU is available and no device is specified, the model is placed on the GPU automatically, and transcription decodes in FP16 precision by default. On CPU, inference runs in FP32. The package also depends on tiktoken for fast tokenization, numpy, tqdm for progress display, and more-itertools; decoding audio files additionally requires the ffmpeg command-line tool.

Usage

Use this environment for all Whisper operations: model loading, audio preprocessing (mel spectrogram computation), language detection, decoding, and full transcription. Every Whisper Implementation page that involves model inference requires this environment.

System Requirements

| Category | Requirement | Notes |
|---|---|---|
| OS | Linux, macOS, or Windows | All platforms supported by PyTorch |
| Hardware | CPU (minimum) or NVIDIA GPU (recommended) | A GPU enables FP16 inference and a significant speedup |
| VRAM | ~1 GB (tiny) to ~10 GB (large) | See the model table below |
| Disk | ~3 GB for the largest model checkpoint | Cached in `~/.cache/whisper` by default |
| Network | Internet access for the first model download | Models are downloaded from Azure CDN |

Model VRAM Requirements:

| Model | Parameters | Required VRAM |
|---|---|---|
| tiny / tiny.en | 39 M | ~1 GB |
| base / base.en | 74 M | ~1 GB |
| small / small.en | 244 M | ~2 GB |
| medium / medium.en | 769 M | ~5 GB |
| large-v1 / large-v2 / large-v3 | 1550 M | ~10 GB |
| turbo | 809 M | ~6 GB |
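As a rough planning aid, the table can be turned into a lookup that picks the biggest checkpoint for a given GPU. This is a sketch, not part of Whisper: `pick_model`, `VRAM_GB`, and `PARAMS_M` are hypothetical names, the numbers are the approximate figures from the table, and real memory use can differ from these estimates.

```python
# Hypothetical helper, not part of openai-whisper: pick a checkpoint by VRAM budget.
# The figures are the approximate values from the model table above.
VRAM_GB = {
    "tiny": 1,
    "base": 1,
    "small": 2,
    "medium": 5,
    "turbo": 6,
    "large-v3": 10,
}

# Parameter counts (millions), used to prefer the biggest model that fits.
PARAMS_M = {
    "tiny": 39,
    "base": 74,
    "small": 244,
    "medium": 769,
    "turbo": 809,
    "large-v3": 1550,
}


def pick_model(available_vram_gb: float) -> str:
    """Return the largest model (by parameter count) whose approximate VRAM need fits."""
    fitting = [name for name, need in VRAM_GB.items() if need <= available_vram_gb]
    if not fitting:
        raise ValueError(f"no Whisper model fits in {available_vram_gb} GB of VRAM")
    return max(fitting, key=lambda name: PARAMS_M[name])


print(pick_model(8))    # turbo
print(pick_model(1.5))  # base
```

The English-only `.en` variants share a row with their multilingual counterparts in the table, so they are left out of the lookup; choosing by parameter count is a simplification, not a statement about accuracy.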

Dependencies

System Packages

  • Python >= 3.8 (3.8 through 3.13 supported)
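  • NVIDIA driver and a CUDA-enabled PyTorch build (optional, for GPU acceleration; prebuilt PyTorch binaries bundle the CUDA runtime)
  • `ffmpeg` command-line tool (used by `whisper.load_audio()` to decode audio files)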

Python Packages

  • `torch` (any recent version; the released models were trained and tested with PyTorch 1.10.1)
  • `numpy`
  • `tiktoken`
  • `tqdm`
  • `more-itertools`

Credentials

No API keys or credentials are required. Model checkpoints are downloaded from public Azure CDN endpoints without authentication.

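Checkpoints are cached in `$XDG_CACHE_HOME/whisper`, which falls back to `~/.cache/whisper` when `XDG_CACHE_HOME` is unset. `whisper.load_model()` also accepts a `download_root` argument to use a different directory.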
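The cache lookup order can be written out as a small function. This is a sketch: `whisper_cache_dir` is a hypothetical name, and the optional mapping argument exists only so the logic can be tested without touching the real environment.

```python
import os
from typing import Mapping, Optional


def whisper_cache_dir(env: Optional[Mapping[str, str]] = None) -> str:
    """Return $XDG_CACHE_HOME/whisper, falling back to ~/.cache/whisper."""
    env = os.environ if env is None else env
    base = env.get("XDG_CACHE_HOME", os.path.expanduser("~/.cache"))
    return os.path.join(base, "whisper")


# On Linux/macOS this prints /data/cache/whisper
print(whisper_cache_dir({"XDG_CACHE_HOME": "/data/cache"}))
```

Because the variable is consulted when no `download_root` is given, exporting `XDG_CACHE_HOME` before starting Python is enough to redirect downloads for that process.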

Quick Install

```shell
pip install openai-whisper
```

Or install from source:

```shell
pip install git+https://github.com/openai/whisper.git
```

Code Evidence

Device auto-detection from `whisper/__init__.py:130-131`:

```python
if device is None:
    device = "cuda" if torch.cuda.is_available() else "cpu"
```

FP16/FP32 dtype selection from `whisper/transcribe.py:127-133`:

```python
dtype = torch.float16 if decode_options.get("fp16", True) else torch.float32
if model.device == torch.device("cpu"):
    if torch.cuda.is_available():
        warnings.warn("Performing inference on CPU when CUDA is available")
    if dtype == torch.float16:
        warnings.warn("FP16 is not supported on CPU; using FP32 instead")
        dtype = torch.float32
```
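The two excerpts above combine into a small decision table. This torch-free sketch reproduces that logic with plain strings so it can be checked without a GPU; `resolve_device_and_dtype` is a hypothetical function, whereas in Whisper the device choice happens in `load_model()` and the dtype choice in `transcribe()`.

```python
from typing import List, Optional, Tuple


def resolve_device_and_dtype(
    cuda_available: bool,
    device: Optional[str] = None,
    fp16: bool = True,
) -> Tuple[str, str, List[str]]:
    """Return (device, dtype name, warning messages) mirroring the excerpts above."""
    messages: List[str] = []
    # load_model(): with no explicit device, prefer CUDA when it is available.
    if device is None:
        device = "cuda" if cuda_available else "cpu"
    # transcribe(): fp16 defaults to True.
    dtype = "float16" if fp16 else "float32"
    if device == "cpu":
        if cuda_available:
            messages.append("Performing inference on CPU when CUDA is available")
        if dtype == "float16":
            # FP16 is not supported on CPU, so fall back to FP32.
            messages.append("FP16 is not supported on CPU; using FP32 instead")
            dtype = "float32"
    return device, dtype, messages


print(resolve_device_and_dtype(cuda_available=True))
# ('cuda', 'float16', [])
print(resolve_device_and_dtype(cuda_available=False, fp16=False))
# ('cpu', 'float32', [])
```

The only path that ends in FP16 is a non-CPU device with `fp16=True`; every CPU path ends in FP32, and the warnings tell you which default was overridden.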

Checkpoint integrity verification from `whisper/__init__.py:90-93`:

```python
if hashlib.sha256(model_bytes).hexdigest() != expected_sha256:
    raise RuntimeError(
        "Model has been downloaded but the SHA256 checksum does not not match. Please retry loading the model."
    )
```
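The same check can be run against a file already on disk, independently of `load_model()`, for example to confirm that a cached checkpoint is corrupted before deleting it. This is a sketch with hypothetical helper names; it assumes you know the expected digest. Whisper's checkpoint URLs carry the SHA256 as the path segment just before the file name, which is where `expected_sha256` comes from in the excerpt.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so a multi-GB checkpoint is never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_sha256_from_url(url: str) -> str:
    """Return the path segment before the file name (the digest in Whisper's URLs)."""
    return url.split("/")[-2]


def checkpoint_is_valid(path: Path, expected_sha256: str) -> bool:
    """True if the file exists and its SHA256 matches the expected hex digest."""
    return path.is_file() and sha256_of_file(path) == expected_sha256.lower()
```

Hashing in chunks rather than reading the whole file mirrors the comparison Whisper makes while keeping memory flat; if the check fails, deleting the file lets the next `load_model()` call download it again.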

SDPA availability check from `whisper/model.py:16-22`:

```python
try:
    from torch.nn.functional import scaled_dot_product_attention
    SDPA_AVAILABLE = True
except (ImportError, RuntimeError, OSError):
    scaled_dot_product_attention = None
    SDPA_AVAILABLE = False
```
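Whether or not the fused SDPA kernel is available, attention computes `softmax(Q Kᵀ / sqrt(d)) V`. When `SDPA_AVAILABLE` is False, Whisper falls back to explicit matrix products; as far as we can tell, that path multiplies Q and K each by `d ** -0.25` before the product, which is equivalent to dividing the scores by `sqrt(d)`. The pure-Python sketch below only illustrates the formula for a single unmasked head; `attention_reference` is a hypothetical name, and this is not Whisper's implementation.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_reference(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head, unmasked attention: softmax(q @ k.T / sqrt(d)) @ v."""
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    out: Matrix = []
    for q_row in q:
        # One score per key: the scaled dot product with this query.
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        weights = softmax(scores)
        # Output is the weight-averaged value vectors.
        out.append([
            sum(w * v_row[j] for w, v_row in zip(weights, v))
            for j in range(len(v[0]))
        ])
    return out
```

With two identical keys, the weights are uniform and the output is the mean of the value rows; the fused kernel produces the same result with far less overhead.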

Common Errors

| Error message | Cause | Solution |
|---|---|---|
| `FP16 is not supported on CPU; using FP32 instead` | Running on CPU with the default `fp16=True` | Pass `fp16=False` to `transcribe()` (`--fp16 False` on the command line), or use a CUDA GPU |
| `Performing inference on CPU when CUDA is available` | Model loaded on CPU even though a GPU is available | Omit the `device` parameter or set `device="cuda"` |
| `Model has been downloaded but the SHA256 checksum does not not match` | Corrupted or incomplete download | Delete the cached file in `~/.cache/whisper` (or your custom cache directory) and retry |
| `RuntimeError: Model {name} not found` | Invalid model name | Use one of: tiny, base, small, medium (each with an optional `.en` suffix), large-v1, large-v2, large-v3 (alias `large`), turbo (alias `large-v3-turbo`), or a path to a local checkpoint file |
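A typo in the model name only surfaces when `load_model()` runs. The hypothetical pre-flight check below catches it earlier and suggests the closest valid name. The hard-coded list is an assumption based on the names above; with Whisper installed, `whisper.available_models()` returns the authoritative list. The sketch deliberately ignores the local-checkpoint-path case.

```python
import difflib

# Assumed built-in names, including the "large" and "large-v3-turbo" aliases.
KNOWN_MODELS = (
    "tiny.en", "tiny", "base.en", "base", "small.en", "small",
    "medium.en", "medium", "large-v1", "large-v2", "large-v3", "large",
    "large-v3-turbo", "turbo",
)


def check_model_name(name: str) -> str:
    """Return name unchanged if it is known; otherwise raise with a spelling hint."""
    if name in KNOWN_MODELS:
        return name
    matches = difflib.get_close_matches(name, KNOWN_MODELS, n=1)
    hint = f"; did you mean {matches[0]!r}?" if matches else ""
    raise RuntimeError(f"Model {name} not found{hint}")
```

Note that `.en` variants exist only for tiny through medium, so a name like `large.en` is rejected.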

Compatibility Notes

  • CPU inference: FP16 is not supported on CPU. The code automatically falls back to FP32 with a warning.
  • CUDA: Any NVIDIA GPU supported by PyTorch works. SDPA (Scaled Dot-Product Attention) is used when available (PyTorch >= 2.0) for faster attention computation.
  • macOS: Works on CPU. MPS (Apple Silicon GPU) is not selected automatically by Whisper; pass the device explicitly (for example, `device="mps"`) if you want to try it.
  • Windows: Supported. A Rust compiler may be needed to install tiktoken if no pre-built wheel is available for your platform.
  • PyTorch version: `weights_only=True` is passed to `torch.load()` for PyTorch >= 1.13 for security.
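The version gate in the last note can be sketched as follows, assuming version strings of the usual `MAJOR.MINOR.PATCH` form with an optional suffix such as `+cu118`. `torch_load_kwargs` and `parse_major_minor` are hypothetical helpers; they compare versions numerically rather than as strings, so that `1.9` is correctly treated as older than `1.13`.

```python
import re
from typing import Dict, Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings like '2.1.0+cu118' or '2.0.0a0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def torch_load_kwargs(torch_version: str) -> Dict[str, bool]:
    """Pass weights_only=True only when torch.load supports it (PyTorch >= 1.13)."""
    if parse_major_minor(torch_version) >= (1, 13):
        return {"weights_only": True}
    return {}
```

Typical use would be `torch.load(path, map_location="cpu", **torch_load_kwargs(torch.__version__))`, which restricts unpickling to tensors and plain containers on versions that support it.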
