Environment: OpenAI Whisper PyTorch CUDA

Knowledge Sources	OpenAI Whisper PyTorch
Domains	Infrastructure, Speech_Recognition
Last Updated	2025-06-25 00:00 GMT

Overview

Python 3.8+ environment with PyTorch and optional CUDA GPU acceleration for running OpenAI Whisper speech recognition models.

Description

This environment provides the core runtime for Whisper inference. It requires Python 3.8 or newer with PyTorch installed. When a CUDA-capable GPU is available and no device is specified, the model is placed on the GPU and inference defaults to FP16 precision. On CPU, inference falls back to FP32. The environment also includes tiktoken for fast tokenization, numpy, tqdm for progress display, and more-itertools.

Usage

Use this environment for all Whisper operations: model loading, audio preprocessing (mel spectrogram computation), language detection, decoding, and full transcription. Every Whisper Implementation page that involves model inference requires this environment.
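
As a rough illustration of the operations listed above, the sketch below loads a model and transcribes one file through the package's `load_model` and `transcribe` calls. The wrapper name `transcribe_file` and its defaults are assumptions made for this example, and `whisper` is imported lazily so the module can be loaded even where the package is not installed.

```python
from typing import Optional


def transcribe_file(path: str, model_name: str = "base", device: Optional[str] = None) -> str:
    """Transcribe one audio file and return the recognized text.

    `transcribe_file` is a hypothetical helper, not part of Whisper.
    """
    # Imported here so that importing this module does not require Whisper.
    import whisper

    # With device=None, load_model picks "cuda" when available, else "cpu".
    model = whisper.load_model(model_name, device=device)

    # transcribe() covers audio loading, mel spectrogram computation,
    # language detection, and decoding; it returns a dict with a "text" entry.
    result = model.transcribe(path)
    return result["text"]
```

A call such as `transcribe_file("audio.mp3")` would then return the transcript, where `audio.mp3` is a placeholder path.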

System Requirements

Category	Requirement	Notes
OS	Linux, macOS, or Windows	All platforms supported by PyTorch
Hardware	CPU (minimum) or NVIDIA GPU (recommended)	GPU enables FP16 inference and significant speedup
VRAM	1 GB (tiny) to 10 GB (large)	See model size table below
Disk	~3 GB for largest model checkpoint	Cached in ~/.cache/whisper by default
Network	Internet access for first model download	Models are downloaded from Azure CDN

Model VRAM Requirements:

Model	Parameters	Required VRAM
tiny / tiny.en	39 M	~1 GB
base / base.en	74 M	~1 GB
small / small.en	244 M	~2 GB
medium / medium.en	769 M	~5 GB
large-v1 / large-v2 / large-v3	1550 M	~10 GB
turbo	809 M	~6 GB
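
The table can be turned into a small lookup for choosing a model that fits a given GPU. This is only a sketch: the helper name `largest_model_that_fits` is hypothetical, the figures are the rough estimates from the table, and one representative name is used per row.

```python
from typing import Optional

# Approximate VRAM needs in GB, copied from the table above and
# ordered from the smallest to the largest requirement.
VRAM_GB = {
    "tiny": 1,
    "base": 1,
    "small": 2,
    "medium": 5,
    "turbo": 6,
    "large-v3": 10,
}


def largest_model_that_fits(available_gb: float) -> Optional[str]:
    """Return the most demanding model whose estimate fits, or None."""
    best = None
    for name, needed_gb in VRAM_GB.items():
        if needed_gb <= available_gb:
            best = name  # later entries need at least as much VRAM
    return best
```

For example, `largest_model_that_fits(8)` returns `"turbo"`, and a budget below 1 GB returns `None`. Treat the result as a starting point, since the table values are approximate.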

Dependencies

System Packages

Python >= 3.8 (3.8 through 3.13 supported)
CUDA toolkit (optional, for GPU acceleration)

Python Packages

`torch` (any recent version; developed with 1.10.1+)
`numpy`
`tiktoken`
`tqdm`
`more-itertools`
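
A quick way to check these requirements before loading a model is sketched below. The helper names `missing_modules` and `python_is_supported` are assumptions for this example; only the standard library is used, so the check itself runs even when the packages are absent.

```python
import importlib.util
import sys
from typing import Iterable, List

# Import names for the packages listed above; the more-itertools
# distribution is imported as more_itertools.
REQUIRED_MODULES = ["torch", "numpy", "tiktoken", "tqdm", "more_itertools"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_supported(version_info=sys.version_info) -> bool:
    """Check the Python >= 3.8 requirement."""
    return tuple(version_info[:2]) >= (3, 8)
```

Calling `missing_modules(REQUIRED_MODULES)` returns an empty list when every dependency is importable.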

Credentials

No API keys or credentials are required. Model checkpoints are downloaded from public Azure CDN endpoints without authentication.

Checkpoints are cached in the `whisper` subdirectory of the cache root. The cache root defaults to ~/.cache and can be overridden via the XDG_CACHE_HOME environment variable.
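
The cache location described here can be reproduced with a few lines of standard-library code, which is handy when locating or deleting a cached checkpoint. The function name `whisper_cache_dir` and the injectable `env` mapping are assumptions added for this sketch.

```python
import os
from typing import Mapping, Optional


def whisper_cache_dir(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the directory where checkpoints are expected to be cached."""
    env = os.environ if env is None else env
    # Fall back to ~/.cache when XDG_CACHE_HOME is not set.
    default_root = os.path.join(os.path.expanduser("~"), ".cache")
    cache_root = env.get("XDG_CACHE_HOME", default_root)
    return os.path.join(cache_root, "whisper")
```

Passing an explicit mapping, for example `whisper_cache_dir({"XDG_CACHE_HOME": "/tmp/xdg"})`, shows the effect of the override without touching the real environment.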

Quick Install

pip install openai-whisper

Or install from source:

pip install git+https://github.com/openai/whisper.git

Code Evidence

Device auto-detection from `whisper/__init__.py:130-131`:

if device is None:
    device = "cuda" if torch.cuda.is_available() else "cpu"
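
The same decision can be written as a pure function, which makes it easy to test and to extend. In the sketch below, `pick_device` is a hypothetical helper, and the opt-in MPS branch is an addition for illustration that is not part of the excerpt above.

```python
def pick_device(cuda_available: bool, mps_available: bool = False, prefer_mps: bool = False) -> str:
    """Choose a device string, preferring CUDA as in the excerpt above."""
    if cuda_available:
        return "cuda"
    # Opt-in only: the quoted auto-detection never selects MPS by itself.
    if prefer_mps and mps_available:
        return "mps"
    return "cpu"
```

With PyTorch installed, the flags can be filled from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, and the result passed as the `device` argument when loading a model.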

FP16/FP32 dtype selection from `whisper/transcribe.py:127-133`:

dtype = torch.float16 if decode_options.get("fp16", True) else torch.float32
if model.device == torch.device("cpu"):
    if torch.cuda.is_available():
        warnings.warn("Performing inference on CPU when CUDA is available")
    if dtype == torch.float16:
        warnings.warn("FP16 is not supported on CPU; using FP32 instead")
        dtype = torch.float32
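
To see the fallback rules in isolation, the following sketch mirrors the excerpt without importing PyTorch: it works on device and precision names as plain strings. The function `resolve_precision` and its string return values are assumptions for this example.

```python
import warnings


def resolve_precision(device: str, fp16: bool = True, cuda_available: bool = False) -> str:
    """Return "float16" or "float32" following the rules quoted above."""
    precision = "float16" if fp16 else "float32"
    if device == "cpu":
        if cuda_available:
            warnings.warn("Performing inference on CPU when CUDA is available")
        if precision == "float16":
            # FP16 requested on CPU: warn and fall back to FP32.
            warnings.warn("FP16 is not supported on CPU; using FP32 instead")
            precision = "float32"
    return precision
```

Passing `fp16=False` on CPU yields `"float32"` with no FP16 warning, which matches the remedy listed under Common Errors.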

Checkpoint integrity verification from `whisper/__init__.py:90-93`:

if hashlib.sha256(model_bytes).hexdigest() != expected_sha256:
    raise RuntimeError(
        "Model has been downloaded but the SHA256 checksum does not not match. Please retry loading the model."
    )
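
A cached checkpoint can be checked by hand in a similar way. The sketch below is an assumption-laden variant rather than the library's code: `sha256_matches` is a hypothetical helper, the expected digest must be supplied by the caller, and the file is hashed in chunks instead of being read into memory at once.

```python
import hashlib


def sha256_matches(path: str, expected_sha256: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the expected SHA256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fp:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256.lower()
```

If the check fails for a file in the cache directory, deleting that file and reloading the model triggers a fresh download.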

SDPA availability check from `whisper/model.py:16-22`:

try:
    from torch.nn.functional import scaled_dot_product_attention
    SDPA_AVAILABLE = True
except (ImportError, RuntimeError, OSError):
    scaled_dot_product_attention = None
    SDPA_AVAILABLE = False

Common Errors

Error Message	Cause	Solution
`FP16 is not supported on CPU; using FP32 instead`	Running on CPU with default fp16=True	Pass `fp16=False` or use a CUDA GPU
`Performing inference on CPU when CUDA is available`	Model loaded on CPU despite GPU being available	Omit the `device` parameter or set `device="cuda"`
`Model has been downloaded but the SHA256 checksum does not not match`	Corrupted download	Delete cached file in `~/.cache/whisper` and retry
`RuntimeError: Model {name} not found`	Invalid model name	Use one of: tiny, base, small, medium, large-v1, large-v2, large-v3, turbo. English-only `.en` variants exist for tiny, base, small, and medium

Compatibility Notes

CPU inference: FP16 is not supported on CPU. The code automatically falls back to FP32 with a warning.
CUDA: Any NVIDIA GPU supported by PyTorch works. SDPA (Scaled Dot-Product Attention) is used when available (PyTorch >= 2.0) for faster attention computation.
macOS: Works on CPU. MPS (Apple Silicon GPU) is not explicitly handled by Whisper; the device parameter must be set manually if desired.
Windows: Supported. Rust compiler may be needed for tiktoken installation if no pre-built wheel is available.
PyTorch version: On PyTorch >= 1.13, `weights_only=True` is passed to `torch.load()` so that loading a checkpoint does not unpickle arbitrary objects.
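
The version gate in the last note can be sketched as a small helper that builds the extra keyword arguments for `torch.load()`. The names `version_tuple` and `torch_load_kwargs` are hypothetical, and comparing parsed integers is this example's choice rather than a statement about how the library performs the check.

```python
from typing import Dict, Tuple


def version_tuple(version: str) -> Tuple[int, int]:
    """Parse the leading "major.minor" of a version such as "2.1.0+cu118"."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def torch_load_kwargs(torch_version: str) -> Dict[str, bool]:
    """Return extra torch.load() keyword arguments for the given version."""
    if version_tuple(torch_version) >= (1, 13):
        return {"weights_only": True}
    return {}
```

With PyTorch installed, the result of `torch_load_kwargs(torch.__version__)` could be unpacked into a `torch.load(...)` call.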
