Environment:Openai Whisper PyTorch CUDA
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Speech_Recognition |
| Last Updated | 2025-06-25 00:00 GMT |
Overview
Python 3.8+ environment with PyTorch and optional CUDA GPU acceleration for running OpenAI Whisper speech recognition models.
Description
This environment provides the core runtime for Whisper inference. It requires Python 3.8 or newer with PyTorch installed. When a CUDA-capable GPU is available and no device is specified, the model is placed on the GPU and inference defaults to FP16 precision. On CPU, inference runs in FP32. The environment also includes tiktoken for fast tokenization, numpy, tqdm for progress display, and more-itertools.
Usage
Use this environment for all Whisper operations: model loading, audio preprocessing (mel spectrogram computation), language detection, decoding, and full transcription. Every Whisper Implementation page that involves model inference requires this environment.
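A minimal end-to-end sketch of the operations listed above, assuming `openai-whisper` is installed and that audio decoding works on the machine. The helper name `transcribe_file` is hypothetical; `whisper.load_model` and `model.transcribe` are the library's entry points.

```python
def transcribe_file(path, model_name="base", device=None):
    """Load a Whisper model and transcribe one audio file.

    `transcribe_file` is a hypothetical helper used for illustration only.
    """
    # Imported lazily so that defining this helper does not require Whisper.
    import whisper

    # device=None lets Whisper choose "cuda" when available, otherwise "cpu".
    model = whisper.load_model(model_name, device=device)

    # Request FP16 only when the model sits on a CUDA device; this avoids
    # the "FP16 is not supported on CPU" warning.
    use_fp16 = model.device.type == "cuda"
    result = model.transcribe(path, fp16=use_fp16)

    # The result dictionary carries the detected language and the full text.
    return result["language"], result["text"]
```

Calling `transcribe_file("audio.mp3")` with a real file path (the name here is a placeholder) would return the detected language code and the transcript.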
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux, macOS, or Windows | All platforms supported by PyTorch |
| Hardware | CPU (minimum) or NVIDIA GPU (recommended) | GPU enables FP16 inference and significant speedup |
| VRAM | 1 GB (tiny) to 10 GB (large) | See model size table below |
| Disk | ~3 GB for largest model checkpoint | Cached in ~/.cache/whisper by default |
| Network | Internet access for first model download | Models are downloaded from Azure CDN |
Model VRAM Requirements:
| Model | Parameters | Required VRAM |
|---|---|---|
| tiny / tiny.en | 39 M | ~1 GB |
| base / base.en | 74 M | ~1 GB |
| small / small.en | 244 M | ~2 GB |
| medium / medium.en | 769 M | ~5 GB |
| large-v1 / large-v2 / large-v3 | 1550 M | ~10 GB |
| turbo | 809 M | ~6 GB |
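As a rough illustration of how the table can guide model choice, the sketch below picks the most demanding model that fits a given VRAM budget. The figures are the approximate values from the table, and `largest_model_for` is a hypothetical helper, not part of Whisper.

```python
# Approximate VRAM needs in GB, copied from the table above and
# ordered from least to most demanding.
MODELS_BY_VRAM = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("turbo", 6),
    ("large-v3", 10),
]


def largest_model_for(vram_gb):
    """Return the last model in the list that fits in `vram_gb`, or None."""
    choice = None
    for name, needed_gb in MODELS_BY_VRAM:
        if needed_gb <= vram_gb:
            choice = name  # later entries overwrite earlier, smaller ones
    return choice
```

Because the VRAM figures are approximate, it is sensible to leave some headroom rather than choose a model that exactly matches the available memory.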
Dependencies
System Packages
- Python >= 3.8 (3.8 through 3.13 supported)
- CUDA toolkit (optional, for GPU acceleration)
Python Packages
- `torch` (any recent version; developed with 1.10.1+)
- `numpy`
- `tiktoken`
- `tqdm`
- `more-itertools`
Credentials
No API keys or credentials are required. Model checkpoints are downloaded from public Azure CDN endpoints without authentication.
The download cache directory can be overridden via the XDG_CACHE_HOME environment variable (defaults to ~/.cache).
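The cache location described above can be reproduced with a few lines of standard-library code. This is a sketch of the lookup rule, not Whisper's own function; `whisper_cache_dir` is a hypothetical name, and the `env` parameter exists only to make the rule easy to try out.

```python
import os


def whisper_cache_dir(env=None):
    """Return the default checkpoint cache directory.

    Uses $XDG_CACHE_HOME when it is set, otherwise ~/.cache,
    and appends the "whisper" subdirectory.
    """
    env = os.environ if env is None else env
    default = os.path.join(os.path.expanduser("~"), ".cache")
    return os.path.join(env.get("XDG_CACHE_HOME", default), "whisper")
```

To keep checkpoints elsewhere without changing the environment, `whisper.load_model` also accepts a `download_root` argument.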
Quick Install
pip install openai-whisper
Or install from source:
pip install git+https://github.com/openai/whisper.git
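After installation, a quick check like the following can confirm that the packages listed under Dependencies are importable. It is a sketch using only the standard library; `missing_modules` is a hypothetical helper, and the module names are the usual import names for those packages.

```python
import importlib.util

# Import names for the packages listed under Dependencies.
# The `more-itertools` distribution is imported as `more_itertools`.
REQUIRED_MODULES = ["torch", "numpy", "tiktoken", "tqdm", "more_itertools", "whisper"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be located in this environment."""
    # find_spec looks a top-level module up without importing it.
    return [name for name in names if importlib.util.find_spec(name) is None]
```

An empty list from `missing_modules()` suggests the environment is complete; any names returned point to packages that still need installing.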
Code Evidence
Device auto-detection from `whisper/__init__.py:130-131`:
if device is None:
    device = "cuda" if torch.cuda.is_available() else "cpu"
FP16/FP32 dtype selection from `whisper/transcribe.py:127-133`:
dtype = torch.float16 if decode_options.get("fp16", True) else torch.float32
if model.device == torch.device("cpu"):
    if torch.cuda.is_available():
        warnings.warn("Performing inference on CPU when CUDA is available")
    if dtype == torch.float16:
        warnings.warn("FP16 is not supported on CPU; using FP32 instead")
        dtype = torch.float32
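To make the outcomes of this decision easy to see without a GPU or PyTorch, the same logic can be mirrored with plain strings. This is an illustrative sketch only: `select_dtype` and its parameters are hypothetical and stand in for the torch objects used in `transcribe.py`.

```python
import warnings


def select_dtype(device_type, fp16=True, cuda_available=False):
    """Mirror the FP16/FP32 decision using strings instead of torch dtypes."""
    dtype = "float16" if fp16 else "float32"
    if device_type == "cpu":
        if cuda_available:
            warnings.warn("Performing inference on CPU when CUDA is available")
        if dtype == "float16":
            # FP16 is not usable on CPU, so fall back to FP32.
            warnings.warn("FP16 is not supported on CPU; using FP32 instead")
            dtype = "float32"
    return dtype
```

In this sketch a CPU device always ends up with `"float32"`, while a CUDA device keeps `"float16"` unless `fp16=False` is passed.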
Checkpoint integrity verification from `whisper/__init__.py:90-93`:
if hashlib.sha256(model_bytes).hexdigest() != expected_sha256:
    raise RuntimeError(
        "Model has been downloaded but the SHA256 checksum does not not match. Please retry loading the model."
    )
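The same kind of check can be run by hand on a cached checkpoint before deciding to delete it. The sketch below hashes a file in chunks so that multi-gigabyte checkpoints are not read into memory at once. All three function names are hypothetical, and the expected digest has to be supplied by the caller.

```python
import hashlib


def sha256_of_chunks(chunks):
    """Return the hex SHA256 digest of an iterable of byte strings."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def read_chunks(path, chunk_size=1 << 20):
    """Yield the contents of `path` in pieces of at most `chunk_size` bytes."""
    with open(path, "rb") as fp:
        while True:
            chunk = fp.read(chunk_size)
            if not chunk:
                break
            yield chunk


def checkpoint_is_intact(path, expected_sha256):
    """Return True when the file at `path` has the expected SHA256 digest."""
    return sha256_of_chunks(read_chunks(path)) == expected_sha256
```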
SDPA availability check from `whisper/model.py:16-22`:
try:
    from torch.nn.functional import scaled_dot_product_attention
    SDPA_AVAILABLE = True
except (ImportError, RuntimeError, OSError):
    scaled_dot_product_attention = None
    SDPA_AVAILABLE = False
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `FP16 is not supported on CPU; using FP32 instead` | Running on CPU with default fp16=True | Pass `fp16=False` or use a CUDA GPU |
| `Performing inference on CPU when CUDA is available` | Model loaded on CPU despite GPU being available | Omit the `device` parameter or set `device="cuda"` |
| `Model has been downloaded but the SHA256 checksum does not not match` | Corrupted download | Delete cached file in `~/.cache/whisper` and retry |
| `RuntimeError: Model {name} not found` | Invalid model name | Use one of: tiny, base, small, medium, large-v1, large-v2, large-v3, turbo. The `.en` suffix is available only for tiny, base, small, and medium; `whisper.available_models()` lists the accepted names |
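A model name can be validated before any download is attempted. The sketch below uses only the names that appear in the tables on this page, so it is an assumption about the accepted set rather than Whisper's own list. In an installed environment `whisper.available_models()` is the authoritative source, and Whisper also accepts a path to a local checkpoint file, which this sketch does not cover. `is_known_model` is a hypothetical helper.

```python
# Names taken from the tables on this page.
MULTILINGUAL = ["tiny", "base", "small", "medium",
                "large-v1", "large-v2", "large-v3", "turbo"]
# English-only variants are listed only for the four smaller sizes.
ENGLISH_ONLY = ["tiny.en", "base.en", "small.en", "medium.en"]
KNOWN_MODELS = set(MULTILINGUAL + ENGLISH_ONLY)


def is_known_model(name):
    """Return True when `name` is one of the model names listed on this page."""
    return name in KNOWN_MODELS
```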
Compatibility Notes
- CPU inference: FP16 is not supported on CPU. The code automatically falls back to FP32 with a warning.
- CUDA: Any NVIDIA GPU supported by PyTorch works. SDPA (Scaled Dot-Product Attention) is used when available (PyTorch >= 2.0) for faster attention computation.
- macOS: Works on CPU. MPS (Apple Silicon GPU) is not explicitly handled by Whisper, so the device parameter must be set manually if desired.
- Windows: Supported. Rust compiler may be needed for tiktoken installation if no pre-built wheel is available.
- PyTorch version: On PyTorch >= 1.13, `weights_only=True` is passed to `torch.load()` so that loading a checkpoint does not unpickle arbitrary objects.
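The device notes above can be combined into one selection helper. This is a sketch under assumptions: `pick_device` is a hypothetical name, MPS is opt-in because this page does not confirm that every Whisper operation runs on it, and the PyTorch import is guarded so the function still returns a value when PyTorch is absent.

```python
def pick_device(prefer_mps=False):
    """Return "cuda", "mps", or "cpu" depending on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        return "cpu"  # without PyTorch there is no accelerator to query

    if torch.cuda.is_available():
        return "cuda"

    # torch.backends.mps is missing on older PyTorch releases, so look it up
    # defensively instead of assuming the attribute exists.
    mps = getattr(torch.backends, "mps", None)
    if prefer_mps and mps is not None and mps.is_available():
        return "mps"

    return "cpu"
```

The returned string can be passed as the `device` argument of `whisper.load_model`.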
Related Pages
- Implementation:Openai_Whisper_Load_Model
- Implementation:Openai_Whisper_Log_Mel_Spectrogram
- Implementation:Openai_Whisper_Detect_Language
- Implementation:Openai_Whisper_Decode
- Implementation:Openai_Whisper_DecodingTask_Run
- Implementation:Openai_Whisper_Transcribe
- Implementation:Openai_Whisper_Find_Alignment