Environment:Openai Whisper FFmpeg
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Audio_Processing |
| Last Updated | 2025-06-25 00:00 GMT |
Overview
System-level FFmpeg CLI tool required for decoding audio files into raw PCM waveform data for Whisper's audio loading pipeline.
Description
Whisper's `load_audio()` function relies on the `ffmpeg` command-line tool to decode audio files in any format FFmpeg supports (MP3, FLAC, WAV, OGG, etc.) into mono, 16 kHz, signed 16-bit little-endian PCM. FFmpeg is invoked as a subprocess and must be available on the system PATH. It is the only system-level binary dependency beyond the Python runtime.
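To make the PCM format concrete, here is a minimal sketch of how a raw `s16le` byte stream can be turned into a floating-point waveform. The helper name `pcm_s16le_to_float32` is hypothetical and NumPy is assumed; this illustrates the format and is not a copy of Whisper's own code.

```python
import numpy as np

def pcm_s16le_to_float32(raw: bytes) -> np.ndarray:
    """Convert mono signed 16-bit little-endian PCM bytes to float32 in [-1.0, 1.0)."""
    # "<i2" means little-endian 16-bit signed integers, matching the s16le format.
    samples = np.frombuffer(raw, dtype="<i2")
    # Dividing by 32768 maps the int16 range onto [-1.0, 1.0).
    return samples.astype(np.float32) / 32768.0

# Three hand-written samples: 0, 16384 (half scale), -32768 (full negative scale).
raw = np.array([0, 16384, -32768], dtype="<i2").tobytes()
waveform = pcm_s16le_to_float32(raw)
```

Each sample occupies two bytes, so one second of 16 kHz mono audio is 32,000 bytes of raw PCM.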
Usage
Use this environment whenever audio files need to be loaded from disk. `load_audio()`, which runs at the start of both the `transcribe()` pipeline and the lower-level API when they are given a file path, requires FFmpeg. If you pass pre-computed NumPy arrays or PyTorch tensors directly, FFmpeg is not needed.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux, macOS, or Windows | FFmpeg is available on all major platforms |
| Hardware | CPU | FFmpeg runs on CPU for audio decoding |
| Binary | `ffmpeg` in PATH | Must be accessible as a subprocess |
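The PATH requirement can be checked from Python before any audio is loaded. The following is a minimal sketch using the standard library's `shutil.which`; the helper name `ffmpeg_available` is an assumption made for illustration.

```python
import shutil

def ffmpeg_available(binary: str = "ffmpeg") -> bool:
    """Return True if `binary` resolves to an executable on the current PATH."""
    return shutil.which(binary) is not None

if not ffmpeg_available():
    print("ffmpeg was not found on PATH; file-based audio loading will fail.")
```

Running a check like this at startup gives a clearer message than waiting for the first decode attempt to fail.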
Dependencies
System Packages
- `ffmpeg` (command-line tool)
Installation by Platform
| Platform | Install Command |
|---|---|
| Ubuntu/Debian | `sudo apt update && sudo apt install ffmpeg` |
| Arch Linux | `sudo pacman -S ffmpeg` |
| macOS (Homebrew) | `brew install ffmpeg` |
| Windows (Chocolatey) | `choco install ffmpeg` |
| Windows (Scoop) | `scoop install ffmpeg` |
Credentials
No credentials required.
Quick Install
# Ubuntu/Debian
sudo apt update && sudo apt install ffmpeg
# Verify installation
ffmpeg -version
Code Evidence
FFmpeg subprocess invocation from `whisper/audio.py:42-61`:
# This launches a subprocess to decode audio while down-mixing
# and resampling as necessary. Requires the ffmpeg CLI in PATH.
cmd = [
    "ffmpeg",
    "-nostdin",
    "-threads", "0",
    "-i", file,
    "-f", "s16le",
    "-ac", "1",
    "-acodec", "pcm_s16le",
    "-ar", str(sr),
    "-"
]
try:
    out = run(cmd, capture_output=True, check=True).stdout
except CalledProcessError as e:
    raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
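| `RuntimeError: Failed to load audio: ...` | FFmpeg started but exited with a non-zero status, for example because the input path does not exist or cannot be read | Read the FFmpeg output included in the message; check that the file path is correct and the file is readable |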
| `FileNotFoundError: [Errno 2] No such file or directory: 'ffmpeg'` | FFmpeg binary not found | Install ffmpeg for your platform (see installation table above) |
| FFmpeg decode errors in stderr | Corrupted or unsupported audio file | Verify the audio file is valid; try converting with another tool first |
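The failure modes in this table map onto two different Python exceptions: `subprocess.run` raises `FileNotFoundError` when the executable cannot be launched, and `CalledProcessError` (with `check=True`) when it runs but exits non-zero. The sketch below separates the two; `run_decoder` is a hypothetical wrapper, not part of Whisper, and the demonstration uses the Python interpreter as a stand-in for `ffmpeg`.

```python
import subprocess
import sys

def run_decoder(cmd):
    """Run a decoder command and return its stdout, with distinct error messages."""
    try:
        return subprocess.run(cmd, capture_output=True, check=True).stdout
    except FileNotFoundError as e:
        # The executable itself could not be launched: it is missing from PATH.
        raise RuntimeError(f"Decoder binary not found: {cmd[0]!r}") from e
    except subprocess.CalledProcessError as e:
        # The executable ran but exited non-zero; its stderr explains why.
        raise RuntimeError(f"Decoder failed: {e.stderr.decode(errors='replace')}") from e

# Demonstration: a child process that writes to stderr and exits with status 1.
try:
    run_decoder([sys.executable, "-c", "import sys; sys.exit('bad input')"])
except RuntimeError as err:
    print(err)
```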
Compatibility Notes
- All platforms: FFmpeg must be installed separately from the Python package. `pip install openai-whisper` does not install FFmpeg.
- Docker: For Debian- or Ubuntu-based images, add `RUN apt-get update && apt-get install -y ffmpeg` to the Dockerfile.
- Bypass: If audio is already loaded as a mono NumPy float32 array sampled at 16 kHz, you can pass it directly to `log_mel_spectrogram()` or `transcribe()`, bypassing FFmpeg entirely.
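As one illustration of the bypass route, a WAV file that is already 16 kHz, mono, 16-bit PCM can be read with Python's standard `wave` module instead of FFmpeg. This is a minimal sketch under those assumptions: the helper `read_wav_16k_mono` is hypothetical, NumPy is assumed, and files in any other layout are rejected rather than resampled.

```python
import os
import tempfile
import wave

import numpy as np

def read_wav_16k_mono(path: str) -> np.ndarray:
    """Read a 16 kHz mono 16-bit WAV file into a float32 array without FFmpeg."""
    with wave.open(path, "rb") as wav:
        # Refuse anything that would need resampling or down-mixing.
        layout = (wav.getframerate(), wav.getnchannels(), wav.getsampwidth())
        if layout != (16000, 1, 2):
            raise ValueError("expected 16 kHz, mono, 16-bit PCM WAV")
        raw = wav.readframes(wav.getnframes())
    # WAV PCM data is little-endian; scale int16 samples to [-1.0, 1.0).
    return np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0

# Write one second of silence to a temporary file, then read it back.
path = os.path.join(tempfile.mkdtemp(), "silence.wav")
with wave.open(path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(16000)
    wav.writeframes(np.zeros(16000, dtype="<i2").tobytes())

audio = read_wav_16k_mono(path)
# `audio` could then be passed to transcribe() in place of a file path.
```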