Environment:Openai Whisper FFmpeg

Knowledge Sources	OpenAI Whisper FFmpeg
Domains	Infrastructure, Audio_Processing
Last Updated	2025-06-25 00:00 GMT

Overview

System-level FFmpeg CLI tool required for decoding audio files into raw PCM waveform data for Whisper's audio loading pipeline.

Description

Whisper's `load_audio()` function relies on the `ffmpeg` command-line tool to decode audio files in any format FFmpeg supports (MP3, FLAC, WAV, OGG, etc.) into mono, 16 kHz, signed 16-bit little-endian PCM. FFmpeg is invoked as a subprocess and must be available on the system PATH. It is the only system-level binary dependency beyond the Python runtime.
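To make the target format concrete, the following minimal sketch shows how a buffer of mono signed 16-bit little-endian PCM bytes could be turned into a floating-point waveform. The helper name `pcm_s16le_to_float32` is hypothetical, and dividing by 32768 is the conventional scaling for 16-bit PCM; treat this as an illustration rather than Whisper's exact code.

```python
import numpy as np


def pcm_s16le_to_float32(raw: bytes) -> np.ndarray:
    """Convert mono s16le PCM bytes to a float32 waveform in [-1.0, 1.0)."""
    # "<i2" means little-endian 2-byte signed integers, regardless of host byte order.
    samples = np.frombuffer(raw, dtype="<i2")
    # Scale the int16 range [-32768, 32767] down to [-1.0, 1.0).
    return samples.astype(np.float32) / 32768.0


# Three hand-made samples standing in for decoder output.
raw = np.array([0, 16384, -32768], dtype="<i2").tobytes()
waveform = pcm_s16le_to_float32(raw)
```

Each sample occupies two bytes, so one second of audio at 16 kHz is 32,000 bytes of raw PCM.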

Usage

Use this environment whenever audio files are loaded from disk by path. `load_audio()` runs at the start of both the `transcribe()` pipeline and the lower-level API when they are given a file path, and it requires FFmpeg. If you pass pre-computed NumPy arrays or PyTorch tensors directly, FFmpeg is not needed.

System Requirements

Category	Requirement	Notes
OS	Linux, macOS, or Windows	FFmpeg is available on all major platforms
Hardware	CPU	FFmpeg runs on CPU for audio decoding
Binary	`ffmpeg` in PATH	Must be accessible as a subprocess

Dependencies

System Packages

`ffmpeg` (command-line tool)

Installation by Platform

Platform	Install Command
Ubuntu/Debian	`sudo apt update && sudo apt install ffmpeg`
Arch Linux	`sudo pacman -S ffmpeg`
macOS (Homebrew)	`brew install ffmpeg`
Windows (Chocolatey)	`choco install ffmpeg`
Windows (Scoop)	`scoop install ffmpeg`

Credentials

No credentials required.

Quick Install

# Ubuntu/Debian
sudo apt update && sudo apt install ffmpeg

# Verify installation
ffmpeg -version
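The same check can be done from Python before any audio is loaded. This is a small sketch using the standard library's `shutil.which`; the helper name `find_binary` is an assumption made for illustration.

```python
import shutil
from typing import Optional


def find_binary(name: str = "ffmpeg") -> Optional[str]:
    """Return the path of `name` if it can be found on PATH, else None."""
    return shutil.which(name)


path = find_binary()
if path is None:
    print("ffmpeg not found on PATH; install it before calling load_audio()")
else:
    print(f"ffmpeg found at {path}")
```

Running a check like this at application startup gives a clearer message than waiting for the first decode attempt to fail.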

Code Evidence

FFmpeg subprocess invocation from `whisper/audio.py:42-61`:

# This launches a subprocess to decode audio while down-mixing
# and resampling as necessary.  Requires the ffmpeg CLI in PATH.
cmd = [
    "ffmpeg",
    "-nostdin",
    "-threads", "0",
    "-i", file,
    "-f", "s16le",
    "-ac", "1",
    "-acodec", "pcm_s16le",
    "-ar", str(sr),
    "-"
]
try:
    out = run(cmd, capture_output=True, check=True).stdout
except CalledProcessError as e:
    raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e
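The flags in the command above can be read one at a time. The sketch below rebuilds the same argument list inside a helper, with a comment on each flag. The name `build_ffmpeg_cmd` and the default `sr=16000` are illustrative assumptions, not part of Whisper's API.

```python
from typing import List


def build_ffmpeg_cmd(file: str, sr: int = 16000) -> List[str]:
    """Build an ffmpeg argument list that writes mono s16le PCM to stdout."""
    return [
        "ffmpeg",
        "-nostdin",             # do not read interactive input from stdin
        "-threads", "0",        # let ffmpeg choose the number of threads
        "-i", file,             # input file, in any format ffmpeg can decode
        "-f", "s16le",          # output format: raw signed 16-bit little-endian
        "-ac", "1",             # down-mix to a single (mono) channel
        "-acodec", "pcm_s16le", # encode output samples as 16-bit PCM
        "-ar", str(sr),         # resample to the requested sample rate
        "-",                    # write the result to stdout instead of a file
    ]


cmd = build_ffmpeg_cmd("speech.mp3")
```

Passing the arguments as a list, rather than a single shell string, avoids quoting problems with file names that contain spaces.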

Common Errors

Error Message	Cause	Solution
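`RuntimeError: Failed to load audio: ...`	FFmpeg started but exited with a non-zero status, typically because the input path does not exist or the file cannot be decoded; FFmpeg's stderr is appended to the message	Check the file path and read the appended stderr output for the specific failure
`FileNotFoundError: [Errno 2] No such file or directory: 'ffmpeg'`	FFmpeg is not installed or not on PATH, so the subprocess cannot be launched	Install ffmpeg for your platform (see installation table above) and ensure it is on the system PATH
FFmpeg decode errors in stderr (reported through the `RuntimeError` above)	Corrupted or unsupported audio file	Verify the audio file is valid; try converting with another tool first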
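The two failure modes come from different places: a missing executable fails before the subprocess starts, while a bad input makes the subprocess exit with a non-zero status. The sketch below separates them. The function `classify_failure` and its return labels are hypothetical, and the Python interpreter stands in for `ffmpeg` so the example runs without FFmpeg installed.

```python
import sys
from subprocess import CalledProcessError, run
from typing import List


def classify_failure(cmd: List[str]) -> str:
    """Run `cmd` and report which kind of failure occurred, if any."""
    try:
        run(cmd, capture_output=True, check=True)
    except FileNotFoundError:
        # The executable itself could not be found, so nothing was run.
        return "missing-binary"
    except CalledProcessError:
        # The executable ran but exited with a non-zero status.
        return "nonzero-exit"
    return "ok"


# Stand-ins for "ffmpeg not installed" and "ffmpeg rejected the input".
missing = classify_failure(["no-such-binary-xyz-123"])
failed = classify_failure([sys.executable, "-c", "import sys; sys.exit(1)"])
```

In an application, the first case calls for an installation hint and the second for showing the decoder's stderr to the user.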

Compatibility Notes

All platforms: FFmpeg must be installed separately from the Python package. `pip install openai-whisper` does not install FFmpeg.
Docker: When building images from a Debian- or Ubuntu-based base image, add `RUN apt-get update && apt-get install -y ffmpeg` to the Dockerfile. Other base images need their own package manager's equivalent.
Bypass: If audio is already loaded as a NumPy float32 array at 16kHz, you can pass it directly to `log_mel_spectrogram()` or `transcribe()`, bypassing FFmpeg entirely.
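As a rough illustration of the bypass path, the sketch below builds a float32 waveform at 16 kHz entirely in NumPy, with no FFmpeg involved. The helper `make_tone` and the choice of a 440 Hz sine wave are assumptions made for the example; real input would come from your own audio source. The Whisper calls are left as comments because they require the `openai-whisper` package and a downloaded model.

```python
import numpy as np

SAMPLE_RATE = 16000  # the 16 kHz rate described above


def make_tone(freq_hz: float = 440.0, seconds: float = 1.0,
              sr: int = SAMPLE_RATE) -> np.ndarray:
    """Return a mono float32 sine wave sampled at `sr` Hz."""
    t = np.arange(int(seconds * sr), dtype=np.float32) / sr
    return (0.1 * np.sin(2 * np.pi * freq_hz * t)).astype(np.float32)


audio = make_tone()

# With openai-whisper installed, the array could then be passed along
# these lines (not executed here):
#   import whisper
#   model = whisper.load_model("base")
#   result = model.transcribe(audio)
```

The array must already be mono and sampled at 16 kHz, since no resampling or down-mixing happens on this path.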

Related Pages

Implementation:Openai_Whisper_Load_Audio
