Environment:Openai Whisper FFmpeg

From Leeroopedia
Knowledge Sources
Domains Infrastructure, Audio_Processing
Last Updated 2025-06-25 00:00 GMT

Overview

System-level FFmpeg CLI tool required for decoding audio files into raw PCM waveform data for Whisper's audio loading pipeline.

Description

Whisper's `load_audio()` function relies on the `ffmpeg` command-line tool to decode audio files in any format FFmpeg supports (MP3, FLAC, WAV, OGG, etc.) into mono 16 kHz signed 16-bit little-endian PCM. FFmpeg is invoked as a subprocess and must be available in the system PATH. This is the only system-level binary dependency beyond the Python runtime.
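
Because the decoded stream is headerless PCM with a fixed layout (one channel, two bytes per sample, 16,000 samples per second), its size follows directly from the audio duration. The sketch below works through that arithmetic. The helper names are hypothetical and are not part of Whisper.

```python
BYTES_PER_SAMPLE = 2   # signed 16-bit PCM
CHANNELS = 1           # mono
SAMPLE_RATE = 16_000   # Hz, the target rate described above


def pcm_byte_count(seconds: float, sr: int = SAMPLE_RATE) -> int:
    """Bytes of raw mono s16le PCM needed to hold `seconds` of audio."""
    return int(round(seconds * sr)) * BYTES_PER_SAMPLE * CHANNELS


def pcm_duration(num_bytes: int, sr: int = SAMPLE_RATE) -> float:
    """Duration in seconds represented by `num_bytes` of mono s16le PCM."""
    return num_bytes / (BYTES_PER_SAMPLE * CHANNELS * sr)


print(pcm_byte_count(30))     # 960000
print(pcm_duration(960_000))  # 30.0
```

At this format, one second of audio occupies 32,000 bytes, so the decoded output held in memory can be estimated before a file is loaded.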

Usage

Use this environment whenever audio files need to be loaded from disk. The `load_audio()` function, which runs at the start of both the `transcribe()` pipeline and the lower-level API when they are given a file path, requires FFmpeg. If you pass pre-computed NumPy arrays or PyTorch tensors directly, FFmpeg is not needed.

System Requirements

| Category | Requirement | Notes |
| --- | --- | --- |
| OS | Linux, macOS, or Windows | FFmpeg is available on all major platforms |
| Hardware | CPU | FFmpeg runs on CPU for audio decoding |
| Binary | `ffmpeg` in PATH | Must be accessible as a subprocess |

Dependencies

System Packages

  • `ffmpeg` (command-line tool)

Installation by Platform

| Platform | Install Command |
| --- | --- |
| Ubuntu/Debian | `sudo apt update && sudo apt install ffmpeg` |
| Arch Linux | `sudo pacman -S ffmpeg` |
| macOS (Homebrew) | `brew install ffmpeg` |
| Windows (Chocolatey) | `choco install ffmpeg` |
| Windows (Scoop) | `scoop install ffmpeg` |

Credentials

No credentials required.

Quick Install

# Ubuntu/Debian
sudo apt update && sudo apt install ffmpeg

# Verify installation
ffmpeg -version
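
The same check can be run from Python before any audio is loaded. This is a minimal sketch that assumes a PATH lookup followed by `ffmpeg -version` is an adequate preflight test. `ffmpeg_version` is a hypothetical helper, not a Whisper function.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version(binary: str = "ffmpeg") -> Optional[str]:
    """Return the first line of `<binary> -version`, or None if unavailable."""
    path = shutil.which(binary)  # resolve the executable on PATH
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, "-version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        # Found on PATH but could not be started, or exited with an error.
        return None
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


print(ffmpeg_version() or "ffmpeg not found on PATH")
```

Running this at application startup gives a clear message early, instead of a failure on the first call that loads a file.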

Code Evidence

FFmpeg subprocess invocation from `whisper/audio.py:42-61`:

# This launches a subprocess to decode audio while down-mixing
# and resampling as necessary.  Requires the ffmpeg CLI in PATH.
cmd = [
    "ffmpeg",
    "-nostdin",
    "-threads", "0",
    "-i", file,
    "-f", "s16le",
    "-ac", "1",
    "-acodec", "pcm_s16le",
    "-ar", str(sr),
    "-"
]
try:
    out = run(cmd, capture_output=True, check=True).stdout
except CalledProcessError as e:
    raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e

Common Errors

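| Error Message | Cause | Solution |
| --- | --- | --- |
| `RuntimeError: Failed to load audio: ...` | FFmpeg started but exited with a non-zero status, for example because the input file is missing, corrupted, or in an unsupported format | Read the FFmpeg stderr text included in the message, check the file path, and verify the audio file is valid |
| `FileNotFoundError: [Errno 2] No such file or directory: 'ffmpeg'` | FFmpeg binary not installed or not in PATH | Install ffmpeg for your platform (see installation table above) and ensure it is on the system PATH |
| FFmpeg decode errors in stderr (reported inside the `RuntimeError` message) | Corrupted or unsupported audio file | Verify the audio file is valid; try converting with another tool first |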
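
The excerpt under Code Evidence catches only `CalledProcessError`, so a missing binary surfaces as `FileNotFoundError` rather than as the wrapped `RuntimeError`. The sketch below shows one way a caller might separate the two failure modes. `run_decoder` is a hypothetical wrapper, not part of Whisper, and the demonstration commands stand in for FFmpeg so the example runs without it.

```python
import subprocess
import sys


def run_decoder(cmd: list) -> bytes:
    """Run a decoder command and return its stdout.

    Raises RuntimeError with a distinct message for a missing executable
    and for an executable that exits with a non-zero status.
    """
    try:
        return subprocess.run(cmd, capture_output=True, check=True).stdout
    except FileNotFoundError as e:
        # The executable could not be started: not installed or not on PATH.
        raise RuntimeError(f"{cmd[0]!r} not found; install it and check PATH") from e
    except subprocess.CalledProcessError as e:
        # The executable ran but reported failure; stderr explains why.
        detail = e.stderr.decode(errors="replace")
        raise RuntimeError(f"Failed to load audio: {detail}") from e


# Exercise both failure paths without needing FFmpeg or an audio file.
demo_commands = (
    ["no-such-decoder-binary"],
    [sys.executable, "-c", "import sys; sys.exit('bad input')"],
)
for cmd in demo_commands:
    try:
        run_decoder(cmd)
    except RuntimeError as err:
        print(err)
```

Keeping the two cases apart means the error message can point directly at either an installation fix or a problem with the input file.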

Compatibility Notes

  • All platforms: FFmpeg must be installed separately from the Python package. `pip install openai-whisper` does not install FFmpeg.
  • Docker: For Debian- or Ubuntu-based images, add `RUN apt-get update && apt-get install -y ffmpeg` to the Dockerfile.
  • Bypass: If audio is already loaded as a NumPy float32 array at 16 kHz, you can pass it directly to `log_mel_spectrogram()` or `transcribe()`, bypassing FFmpeg entirely.
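
As an illustration of the bypass route, a WAV file that is already mono, 16-bit, and 16 kHz can be read with Python's standard `wave` module instead of FFmpeg. This is a sketch under stated assumptions: `read_wav_16k_mono` is a hypothetical helper, and scaling by 32768 to reach the range [-1.0, 1.0) is an assumed normalization convention that the text above does not state.

```python
import array
import io
import struct
import sys
import wave


def read_wav_16k_mono(source) -> list:
    """Read a mono, 16-bit, 16 kHz WAV and return float samples in [-1.0, 1.0)."""
    with wave.open(source, "rb") as wav:
        layout = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate())
        if layout != (1, 2, 16_000):
            raise ValueError(f"expected mono 16-bit 16 kHz PCM, got {layout}")
        raw = wav.readframes(wav.getnframes())
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":  # WAV sample data is little-endian
        samples.byteswap()
    # Assumed normalization: divide by 2**15 to map int16 to [-1.0, 1.0).
    return [s / 32768.0 for s in samples]


# Build a three-sample WAV in memory so the example needs no audio file.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16_000)
    w.writeframes(struct.pack("<3h", 0, 16384, -32768))
buf.seek(0)
print(read_wav_16k_mono(buf))  # [0.0, 0.5, -1.0]
```

With NumPy installed, the resulting list could be converted with `np.asarray(samples, dtype=np.float32)` to obtain the kind of array described in the bypass note. Files with a different sample rate or channel count would still need resampling or down-mixing, which is the work FFmpeg otherwise performs.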
