Environment:Speechbrain Speechbrain PyTorch CUDA Runtime
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Deep_Learning |
| Last Updated | 2026-02-09 20:00 GMT |
Overview
Linux-based environment with Python >= 3.8, PyTorch >= 1.9 (recommended >= 2.1), torchaudio, and optional CUDA/ROCm GPU acceleration for SpeechBrain v1.0.3.
Description
This environment provides the core runtime context for all SpeechBrain experiments. It supports both CPU-only and GPU-accelerated execution. GPU acceleration is available via NVIDIA CUDA and AMD ROCm (HIP). The framework automatically detects GPU availability, applies platform-specific quirks (TF32 on Ampere+, disabled cuDNN benchmarking on AMD HIP), and supports mixed precision training with fp16 and bf16.
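The precision behavior described above can be sketched in a few lines. This is a minimal illustration, not SpeechBrain's internal code: it picks a device, then an autocast dtype that is valid for it (fp16 autocast requires a CUDA/HIP GPU, while bf16 also works on CPU).

```python
import torch

# Minimal sketch (not SpeechBrain's exact code): pick a device and a
# matching autocast dtype. fp16 autocast needs a CUDA/HIP GPU; bf16
# also works on CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.bfloat16

with torch.autocast(device_type=device, dtype=dtype):
    layer = torch.nn.Linear(16, 8).to(device)
    out = layer(torch.randn(4, 16, device=device))

print(out.shape)  # torch.Size([4, 8])
```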
Usage
This environment is required by all SpeechBrain implementations. Every training script, inference pipeline, and data preparation step depends on PyTorch and torchaudio. GPU acceleration is strongly recommended for model training but not mandatory for data preparation or inference on small inputs.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (Ubuntu 20.04+ recommended) | macOS supported; Windows has known issues (GitHub #512) |
| Python | >= 3.8 (3.9+ recommended) | Codebase is formatted to target Python 3.8 syntax |
| Hardware | CPU (minimum) or NVIDIA/AMD GPU | GPU required for most training recipes |
| VRAM | 8GB+ recommended | Varies by model; wav2vec2/Whisper need 16GB+ |
| Disk | 10GB+ for framework and models | Datasets require additional storage (50-500GB) |
Dependencies
System Packages
- `ffmpeg` (recommended torchaudio backend for audio I/O)
- `sox` (alternative torchaudio backend)
- `git-lfs` (for HuggingFace model downloads)
Python Packages
- `torch` >= 1.9 (pip install); >= 2.1.0 (recommended for development)
- `torchaudio` >= 2.1.0 (recommended; older versions use legacy backend mechanism)
- `hyperpyyaml` >= 0.0.1
- `numpy` >= 1.17.0
- `scipy` >= 1.4.1
- `sentencepiece` >= 0.1.91
- `huggingface_hub` >= 0.8.0
- `tqdm` >= 4.42.0
- `joblib` >= 0.14.1
- `packaging`
Credentials
No credentials required for the core runtime. Individual recipes may require:
- `HF_TOKEN`: HuggingFace API token for gated model downloads (e.g., wav2vec2, Whisper)
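One way to consume the token is to read it from the environment and pass it explicitly to `huggingface_hub` calls. The helper below is hypothetical (not part of SpeechBrain); `snapshot_download` is a real `huggingface_hub` function.

```python
import os

# Hypothetical helper: read the token from the environment so it can be
# passed explicitly to huggingface_hub calls for gated models.
def get_hf_token():
    return os.environ.get("HF_TOKEN")

# Example (requires huggingface_hub and network access):
# from huggingface_hub import snapshot_download
# snapshot_download("openai/whisper-tiny", token=get_hf_token())
```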
Quick Install
```bash
# Install SpeechBrain with core dependencies
pip install speechbrain

# Or, for development with stricter versions
# (quote the specifiers so the shell does not treat ">=" as a redirect)
pip install "torch>=2.1.0" "torchaudio>=2.1.0"
pip install hyperpyyaml joblib numpy scipy sentencepiece huggingface_hub tqdm packaging
```
Code Evidence
Python version check from `speechbrain/core.py:645-659`:
```python
PYTHON_VERSION_MAJOR = 3
PYTHON_VERSION_MINOR = 8
if not (
    sys.version_info.major == PYTHON_VERSION_MAJOR
    and sys.version_info.minor >= PYTHON_VERSION_MINOR
):
    logger.warning(
        "Detected Python %s. We suggest using SpeechBrain with Python >= 3.8",
        sys.version_info,
    )
```
GPU detection and ROCm/CUDA logging from `speechbrain/utils/logger.py:304-310`:
```python
if torch.cuda.is_available():
    if torch.version.cuda is None:
        cuda_str = "ROCm version:\n" + torch.version.hip
    else:
        cuda_str = "CUDA version:\n" + torch.version.cuda
else:
    cuda_str = "CUDA not available"
```
AMD HIP quirk auto-applied from `speechbrain/utils/quirks.py:92-93`:
```python
if torch.cuda.is_available() and torch.version.hip:
    applied_quirks.add("disable_cudnn_benchmarking")
```
PyTorch version compatibility for GradScaler from `speechbrain/core.py:762-767`:
```python
if version.parse(torch.__version__) < version.parse("2.4.0"):
    self.scaler = torch.cuda.amp.GradScaler(enabled=gradscaler_enabled)
else:
    self.scaler = torch.GradScaler(self.device, enabled=gradscaler_enabled)
```
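The same version gate can be exercised in a standalone training step. This is an illustrative sketch: the model, optimizer, and data are stand-ins, not taken from SpeechBrain.

```python
import torch
from packaging import version

# Illustrative stand-ins for a real model/optimizer.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(16, 4).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# Same version gate as the snippet above; scaling is only enabled on GPU.
if version.parse(torch.__version__) < version.parse("2.4.0"):
    scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))
else:
    scaler = torch.GradScaler(device, enabled=(device == "cuda"))

x = torch.randn(8, 16, device=device)
dtype = torch.float16 if device == "cuda" else torch.bfloat16
with torch.autocast(device_type=device, dtype=dtype):
    loss = model(x).pow(2).mean()

# With scaling disabled (CPU), scale()/step() are transparent pass-throughs.
scaler.scale(loss).backward()
scaler.step(opt)
scaler.update()
```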
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `fp16 is not yet supported on CPU` | Using `--precision=fp16` without GPU | Use `--precision=bf16` for CPU or switch to GPU |
| `Not enough GPUs available!` | DDP launched with more processes than GPUs | Match `--nproc_per_node` to GPU count |
| `torchaudio could not find any working backend` | No audio backend installed | Install the `ffmpeg` system package or the `soundfile` Python package |
| `This version of torchaudio is old` | torchaudio < 2.1.0 | Upgrade: `pip install "torchaudio>=2.1.0"` |
| `compile_module_keys specified but PyTorch too old` | torch.compile requires PyTorch >= 2.0 | Upgrade PyTorch or remove compile_module_keys |
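The first error in the table can be avoided with a small guard before training starts. `pick_precision` is a hypothetical helper, not a SpeechBrain API:

```python
import torch

# Hypothetical guard for the fp16-on-CPU error above: fp16 autocast is
# not supported on CPU, so fall back to bf16 when no GPU is present.
def pick_precision(requested: str) -> str:
    if requested == "fp16" and not torch.cuda.is_available():
        return "bf16"
    return requested
```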
Compatibility Notes
- AMD ROCm (HIP): Fully supported. cuDNN benchmarking is automatically disabled on HIP devices to avoid performance issues with dynamic shapes.
- TF32: Auto-enabled as a global quirk on Ampere+ GPUs. Disable via `SB_DISABLE_QUIRKS=allow_tf32`.
- Mixed Precision: fp16 requires CUDA GPU. bf16 works on both CPU and GPU.
- Windows: Known issues (GitHub #512); WSL2 is recommended.
- macOS: Supported but GPU acceleration limited to Apple Silicon (MPS) if available.
Related Pages
- Implementation:Speechbrain_Speechbrain_Brain_Init
- Implementation:Speechbrain_Speechbrain_Brain_Fit_CTC
- Implementation:Speechbrain_Speechbrain_Separation_Fit_Batch
- Implementation:Speechbrain_Speechbrain_SpeakerBrain_Compute_Forward
- Implementation:Speechbrain_Speechbrain_MetricGanBrain_Fit_Batch
- Implementation:Speechbrain_Speechbrain_SEBrain_Compute_Forward
- Implementation:Speechbrain_Speechbrain_Tacotron2Brain_Compute_Forward
- Implementation:Speechbrain_Speechbrain_HifiGanBrain_Fit_Batch
- Implementation:Speechbrain_Speechbrain_Whisper_ASR_Compute_Forward