
Environment: SpeechBrain PyTorch CUDA Runtime

From Leeroopedia


Domains: Infrastructure, Deep_Learning
Last updated: 2026-02-09 20:00 GMT

Overview

Linux-based environment with Python >= 3.8, PyTorch >= 1.9 (recommended >= 2.1), torchaudio, and optional CUDA/ROCm GPU acceleration for SpeechBrain v1.0.3.

Description

This environment provides the core runtime context for all SpeechBrain experiments. It supports both CPU-only and GPU-accelerated execution; GPU acceleration is available via NVIDIA CUDA and AMD ROCm (HIP). The framework automatically detects GPU availability, applies platform-specific quirks (enabling TF32 on Ampere+ GPUs, disabling cuDNN benchmarking on AMD HIP), and supports mixed-precision training with fp16 and bf16.
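The platform checks described above can be sketched as a small helper. This is illustrative only: the function name and return shape are ours, not SpeechBrain's actual code path, and the inputs stand in for `torch.cuda.is_available()` and `torch.version.hip`.

```python
from typing import Optional


def describe_runtime(cuda_available: bool, hip_version: Optional[str]) -> dict:
    """Illustrative sketch of the platform detection described above
    (not SpeechBrain's API)."""
    if cuda_available:
        backend = "ROCm/HIP" if hip_version else "CUDA"
    else:
        backend = "CPU"
    return {
        "backend": backend,
        # fp16 autocast needs a GPU; bf16 also runs on CPU.
        "precision": "fp16" if cuda_available else "bf16",
        # Mirrors the HIP quirk: cuDNN benchmarking is disabled on AMD.
        "disable_cudnn_benchmarking": bool(cuda_available and hip_version),
    }
```

On a CPU-only machine this yields `{"backend": "CPU", "precision": "bf16", "disable_cudnn_benchmarking": False}`, matching the compatibility notes below.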

Usage

This environment is required by all SpeechBrain implementations. Every training script, inference pipeline, and data preparation step depends on PyTorch and torchaudio. GPU acceleration is strongly recommended for model training but not mandatory for data preparation or inference on small inputs.

System Requirements

  • OS: Linux (Ubuntu 20.04+ recommended). macOS is supported; Windows has known issues (GitHub #512).
  • Python: >= 3.8 (3.9+ recommended). Code is formatted targeting Python 3.8.
  • Hardware: CPU (minimum) or NVIDIA/AMD GPU. A GPU is required for most training recipes.
  • VRAM: 8GB+ recommended; varies by model (wav2vec2/Whisper need 16GB+).
  • Disk: 10GB+ for the framework and models; datasets require additional storage (50-500GB).

Dependencies

System Packages

  • `ffmpeg` (recommended torchaudio backend for audio I/O)
  • `sox` (alternative torchaudio backend)
  • `git-lfs` (for HuggingFace model downloads)
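As a pre-flight check, the presence of these system tools can be probed from Python. A minimal sketch, with a caveat: the helper name is ours, and torchaudio may still work via the `soundfile` backend even when both binaries are absent.

```python
import shutil
from typing import List


def missing_audio_tools() -> List[str]:
    """Return which of the system audio tools listed above are not on PATH."""
    return [tool for tool in ("ffmpeg", "sox") if shutil.which(tool) is None]
```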

Python Packages

  • `torch` >= 1.9 (pip install); >= 2.1.0 (recommended for development)
  • `torchaudio` >= 2.1.0 (recommended; older versions use legacy backend mechanism)
  • `hyperpyyaml` >= 0.0.1
  • `numpy` >= 1.17.0
  • `scipy` >= 1.4.1
  • `sentencepiece` >= 0.1.91
  • `huggingface_hub` >= 0.8.0
  • `tqdm` >= 4.42.0
  • `joblib` >= 0.14.1
  • `packaging`

Credentials

No credentials required for the core runtime. Individual recipes may require:

  • `HF_TOKEN`: HuggingFace API token for gated model downloads (e.g., wav2vec2, Whisper)
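A recipe can check for the token before attempting a gated download. This guard is a sketch: the helper name is ours, and the mapping argument stands in for `os.environ` so the check is testable.

```python
import os
from typing import Mapping


def gated_download_ready(env: Mapping[str, str] = os.environ) -> bool:
    """True if an HF token is available for gated model downloads."""
    return bool(env.get("HF_TOKEN"))
```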

Quick Install

# Install SpeechBrain with core dependencies
pip install speechbrain

# Or for development with stricter versions (quote the specifiers so the
# shell does not treat ">" as output redirection)
pip install "torch>=2.1.0" "torchaudio>=2.1.0"
pip install hyperpyyaml joblib numpy scipy sentencepiece huggingface_hub tqdm packaging
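After installing, the pinned minimums can be verified without importing the heavy packages themselves. A stdlib-only sketch (the helper name is ours):

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

For the recommended setup, `installed_version("torchaudio")` should report 2.1.0 or newer.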

Code Evidence

Python version check from `speechbrain/core.py:645-659`:

PYTHON_VERSION_MAJOR = 3
PYTHON_VERSION_MINOR = 8

if not (
    sys.version_info.major == PYTHON_VERSION_MAJOR
    and sys.version_info.minor >= PYTHON_VERSION_MINOR
):
    logger.warning(
        "Detected Python %s. We suggest using SpeechBrain with Python >= 3.8",
        sys.version_info,
    )

GPU detection and ROCm/CUDA logging from `speechbrain/utils/logger.py:304-310`:

if torch.cuda.is_available():
    if torch.version.cuda is None:
        cuda_str = "ROCm version:\n" + torch.version.hip
    else:
        cuda_str = "CUDA version:\n" + torch.version.cuda
else:
    cuda_str = "CUDA not available"

AMD HIP quirk auto-applied from `speechbrain/utils/quirks.py:92-93`:

if torch.cuda.is_available() and torch.version.hip:
    applied_quirks.add("disable_cudnn_benchmarking")

PyTorch version compatibility for GradScaler from `speechbrain/core.py:762-767`:

if version.parse(torch.__version__) < version.parse("2.4.0"):
    self.scaler = torch.cuda.amp.GradScaler(enabled=gradscaler_enabled)
else:
    self.scaler = torch.GradScaler(self.device, enabled=gradscaler_enabled)
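The `version.parse` comparison above comes from the `packaging` dependency. For plain release strings, an equivalent check can be sketched with the standard library alone; note that pre-release tags such as `2.4.0a0` are not handled by this simplification.

```python
from typing import Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Parse a dotted release string such as '2.4.0' or '2.3.1+cu121'
    into a comparable tuple of integers (numeric components only)."""
    core = version.split("+")[0]
    return tuple(int(part) for part in core.split(".") if part.isdigit())


# The GradScaler branch above then reduces to a tuple comparison:
use_legacy_scaler = parse_release("2.3.1+cu121") < (2, 4, 0)
```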

Common Errors

  • `fp16 is not yet supported on CPU`: caused by `--precision=fp16` without a GPU. Use `--precision=bf16` on CPU or switch to a GPU.
  • `Not enough GPUs available!`: DDP launched with more processes than GPUs. Match `--nproc_per_node` to the GPU count.
  • `torchaudio could not find any working backend`: no audio backend installed. Install the `ffmpeg` or `soundfile` system package.
  • `This version of torchaudio is old`: torchaudio < 2.1.0. Upgrade with `pip install "torchaudio>=2.1.0"`.
  • `compile_module_keys specified but PyTorch too old`: torch.compile requires PyTorch >= 2.0. Upgrade PyTorch or remove `compile_module_keys`.

Compatibility Notes

  • AMD ROCm (HIP): Fully supported. cuDNN benchmarking is automatically disabled on HIP devices to avoid performance issues with dynamic shapes.
  • TF32: Auto-enabled as a global quirk on Ampere+ GPUs. Disable via `SB_DISABLE_QUIRKS=allow_tf32`.
  • Mixed Precision: fp16 requires CUDA GPU. bf16 works on both CPU and GPU.
  • Windows: Known issues (GitHub #512); WSL2 is recommended.
  • macOS: Supported but GPU acceleration limited to Apple Silicon (MPS) if available.
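The `SB_DISABLE_QUIRKS` switch above takes quirk names. A comma-separated parsing convention can be sketched as follows; the exact separator and whitespace handling are assumptions, not verified against SpeechBrain's parser.

```python
from typing import Optional, Set


def parse_disabled_quirks(value: Optional[str]) -> Set[str]:
    """Parse an SB_DISABLE_QUIRKS-style value into a set of quirk names."""
    if not value:
        return set()
    return {name.strip() for name in value.split(",") if name.strip()}
```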
