Environment:Speechbrain Speechbrain PyTorch CUDA Runtime
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Deep_Learning |
| Last Updated | 2026-02-09 20:00 GMT |
Overview
Linux-based environment with Python >= 3.8, PyTorch >= 1.9 (recommended >= 2.1), torchaudio, and optional CUDA/ROCm GPU acceleration for SpeechBrain v1.0.3.
Description
This environment provides the core runtime context for all SpeechBrain experiments. It supports both CPU-only and GPU-accelerated execution. GPU acceleration is available via NVIDIA CUDA and AMD ROCm (HIP). The framework automatically detects GPU availability, applies platform-specific quirks (TF32 on Ampere+, disabled cuDNN benchmarking on AMD HIP), and supports mixed precision training with fp16 and bf16.
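The precision behavior described above can be sketched in a few lines. This is a minimal illustration, not SpeechBrain's internal code: it picks a device, then an autocast dtype that is valid for it (fp16 autocast requires a CUDA/HIP GPU, while bf16 also works on CPU).

```python
import torch

# Minimal sketch (not SpeechBrain's exact code): pick a device and a
# matching autocast dtype. fp16 autocast needs a CUDA/HIP GPU; bf16
# also works on CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.bfloat16

with torch.autocast(device_type=device, dtype=dtype):
    layer = torch.nn.Linear(16, 8).to(device)
    out = layer(torch.randn(4, 16, device=device))

print(out.shape)  # torch.Size([4, 8])
```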
Usage
This environment is required by all SpeechBrain implementations. Every training script, inference pipeline, and data preparation step depends on PyTorch and torchaudio. GPU acceleration is strongly recommended for model training but not mandatory for data preparation or inference on small inputs.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (Ubuntu 20.04+ recommended) | macOS supported; Windows has known issues (GitHub #512) |
| Python | >= 3.8 (3.9+ recommended) | Codebase is formatted to target Python 3.8 syntax |
| Hardware | CPU (minimum) or NVIDIA/AMD GPU | GPU required for most training recipes |
| VRAM | 8GB+ recommended | Varies by model; wav2vec2/Whisper need 16GB+ |
| Disk | 10GB+ for framework and models | Datasets require additional storage (50-500GB) |
Dependencies
System Packages
- `ffmpeg` (recommended torchaudio backend for audio I/O)
- `sox` (alternative torchaudio backend)
- `git-lfs` (for HuggingFace model downloads)
Python Packages
- `torch` >= 1.9 (pip install); >= 2.1.0 (recommended for development)
- `torchaudio` >= 2.1.0 (recommended; older versions use legacy backend mechanism)
- `hyperpyyaml` >= 0.0.1
- `numpy` >= 1.17.0
- `scipy` >= 1.4.1
- `sentencepiece` >= 0.1.91
- `huggingface_hub` >= 0.8.0
- `tqdm` >= 4.42.0
- `joblib` >= 0.14.1
- `packaging`
Credentials
No credentials required for the core runtime. Individual recipes may require:
- `HF_TOKEN`: HuggingFace API token for gated model downloads (e.g., wav2vec2, Whisper)
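One way to consume the token is to read it from the environment and pass it explicitly to `huggingface_hub` calls. The helper below is hypothetical (not part of SpeechBrain); `snapshot_download` is a real `huggingface_hub` function.

```python
import os

# Hypothetical helper: read the token from the environment so it can be
# passed explicitly to huggingface_hub calls for gated models.
def get_hf_token():
    return os.environ.get("HF_TOKEN")

# Example (requires huggingface_hub and network access):
# from huggingface_hub import snapshot_download
# snapshot_download("openai/whisper-tiny", token=get_hf_token())
```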
Quick Install
```bash
# Install SpeechBrain with core dependencies
pip install speechbrain

# Or, for development with stricter versions
# (quote the specifiers so the shell does not treat ">=" as a redirect)
pip install "torch>=2.1.0" "torchaudio>=2.1.0"
pip install hyperpyyaml joblib numpy scipy sentencepiece huggingface_hub tqdm packaging
```
Code Evidence
Python version check from `speechbrain/core.py:645-659`:
```python
PYTHON_VERSION_MAJOR = 3
PYTHON_VERSION_MINOR = 8
if not (
    sys.version_info.major == PYTHON_VERSION_MAJOR
    and sys.version_info.minor >= PYTHON_VERSION_MINOR
):
    logger.warning(
        "Detected Python %s. We suggest using SpeechBrain with Python >= 3.8",
        sys.version_info,
    )
```
GPU detection and ROCm/CUDA logging from `speechbrain/utils/logger.py:304-310`:
```python
if torch.cuda.is_available():
    if torch.version.cuda is None:
        cuda_str = "ROCm version:\n" + torch.version.hip
    else:
        cuda_str = "CUDA version:\n" + torch.version.cuda
else:
    cuda_str = "CUDA not available"
```
AMD HIP quirk auto-applied from `speechbrain/utils/quirks.py:92-93`:
```python
if torch.cuda.is_available() and torch.version.hip:
    applied_quirks.add("disable_cudnn_benchmarking")
```
PyTorch version compatibility for GradScaler from `speechbrain/core.py:762-767`:
```python
if version.parse(torch.__version__) < version.parse("2.4.0"):
    self.scaler = torch.cuda.amp.GradScaler(enabled=gradscaler_enabled)
else:
    self.scaler = torch.GradScaler(self.device, enabled=gradscaler_enabled)
```
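The same version gate can be exercised in a standalone training step. This is an illustrative sketch: the model, optimizer, and data are stand-ins, not taken from SpeechBrain.

```python
import torch
from packaging import version

# Illustrative stand-ins for a real model/optimizer.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(16, 4).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# Same version gate as the snippet above; scaling is only enabled on GPU.
if version.parse(torch.__version__) < version.parse("2.4.0"):
    scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))
else:
    scaler = torch.GradScaler(device, enabled=(device == "cuda"))

x = torch.randn(8, 16, device=device)
dtype = torch.float16 if device == "cuda" else torch.bfloat16
with torch.autocast(device_type=device, dtype=dtype):
    loss = model(x).pow(2).mean()

# With scaling disabled (CPU), scale()/step() are transparent pass-throughs.
scaler.scale(loss).backward()
scaler.step(opt)
scaler.update()
```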
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `fp16 is not yet supported on CPU` | Using `--precision=fp16` without GPU | Use `--precision=bf16` for CPU or switch to GPU |
| `Not enough GPUs available!` | DDP launched with more processes than GPUs | Match `--nproc_per_node` to GPU count |
| `torchaudio could not find any working backend` | No audio backend installed | Install the `ffmpeg` system package or the `soundfile` Python package |
| `This version of torchaudio is old` | torchaudio < 2.1.0 | Upgrade: `pip install "torchaudio>=2.1.0"` |
| `compile_module_keys specified but PyTorch too old` | torch.compile requires PyTorch >= 2.0 | Upgrade PyTorch or remove compile_module_keys |
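The first error in the table can be avoided with a small guard before training starts. `pick_precision` is a hypothetical helper, not a SpeechBrain API:

```python
import torch

# Hypothetical guard for the fp16-on-CPU error above: fp16 autocast is
# not supported on CPU, so fall back to bf16 when no GPU is present.
def pick_precision(requested: str) -> str:
    if requested == "fp16" and not torch.cuda.is_available():
        return "bf16"
    return requested
```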
Compatibility Notes
- AMD ROCm (HIP): Fully supported. cuDNN benchmarking is automatically disabled on HIP devices to avoid performance issues with dynamic shapes.
- TF32: Auto-enabled as a global quirk on Ampere+ GPUs. Disable via `SB_DISABLE_QUIRKS=allow_tf32`.
- Mixed Precision: fp16 requires CUDA GPU. bf16 works on both CPU and GPU.
- Windows: Known issues (GitHub #512); WSL2 is recommended.
- macOS: Supported but GPU acceleration limited to Apple Silicon (MPS) if available.
Related Pages
- Implementation:Speechbrain_Speechbrain_Brain_Init
- Implementation:Speechbrain_Speechbrain_Brain_Fit_CTC
- Implementation:Speechbrain_Speechbrain_Separation_Fit_Batch
- Implementation:Speechbrain_Speechbrain_SpeakerBrain_Compute_Forward
- Implementation:Speechbrain_Speechbrain_MetricGanBrain_Fit_Batch
- Implementation:Speechbrain_Speechbrain_SEBrain_Compute_Forward
- Implementation:Speechbrain_Speechbrain_Tacotron2Brain_Compute_Forward
- Implementation:Speechbrain_Speechbrain_HifiGanBrain_Fit_Batch
- Implementation:Speechbrain_Speechbrain_Whisper_ASR_Compute_Forward