Environment: SpeechBrain + HuggingFace Transformers
| Knowledge Sources | Value |
|---|---|
| Domains | NLP, Speech Recognition |
| Last Updated | 2026-02-09 20:00 GMT |
Overview
HuggingFace Transformers >= 4.30.0 environment for loading pretrained models (wav2vec2, Whisper, LLaMA2) within SpeechBrain recipes.
Description
SpeechBrain integrates with HuggingFace Transformers to load and fine-tune pretrained models. The integration is provided through wrapper classes in `speechbrain.lobes.models.huggingface_transformers`. Each model type (wav2vec2, Whisper, HuBERT, WavLM, LLaMA2, LaBSE) has a dedicated wrapper. Optional dependencies like `peft` enable parameter-efficient fine-tuning (LoRA), and `bitsandbytes` enables quantized inference.
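A typical wrapper usage can be sketched as follows (a minimal sketch, assuming a SpeechBrain version that ships the `huggingface_transformers` lobes; constructor arguments beyond `source` and `save_path` may vary between releases, and instantiation downloads the checkpoint, so this is illustrative rather than runnable offline):

```python
# Sketch: loading a pretrained wav2vec2 encoder through the SpeechBrain wrapper.
# The model id and argument set are illustrative assumptions.
from speechbrain.lobes.models.huggingface_transformers.wav2vec2 import Wav2Vec2

encoder = Wav2Vec2(
    source="facebook/wav2vec2-base-960h",  # HuggingFace Hub model id
    save_path="./pretrained_checkpoints",  # local cache for downloaded weights
    freeze=True,                           # keep encoder weights frozen
)
```

The wrapper exposes the pretrained encoder as a regular `torch.nn.Module`, so it can be dropped into a recipe's model definition like any other SpeechBrain component.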
Usage
Required for any recipe that uses pretrained HuggingFace models: wav2vec2-based ASR (CTC, seq2seq, transducer), Whisper fine-tuning, speaker embedding extraction with pretrained encoders, and LLM-based language modeling.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Hardware | GPU with >= 8GB VRAM for inference | Fine-tuning large models (Whisper large, LLaMA2) may need 24GB+ |
| Disk | 5-50GB per model | Models cached in `~/.cache/huggingface/` |
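When the home partition is too small for 5-50GB of checkpoints, the cache can be redirected with the `HF_HOME` environment variable, which `huggingface_hub` honors. A small helper to resolve the effective cache root (the function name is ours, for illustration):

```python
import os
from pathlib import Path

def hf_cache_dir() -> Path:
    # HF_HOME overrides the default ~/.cache/huggingface/ location,
    # e.g. to point model downloads at a large scratch disk.
    return Path(os.environ.get("HF_HOME", str(Path.home() / ".cache" / "huggingface")))
```

Set `HF_HOME` before launching a recipe so the first `from_pretrained` call downloads into the intended location.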
Dependencies
Python Packages
- `transformers` >= 4.30.0
- `huggingface_hub` >= 0.8.0
- `peft` (optional; for LoRA/QLoRA fine-tuning)
- `bitsandbytes` (optional; for 4-bit/8-bit quantization)
- `sacrebleu` (optional; for BLEU evaluation)
- `datasets` (optional; for HuggingFace datasets integration)
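Since several of these packages are optional, recipes typically probe for them before enabling the corresponding feature. A minimal sketch of that detection pattern (the helper name is ours):

```python
import importlib.util

def has_optional(pkg: str) -> bool:
    # True when the optional dependency is importable in this environment,
    # without actually importing it (avoids side effects at probe time).
    return importlib.util.find_spec(pkg) is not None
```

For example, `has_optional("peft")` can gate LoRA configuration so a recipe fails early with a clear message instead of a deep ImportError.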
Credentials
- `HF_TOKEN`: HuggingFace API token for accessing gated models (e.g., LLaMA2, some wav2vec2 variants)
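Reading the token from the environment (rather than hard-coding it) keeps credentials out of recipe files. A small sketch, with an illustrative helper name:

```python
import os

def get_hf_token() -> str:
    # Gated models (e.g. LLaMA2) reject anonymous downloads with
    # "401 Client Error: Unauthorized"; fail early with a clear message.
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; generate a token in your HuggingFace account settings"
        )
    return token
```

The token can then be passed to `huggingface_hub.login()` or to `from_pretrained(...)` via its `token` argument (older transformers versions use `use_auth_token`).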
Quick Install
# Core HuggingFace integration
pip install "transformers>=4.30.0" "huggingface_hub>=0.8.0"
# Optional: Parameter-efficient fine-tuning
pip install peft bitsandbytes
# Optional: Evaluation
pip install sacrebleu datasets
Code Evidence
Import guard from `speechbrain/lobes/models/huggingface_transformers/__init__.py:7-13`:
try:
    import transformers  # noqa: F401
except ImportError:
    MSG = "Please install transformers from HuggingFace.\n"
    MSG += "E.g. run: pip install transformers\n"
    raise ImportError(MSG)
GPU capability check for LLaMA2 quantization from `speechbrain/lobes/models/huggingface_transformers/llama2.py:121-123`:
if compute_dtype == torch.float16 and use_4bit:
    major, _ = torch.cuda.get_device_capability()
    if major >= 8:
        ...  # recommends bfloat16 for Ampere+
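The dtype recommendation above reduces to a pure decision function, sketched here without the CUDA dependency (function and return values are illustrative):

```python
def pick_compute_dtype(major_capability: int, use_4bit: bool) -> str:
    # Mirrors the check in llama2.py: on Ampere-or-newer GPUs
    # (compute capability >= 8.x), bfloat16 is the recommended
    # compute dtype for 4-bit quantization; otherwise fall back to float16.
    if use_4bit and major_capability >= 8:
        return "bfloat16"
    return "float16"
```

On real hardware, `major_capability` would come from `torch.cuda.get_device_capability()` as in the snippet above.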
TOKENIZERS_PARALLELISM suppression from `speechbrain/lobes/models/huggingface_transformers/labse.py:23`:
os.environ["TOKENIZERS_PARALLELISM"] = "false"
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ImportError: Please install transformers` | transformers not installed | `pip install "transformers>=4.30.0"` |
| `ImportError: peft` | peft not installed for LoRA | `pip install peft` |
| `401 Client Error: Unauthorized` | Missing or invalid HF_TOKEN for gated model | Set `HF_TOKEN` env var with valid token |
| `CUDA out of memory` loading large model | Insufficient VRAM | Use 4-bit quantization with bitsandbytes |
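For the last row, a 4-bit loading configuration can be sketched with transformers' `BitsAndBytesConfig` (a config fragment, not runnable here: it assumes bitsandbytes is installed, a CUDA GPU is present, and `HF_TOKEN` grants access to the gated checkpoint; the model id is illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 4-bit weights with bfloat16 compute (recommended on Ampere+ GPUs)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# device_map="auto" spreads layers across available GPUs and CPU memory
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # gated: requires HF_TOKEN
    quantization_config=bnb_config,
    device_map="auto",
)
```

4-bit loading cuts weight memory roughly fourfold versus float16, which is usually enough to fit a 7B-parameter model on an 8GB card for inference.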
Compatibility Notes
- LLaMA2 4-bit: Requires GPU compute capability >= 8.0 (Ampere) for optimal bfloat16 performance.
- TOKENIZERS_PARALLELISM: Auto-set to "false" by LaBSE wrapper to suppress HuggingFace warnings.
- peft: Optional but recommended for fine-tuning large models to reduce memory footprint.