Environment: SpeechBrain + HuggingFace Transformers
| Knowledge Sources | Value |
|---|---|
| Domains | NLP, Speech Recognition |
| Last Updated | 2026-02-09 20:00 GMT |
Overview
HuggingFace Transformers >= 4.30.0 environment for loading pretrained models (wav2vec2, Whisper, LLaMA2) within SpeechBrain recipes.
Description
SpeechBrain integrates with HuggingFace Transformers to load and fine-tune pretrained models. The integration is provided through wrapper classes in `speechbrain.lobes.models.huggingface_transformers`. Each model type (wav2vec2, Whisper, HuBERT, WavLM, LLaMA2, LaBSE) has a dedicated wrapper. Optional dependencies like `peft` enable parameter-efficient fine-tuning (LoRA), and `bitsandbytes` enables quantized inference.
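A typical wrapper usage can be sketched as follows (a minimal sketch, assuming a SpeechBrain version that ships the `huggingface_transformers` lobes; constructor arguments beyond `source` and `save_path` may vary between releases, and instantiation downloads the checkpoint, so this is illustrative rather than runnable offline):

```python
# Sketch: loading a pretrained wav2vec2 encoder through the SpeechBrain wrapper.
# The model id and argument set are illustrative assumptions.
from speechbrain.lobes.models.huggingface_transformers.wav2vec2 import Wav2Vec2

encoder = Wav2Vec2(
    source="facebook/wav2vec2-base-960h",  # HuggingFace Hub model id
    save_path="./pretrained_checkpoints",  # local cache for downloaded weights
    freeze=True,                           # keep encoder weights frozen
)
```

The wrapper exposes the pretrained encoder as a regular `torch.nn.Module`, so it can be dropped into a recipe's model definition like any other SpeechBrain component.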
Usage
Required for any recipe that uses pretrained HuggingFace models: wav2vec2-based ASR (CTC, seq2seq, transducer), Whisper fine-tuning, speaker embedding extraction with pretrained encoders, and LLM-based language modeling.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Hardware | GPU with >= 8GB VRAM for inference | Fine-tuning large models (Whisper large, LLaMA2) may need 24GB+ |
| Disk | 5-50GB per model | Models cached in `~/.cache/huggingface/` |
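When the home partition is too small for 5-50GB of checkpoints, the cache can be redirected with the `HF_HOME` environment variable, which `huggingface_hub` honors. A small helper to resolve the effective cache root (the function name is ours, for illustration):

```python
import os
from pathlib import Path

def hf_cache_dir() -> Path:
    # HF_HOME overrides the default ~/.cache/huggingface/ location,
    # e.g. to point model downloads at a large scratch disk.
    return Path(os.environ.get("HF_HOME", str(Path.home() / ".cache" / "huggingface")))
```

Set `HF_HOME` before launching a recipe so the first `from_pretrained` call downloads into the intended location.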
Dependencies
Python Packages
- `transformers` >= 4.30.0
- `huggingface_hub` >= 0.8.0
- `peft` (optional; for LoRA/QLoRA fine-tuning)
- `bitsandbytes` (optional; for 4-bit/8-bit quantization)
- `sacrebleu` (optional; for BLEU evaluation)
- `datasets` (optional; for HuggingFace datasets integration)
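Since several of these packages are optional, recipes typically probe for them before enabling the corresponding feature. A minimal sketch of that detection pattern (the helper name is ours):

```python
import importlib.util

def has_optional(pkg: str) -> bool:
    # True when the optional dependency is importable in this environment,
    # without actually importing it (avoids side effects at probe time).
    return importlib.util.find_spec(pkg) is not None
```

For example, `has_optional("peft")` can gate LoRA configuration so a recipe fails early with a clear message instead of a deep ImportError.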
Credentials
- `HF_TOKEN`: HuggingFace API token for accessing gated models (e.g., LLaMA2, some wav2vec2 variants)
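Reading the token from the environment (rather than hard-coding it) keeps credentials out of recipe files. A small sketch, with an illustrative helper name:

```python
import os

def get_hf_token() -> str:
    # Gated models (e.g. LLaMA2) reject anonymous downloads with
    # "401 Client Error: Unauthorized"; fail early with a clear message.
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; generate a token in your HuggingFace account settings"
        )
    return token
```

The token can then be passed to `huggingface_hub.login()` or to `from_pretrained(...)` via its `token` argument (older transformers versions use `use_auth_token`).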
Quick Install
# Core HuggingFace integration
pip install "transformers>=4.30.0" "huggingface_hub>=0.8.0"
# Optional: Parameter-efficient fine-tuning
pip install peft bitsandbytes
# Optional: Evaluation
pip install sacrebleu datasets
Code Evidence
Import guard from `speechbrain/lobes/models/huggingface_transformers/__init__.py:7-13`:
try:
    import transformers  # noqa: F401
except ImportError:
    MSG = "Please install transformers from HuggingFace.\n"
    MSG += "E.g. run: pip install transformers\n"
    raise ImportError(MSG)
GPU capability check for LLaMA2 quantization from `speechbrain/lobes/models/huggingface_transformers/llama2.py:121-123`:
if compute_dtype == torch.float16 and use_4bit:
    major, _ = torch.cuda.get_device_capability()
    if major >= 8:
        ...  # recommends bfloat16 for Ampere+
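The dtype recommendation above reduces to a pure decision function, sketched here without the CUDA dependency (function and return values are illustrative):

```python
def pick_compute_dtype(major_capability: int, use_4bit: bool) -> str:
    # Mirrors the check in llama2.py: on Ampere-or-newer GPUs
    # (compute capability >= 8.x), bfloat16 is the recommended
    # compute dtype for 4-bit quantization; otherwise fall back to float16.
    if use_4bit and major_capability >= 8:
        return "bfloat16"
    return "float16"
```

On real hardware, `major_capability` would come from `torch.cuda.get_device_capability()` as in the snippet above.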
TOKENIZERS_PARALLELISM suppression from `speechbrain/lobes/models/huggingface_transformers/labse.py:23`:
os.environ["TOKENIZERS_PARALLELISM"] = "false"
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ImportError: Please install transformers` | transformers not installed | `pip install "transformers>=4.30.0"` |
| `ImportError: peft` | peft not installed for LoRA | `pip install peft` |
| `401 Client Error: Unauthorized` | Missing or invalid HF_TOKEN for gated model | Set `HF_TOKEN` env var with valid token |
| `CUDA out of memory` loading large model | Insufficient VRAM | Use 4-bit quantization with bitsandbytes |
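For the last row, a 4-bit loading configuration can be sketched with transformers' `BitsAndBytesConfig` (a config fragment, not runnable here: it assumes bitsandbytes is installed, a CUDA GPU is present, and `HF_TOKEN` grants access to the gated checkpoint; the model id is illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 4-bit weights with bfloat16 compute (recommended on Ampere+ GPUs)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# device_map="auto" spreads layers across available GPUs and CPU memory
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # gated: requires HF_TOKEN
    quantization_config=bnb_config,
    device_map="auto",
)
```

4-bit loading cuts weight memory roughly fourfold versus float16, which is usually enough to fit a 7B-parameter model on an 8GB card for inference.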
Compatibility Notes
- LLaMA2 4-bit: Requires GPU compute capability >= 8.0 (Ampere) for optimal bfloat16 performance.
- TOKENIZERS_PARALLELISM: Auto-set to "false" by LaBSE wrapper to suppress HuggingFace warnings.
- peft: Optional but recommended for fine-tuning large models to reduce memory footprint.