
Environment: SpeechBrain HuggingFace Transformers

From Leeroopedia


Knowledge Sources
Domains: NLP, Speech_Recognition
Last Updated: 2026-02-09 20:00 GMT

Overview

An environment with HuggingFace Transformers >= 4.30.0 for loading pretrained models (wav2vec2, Whisper, LLaMA2) within SpeechBrain recipes.

Description

SpeechBrain integrates with HuggingFace Transformers to load and fine-tune pretrained models. The integration is provided through wrapper classes in `speechbrain.lobes.models.huggingface_transformers`. Each model type (wav2vec2, Whisper, HuBERT, WavLM, LLaMA2, LaBSE) has a dedicated wrapper. Optional dependencies like `peft` enable parameter-efficient fine-tuning (LoRA), and `bitsandbytes` enables quantized inference.

Usage

Required for any recipe that uses pretrained HuggingFace models: wav2vec2-based ASR (CTC, seq2seq, transducer), Whisper fine-tuning, speaker embedding extraction with pretrained encoders, and LLM-based language modeling.

System Requirements

Category | Requirement | Notes
Hardware | GPU with >= 8GB VRAM for inference | Fine-tuning large models (Whisper large, LLaMA2) may need 24GB+
Disk | 5-50GB per model | Models cached in `~/.cache/huggingface/`
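The cache location can be redirected to a larger disk via the `HF_HOME` environment variable. A minimal stdlib sketch of the default resolution (the actual logic in `huggingface_hub` has additional fallbacks, e.g. `XDG_CACHE_HOME`):

```python
import os

def hf_cache_dir() -> str:
    """Simplified sketch of where HuggingFace caches model weights.

    Set HF_HOME to point the 5-50GB per-model cache at a larger disk;
    the real resolution in huggingface_hub has more fallbacks.
    """
    hf_home = os.environ.get(
        "HF_HOME", os.path.join(os.path.expanduser("~"), ".cache", "huggingface")
    )
    return os.path.join(hf_home, "hub")
```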

Dependencies

Python Packages

  • `transformers` >= 4.30.0
  • `huggingface_hub` >= 0.8.0
  • `peft` (optional; for LoRA/QLoRA fine-tuning)
  • `bitsandbytes` (optional; for 4-bit/8-bit quantization)
  • `sacrebleu` (optional; for BLEU evaluation)
  • `datasets` (optional; for HuggingFace datasets integration)
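The optional packages above can be probed before enabling the corresponding features. A hypothetical stdlib-only helper (not part of SpeechBrain's API) sketching that check:

```python
import importlib.util

# Optional extras and the features they unlock (from the list above).
OPTIONAL_EXTRAS = {
    "peft": "LoRA/QLoRA fine-tuning",
    "bitsandbytes": "4-bit/8-bit quantization",
    "sacrebleu": "BLEU evaluation",
    "datasets": "HuggingFace datasets integration",
}

def available_extras() -> dict:
    """Map each optional package to whether it is importable."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in OPTIONAL_EXTRAS
    }
```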

Credentials

  • `HF_TOKEN`: HuggingFace API token for accessing gated models (e.g., LLaMA2, some wav2vec2 variants)
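A small, hypothetical guard (the helper name is ours, not SpeechBrain's) that fails fast when the token is missing, before a gated download turns into a 401:

```python
import os

def require_hf_token() -> str:
    """Return HF_TOKEN or raise a clear error before hitting a 401."""
    token = os.environ.get("HF_TOKEN", "")
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; gated models (e.g. LLaMA2) will fail with "
            "'401 Client Error: Unauthorized'."
        )
    return token
```

Recent versions of `huggingface_hub` read `HF_TOKEN` automatically, so exporting the variable is usually all that is needed.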

Quick Install

# Core HuggingFace integration
pip install "transformers>=4.30.0" "huggingface_hub>=0.8.0"

# Optional: Parameter-efficient fine-tuning
pip install peft bitsandbytes

# Optional: Evaluation
pip install sacrebleu datasets

Code Evidence

Import guard from `speechbrain/lobes/models/huggingface_transformers/__init__.py:7-13`:

try:
    import transformers  # noqa: F401
except ImportError:
    MSG = "Please install transformers from HuggingFace.\n"
    MSG += "E.g. run: pip install transformers\n"
    raise ImportError(MSG)

GPU capability check for LLaMA2 quantization from `speechbrain/lobes/models/huggingface_transformers/llama2.py:121-123`:

if compute_dtype == torch.float16 and use_4bit:
    major, _ = torch.cuda.get_device_capability()
    if major >= 8:
        ...  # recommends bfloat16 for Ampere+
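The rule behind that check can be expressed as a torch-free sketch (our own helper, not SpeechBrain API): compute capability 8.0 (Ampere) and newer GPUs have native bfloat16 support, which is preferred over float16 for 4-bit quantized loading.

```python
def recommended_compute_dtype(major: int, minor: int = 0) -> str:
    """Pick a compute dtype from CUDA compute capability (simplified)."""
    # Ampere (8.x) and newer have native bfloat16 support.
    return "bfloat16" if major >= 8 else "float16"
```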

TOKENIZERS_PARALLELISM suppression from `speechbrain/lobes/models/huggingface_transformers/labse.py:23`:

os.environ["TOKENIZERS_PARALLELISM"] = "false"

Common Errors

Error Message | Cause | Solution
`ImportError: Please install transformers` | transformers not installed | `pip install "transformers>=4.30.0"`
`ImportError: peft` | peft not installed for LoRA | `pip install peft`
`401 Client Error: Unauthorized` | Missing or invalid `HF_TOKEN` for a gated model | Set the `HF_TOKEN` env var with a valid token
`CUDA out of memory` when loading a large model | Insufficient VRAM | Use 4-bit quantization with bitsandbytes

Compatibility Notes

  • LLaMA2 4-bit: Requires GPU compute capability >= 8.0 (Ampere) for optimal bfloat16 performance.
  • TOKENIZERS_PARALLELISM: Auto-set to "false" by LaBSE wrapper to suppress HuggingFace warnings.
  • peft: Optional but recommended for fine-tuning large models to reduce memory footprint.
