Implementation:Facebookresearch Audiocraft Audiocraft Installation

From Leeroopedia

Summary

Installing Audiocraft sets up the Python package and the dependencies needed for MusicGen inference. The library can be installed from source as an editable install (pip install -e .) or from PyPI (pip install audiocraft). Dependency resolution is driven by setup.py and requirements.txt, which together pull in deep learning, audio processing, and NLP libraries.

API Signature

# Editable install from source
pip install -e .

# Standard install from PyPI
pip install audiocraft

# With development extras
pip install -e '.[dev]'

# With watermarking support
pip install -e '.[wm]'

Parameters

| Install Option | Description |
| --- | --- |
| pip install -e . | Editable install from the repository root. Links the package to the source directory for development. |
| pip install audiocraft | Standard install from PyPI. Installs a fixed release version. |
| [dev] | Development extras: coverage, flake8, mypy, pdoc3, pytest. |
| [wm] | Watermarking extras: audioseal. |

Source Location

  • Setup file: setup.py, lines 1-63
  • Requirements: requirements.txt, lines 1-29

Setup Configuration

The setup.py file defines the following package metadata:

| Field | Value |
| --- | --- |
| Package name | audiocraft |
| Description | Audio generation research library for PyTorch |
| URL | https://github.com/facebookresearch/audiocraft |
| Author | FAIR Speech & Audio |
| Python requirement | >=3.8.0 |
| License | MIT License |
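
Because the Python requirement is >=3.8.0, it can be worth checking the interpreter before running pip. The following is a minimal sketch rather than part of Audiocraft; the helper name python_is_supported and the MIN_PYTHON constant are hypothetical and only mirror the value in the table above.

```python
import sys

# Minimum version taken from the "Python requirement" field above.
MIN_PYTHON = (3, 8, 0)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the given interpreter version meets the minimum.

    Only the (major, minor, micro) part of version_info is compared.
    """
    return tuple(version_info[:3]) >= tuple(minimum)
```

Calling python_is_supported() with no arguments checks the running interpreter, so the result can gate an install script before any packages are downloaded.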

Core Dependencies

The following dependencies are specified in requirements.txt and automatically installed:

Deep Learning Framework

| Package | Version Constraint | Purpose |
| --- | --- | --- |
| torch | ==2.1.0 | Core deep learning framework. A CUDA-enabled build is required for GPU inference. |
| torchaudio | >=2.0.0,<2.1.2 | Audio transforms (spectrogram, resampling). Used by ChromaExtractor. |
| torchvision | ==0.16.0 | Vision utilities (dependency of some training code). |
| torchtext | ==0.16.0 | Text utilities (dependency of some training code). |
| xformers | <0.0.23 | Memory-efficient attention for transformer inference. |
| einops | (any) | Flexible tensor rearrangement operations. |
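
The pins above are strict enough that an existing environment can easily drift away from them. The sketch below compares installed versions against the constraints in this table using only the standard library. It is an illustration under stated assumptions: the names PINS, satisfies, and check_installed are hypothetical, and the comparison looks only at the numeric release part of a version (local suffixes such as +cu118 and pre-release tags are ignored), so it is not a full PEP 440 implementation.

```python
import re
from importlib import metadata

# Constraints copied from the table above; "" means "any version".
PINS = {
    "torch": "==2.1.0",
    "torchaudio": ">=2.0.0,<2.1.2",
    "torchvision": "==0.16.0",
    "torchtext": "==0.16.0",
    "xformers": "<0.0.23",
    "einops": "",
}

_OPS = {
    "==": lambda a, b: a == b,
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    "<": lambda a, b: a < b,
    ">": lambda a, b: a > b,
}


def _release(version):
    """Turn '2.1.0+cu118' into (2, 1, 0), dropping any non-numeric suffix."""
    match = re.match(r"\d+(?:\.\d+)*", version.split("+", 1)[0])
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def satisfies(version, spec):
    """Return True if version meets every comma-separated clause in spec."""
    for clause in filter(None, (c.strip() for c in spec.split(","))):
        parsed = re.match(r"(==|>=|<=|<|>)\s*(.+)", clause)
        if parsed is None:
            raise ValueError(f"unsupported constraint: {clause!r}")
        op, wanted = parsed.groups()
        have, want = _release(version), _release(wanted)
        # Pad with zeros so that 2.1 and 2.1.0 compare as equal.
        width = max(len(have), len(want))
        have += (0,) * (width - len(have))
        want += (0,) * (width - len(want))
        if not _OPS[op](have, want):
            return False
    return True


def check_installed(pins=PINS):
    """Map each package name to 'ok', 'mismatch (<version>)', or 'missing'."""
    report = {}
    for name, spec in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if satisfies(found, spec) else f"mismatch ({found})"
    return report
```

Running check_installed() inside the target environment gives a quick per-package report. For anything beyond a sanity check, a dedicated version parser such as the packaging library is likely the more robust choice.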

Audio Processing

| Package | Version Constraint | Purpose |
| --- | --- | --- |
| av | ==11.0.0 | FFmpeg Python bindings for reading/writing compressed audio formats. |
| soundfile | (any) | Reading/writing WAV and FLAC files via libsndfile. |
| librosa | (any) | Audio analysis (chroma extraction for melody conditioning). |
| demucs | (any) | Source separation for melody stem extraction. |
| encodec | (any) | Reference EnCodec implementation. |
| julius | (any) | Fast audio resampling. |

Text and NLP

| Package | Version Constraint | Purpose |
| --- | --- | --- |
| transformers | >=4.31.0 | HuggingFace library providing the T5 encoder for text conditioning and EnCodec models. |
| sentencepiece | (any) | Tokenizer required by T5. |
| spacy | ==3.7.6 | NLP processing for text conditioning augmentation. |
| num2words | (any) | Number-to-word conversion for text augmentation. |
| protobuf | (any) | Protocol buffers, required by sentencepiece. |

Configuration and Utilities

| Package | Version Constraint | Purpose |
| --- | --- | --- |
| hydra-core | >=1.1 | Configuration management for training and model specification. |
| hydra_colorlog | (any) | Colorized logging for Hydra. |
| omegaconf | (any) | YAML-based config (implicit dependency via Hydra, also used in loaders). |
| flashy | >=0.0.1 | Training utilities and metrics. |
| huggingface_hub | (any) | Downloading pretrained models from HuggingFace Hub. |
| tqdm | (any) | Progress bars for generation and training. |
| numpy | <2.0.0 | Numerical arrays (transitive dependency of many packages). |

Evaluation (Training/Evaluation Only)

| Package | Version Constraint | Purpose |
| --- | --- | --- |
| torchmetrics | (any) | Metric computation for evaluation. |
| pesq | (any) | Perceptual Evaluation of Speech Quality metric. |
| pystoi | (any) | Short-Time Objective Intelligibility metric. |
| torchdiffeq | (any) | ODE solvers for flow matching models. |
| gradio | (any) | Web UI for demos. |

System Requirements

The following system-level dependencies must be available before installing Audiocraft:

| Dependency | Required For | Installation |
| --- | --- | --- |
| CUDA Toolkit | GPU-accelerated inference with PyTorch | Install from NVIDIA (must match PyTorch CUDA version) |
| FFmpeg | Audio encoding/decoding in audio_write and _av_read | apt install ffmpeg (Ubuntu) or conda install ffmpeg |
| libsndfile | Required by soundfile for WAV/FLAC I/O | apt install libsndfile1 (Ubuntu) |
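
A quick way to see which of these system-level pieces are visible from Python is to probe for them with the standard library. The sketch below is illustrative only: the function name find_system_dependencies is hypothetical, and using nvidia-smi on the PATH as a hint that an NVIDIA driver is present is an assumption, not a check of the CUDA Toolkit version.

```python
import ctypes.util
import shutil


def find_system_dependencies():
    """Locate system-level tools and libraries used by the audio stack.

    Each value is a string (a path or library name) when the item was
    found, and None when it was not.
    """
    return {
        # FFmpeg command-line executable on the PATH.
        "ffmpeg": shutil.which("ffmpeg"),
        # Shared library that the soundfile package wraps.
        "libsndfile": ctypes.util.find_library("sndfile"),
        # Assumption: nvidia-smi on the PATH suggests an NVIDIA driver is installed.
        "nvidia-smi": shutil.which("nvidia-smi"),
    }
```

A None entry is a prompt to investigate rather than proof of a problem. For example, some soundfile wheels may ship their own copy of libsndfile, in which case the system-wide lookup can fail while soundfile still works.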

Installation Procedure

# 1. Create a virtual environment (recommended)
python -m venv audiocraft_env
source audiocraft_env/bin/activate

# 2. Install PyTorch with CUDA support (check https://pytorch.org for your CUDA version)
pip install torch==2.1.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118

# 3. Clone the repository
git clone https://github.com/facebookresearch/audiocraft.git
cd audiocraft

# 4. Install audiocraft with all dependencies
pip install -e .

# 5. Verify installation
python -c "from audiocraft.models import MusicGen; print('Audiocraft installed successfully')"
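
Step 5 imports the full package, which can be slow and fails with a single traceback if anything is missing. As a lighter complement, the sketch below reports which top-level modules cannot be located, without executing them. The helper name missing_modules and the REQUIRED list are hypothetical choices for illustration.

```python
from importlib import util

# Top-level modules to look for; adjust to the parts of the stack you rely on.
REQUIRED = ["audiocraft", "torch", "torchaudio"]


def missing_modules(names=REQUIRED):
    """Return the top-level module names that cannot be found for import.

    find_spec only locates a top-level module; it does not execute it, so
    this stays fast even for large packages such as torch.
    """
    return [name for name in names if util.find_spec(name) is None]
```

An empty list from missing_modules() means every listed module can be located. It does not guarantee that the import itself succeeds, so the command in step 5 remains the definitive check.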

Minimal Inference Example

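from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# Load the pretrained model (weights are downloaded from HuggingFace Hub on first use)
model = MusicGen.get_pretrained('facebook/musicgen-small')
model.set_generation_params(duration=8.0)  # length of the generated clip, in seconds

# Generate one waveform per text prompt; the result has shape [batch, channels, samples]
wav = model.generate(['happy rock song with electric guitar'])

# Save the first clip; audio_write appends the file extension, producing output.wav
audio_write('output', wav[0].cpu(), model.sample_rate)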

Environment Variables

| Variable | Default | Purpose |
| --- | --- | --- |
| AUDIOCRAFT_CACHE_DIR | None (uses HuggingFace default cache) | Override the directory where downloaded model checkpoints are cached. |
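
The variable only has an effect if it is set before the model is loaded. The sketch below shows one way to set it from Python and to inspect the value that would be picked up. The helper resolve_cache_dir is hypothetical and simply mirrors the table above; treating an empty string as unset is an assumption of this sketch, not documented Audiocraft behavior, and the path used is a placeholder.

```python
import os


def resolve_cache_dir(environ=os.environ):
    """Return the checkpoint cache override, or None to use the default cache.

    Assumption: an empty value is treated the same as an unset variable.
    """
    return environ.get("AUDIOCRAFT_CACHE_DIR") or None


# Example: point the cache at a placeholder directory before loading a model.
os.environ["AUDIOCRAFT_CACHE_DIR"] = "/tmp/audiocraft_cache"
print(resolve_cache_dir())
```

Setting the variable in the shell (for example with export) before starting Python achieves the same result and avoids ordering issues with imports.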
