Implementation:Facebookresearch Audiocraft Audiocraft Installation

Summary

Audiocraft installation involves setting up the Python package and all its dependencies for MusicGen inference. The library can be installed either from source using an editable install (pip install -e .) or from PyPI (pip install audiocraft). The installation process relies on setup.py and requirements.txt to resolve a comprehensive set of dependencies spanning deep learning, audio processing, and NLP libraries.

API Signature

# Editable install from source
pip install -e .

# Standard install from PyPI
pip install audiocraft

# With development extras
pip install -e '.[dev]'

# With watermarking support
pip install -e '.[wm]'

Parameters

Install Option	Description
`pip install -e .`	Editable install from the repository root. Links the package to the source directory for development.
`pip install audiocraft`	Standard install from PyPI. Installs a fixed release version.
`[dev]`	Development extras: `coverage`, `flake8`, `mypy`, `pdoc3`, `pytest`.
`[wm]`	Watermarking extras: `audioseal`.

Source Location

Setup file: setup.py, lines 1-63
Requirements: requirements.txt, lines 1-29

Setup Configuration

The setup.py file defines the following package metadata:

Field	Value
Package name	`audiocraft`
Description	Audio generation research library for PyTorch
URL	https://github.com/facebookresearch/audiocraft
Author	FAIR Speech & Audio
Python requirement	`>=3.8.0`
License	MIT License

Core Dependencies

The following dependencies are specified in requirements.txt and automatically installed:

Deep Learning Framework

Package	Version Constraint	Purpose
`torch`	`==2.1.0`	Core deep learning framework. Must have CUDA support for GPU inference.
`torchaudio`	`>=2.0.0,<2.1.2`	Audio transforms (spectrogram, resampling). Used by `ChromaExtractor`.
`torchvision`	`==0.16.0`	Vision utilities (dependency of some training code).
`torchtext`	`==0.16.0`	Text utilities (dependency of some training code).
`xformers`	`<0.0.23`	Memory-efficient attention for transformer inference.
`einops`	(any)	Flexible tensor rearrangement operations.

Audio Processing

Package	Version Constraint	Purpose
`av`	`==11.0.0`	FFmpeg Python bindings for reading/writing compressed audio formats.
`soundfile`	(any)	Reading/writing WAV and FLAC files via libsndfile.
`librosa`	(any)	Audio analysis (chroma extraction for melody conditioning).
`demucs`	(any)	Source separation for melody stem extraction.
`encodec`	(any)	Reference EnCodec implementation.
`julius`	(any)	Fast audio resampling.

Text and NLP

Package	Version Constraint	Purpose
`transformers`	`>=4.31.0`	HuggingFace library providing T5 encoder for text conditioning and EnCodec models.
`sentencepiece`	(any)	Tokenizer required by T5.
`spacy`	`==3.7.6`	NLP processing for text conditioning augmentation.
`num2words`	(any)	Number-to-word conversion for text augmentation.
`protobuf`	(any)	Protocol buffers, required by sentencepiece.

Configuration and Utilities

Package	Version Constraint	Purpose
`hydra-core`	`>=1.1`	Configuration management for training and model specification.
`hydra_colorlog`	(any)	Colorized logging for Hydra.
`omegaconf`	(any)	YAML-based config (implicit dependency via Hydra, also used in loaders).
`flashy`	`>=0.0.1`	Training utilities and metrics.
`huggingface_hub`	(any)	Downloading pretrained models from HuggingFace Hub.
`tqdm`	(any)	Progress bars for generation and training.
`numpy`	`<2.0.0`	Numerical arrays; capped below 2.0 (also a dependency of many of the packages above).

Evaluation (Training/Evaluation Only)

Package	Version Constraint	Purpose
`torchmetrics`	(any)	Metric computation for evaluation.
`pesq`	(any)	Perceptual Evaluation of Speech Quality metric.
`pystoi`	(any)	Short-Time Objective Intelligibility metric.
`torchdiffeq`	(any)	ODE solvers for flow matching models.
`gradio`	(any)	Web UI for demos.
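
The pinned constraints in the tables above can be checked against an existing environment. The following is a minimal sketch using only the standard library; the names `CONSTRAINTS`, `parse_version`, `satisfies`, and `check_constraints` are hypothetical helpers and not part of Audiocraft. The version comparison is deliberately simplified: it ignores local tags such as `+cu118` and treats pre-releases like final releases, so a tool such as `pip check` remains the more reliable option.

```python
from importlib import metadata

# A few constraints copied from the tables above: (package, operator, version).
CONSTRAINTS = [
    ("torch", "==", "2.1.0"),
    ("torchvision", "==", "0.16.0"),
    ("av", "==", "11.0.0"),
    ("xformers", "<", "0.0.23"),
    ("numpy", "<", "2.0.0"),
    ("transformers", ">=", "4.31.0"),
]


def parse_version(text):
    """Turn '2.1.0+cu118' into (2, 1, 0); local tags and non-numeric parts are dropped."""
    public = text.split("+", 1)[0]
    parts = []
    for piece in public.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def satisfies(installed, op, required):
    """Compare two version strings with a simplified numeric comparison."""
    a, b = parse_version(installed), parse_version(required)
    # Pad to equal length so that (2, 1) and (2, 1, 0) compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    if op == "==":
        return a == b
    if op == "<":
        return a < b
    if op == ">=":
        return a >= b
    raise ValueError(f"unsupported operator: {op}")


def check_constraints(constraints=CONSTRAINTS):
    """Return {package: 'ok' | 'missing' | 'mismatch (found X)'}."""
    report = {}
    for name, op, required in constraints:
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = satisfies(installed, op, required)
        report[name] = "ok" if ok else f"mismatch (found {installed})"
    return report


if __name__ == "__main__":
    for package, status in check_constraints().items():
        print(f"{package:14s} {status}")
```

Running the script after installation prints one status line per package, which makes a mismatched `torch` or `xformers` build easy to spot.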

System Requirements

The following system-level dependencies should be available before installing Audiocraft. CUDA is needed only when running on a GPU; FFmpeg and libsndfile are needed for audio I/O:

Dependency	Required For	Installation
CUDA Toolkit	GPU-accelerated inference with PyTorch	Install from NVIDIA (must match PyTorch CUDA version)
FFmpeg	Audio encoding/decoding in `audio_write` and `_av_read`	`apt install ffmpeg` (Ubuntu) or `conda install ffmpeg`
libsndfile	Required by `soundfile` for WAV/FLAC I/O	`apt install libsndfile1` (Ubuntu)
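
Whether FFmpeg and libsndfile are discoverable can be checked from Python before installing. This is a minimal sketch using the standard library; `check_system_dependencies` is a hypothetical helper, not an Audiocraft function. A missing entry is a hint rather than proof of a problem, since some Python wheels bundle their own copies of native libraries.

```python
import ctypes.util
import shutil


def check_system_dependencies():
    """Report whether FFmpeg and libsndfile look discoverable on this machine."""
    return {
        # Path to the ffmpeg executable on PATH, or None if it is not found.
        "ffmpeg": shutil.which("ffmpeg"),
        # Name or path of the libsndfile shared library, or None if the
        # system library search does not find it.
        "libsndfile": ctypes.util.find_library("sndfile"),
    }


if __name__ == "__main__":
    for name, location in check_system_dependencies().items():
        print(f"{name}: {location or 'NOT FOUND'}")
```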

Installation Procedure

# 1. Create a virtual environment (recommended)
python -m venv audiocraft_env
source audiocraft_env/bin/activate

# 2. Install PyTorch with CUDA support (the cu118 index below serves CUDA 11.8 builds; see https://pytorch.org for other CUDA versions)
pip install torch==2.1.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118

# 3. Clone the repository
git clone https://github.com/facebookresearch/audiocraft.git
cd audiocraft

# 4. Install audiocraft with all dependencies
pip install -e .

# 5. Verify installation
python -c "from audiocraft.models import MusicGen; print('Audiocraft installed successfully')"
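
The one-line check in step 5 only confirms that the import succeeds. A slightly fuller sketch is shown below; `describe_install` is a hypothetical helper, and it reports missing packages instead of raising, so it can also be run in a partially set up environment.

```python
import importlib.util


def describe_install():
    """Collect basic facts about the environment without failing on missing packages."""
    info = {
        "audiocraft_importable": importlib.util.find_spec("audiocraft") is not None,
        "torch_importable": importlib.util.find_spec("torch") is not None,
        "torch_version": None,
        "cuda_available": None,
    }
    if info["torch_importable"]:
        import torch  # imported lazily so the script still runs without PyTorch

        info["torch_version"] = torch.__version__
        # False here means inference would fall back to the CPU.
        info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_install().items():
        print(f"{key}: {value}")
```

If `cuda_available` is `False` on a machine with an NVIDIA GPU, the usual suspects are a CPU-only PyTorch wheel or a driver that does not match the installed CUDA build.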

Minimal Inference Example

from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# Load model (downloads from HuggingFace Hub on first use)
model = MusicGen.get_pretrained('facebook/musicgen-small')
model.set_generation_params(duration=8.0)

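# Generate one clip per prompt; the result is a tensor of shape [batch, channels, samples]
wav = model.generate(['happy rock song with electric guitar'])

# Save the first clip as output.wav (audio_write appends the file extension)
audio_write('output', wav[0].cpu(), model.sample_rate)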

Environment Variables

Variable	Default	Purpose
`AUDIOCRAFT_CACHE_DIR`	`None` (uses HuggingFace default cache)	Override the directory where downloaded model checkpoints are cached.
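
The variable can be exported in the shell or set from Python. The sketch below takes the second route; `configure_cache_dir` is a hypothetical helper, and it assumes the variable is read at the time a checkpoint is requested, so calling it before importing `audiocraft` is the safest ordering.

```python
import os
from pathlib import Path


def configure_cache_dir(path):
    """Point AUDIOCRAFT_CACHE_DIR at `path`, creating the directory if needed."""
    cache_dir = Path(path).expanduser().resolve()
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Only affects this process and any child processes it starts.
    os.environ["AUDIOCRAFT_CACHE_DIR"] = str(cache_dir)
    return cache_dir
```

A typical call would be `configure_cache_dir("~/audiocraft_cache")` at the top of a script, followed by the model-loading code.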

Related Pages

Principle:Facebookresearch_Audiocraft_Environment_Setup
Implementation:Facebookresearch_Audiocraft_MusicGen_get_pretrained - First step after installation: loading a pretrained model.
Implementation:Facebookresearch_Audiocraft_Audio_write - Requires FFmpeg to be installed for audio output.
Environment:Facebookresearch_Audiocraft_Python_PyTorch_CUDA_Environment
