Summary
Audiocraft installation involves setting up the Python package and all its dependencies for MusicGen inference. The library can be installed either from source using an editable install (pip install -e .) or from PyPI (pip install audiocraft). The installation process relies on setup.py and requirements.txt to resolve a comprehensive set of dependencies spanning deep learning, audio processing, and NLP libraries.
API Signature
# Editable install from source
pip install -e .
# Standard install from PyPI
pip install audiocraft
# With development extras
pip install -e '.[dev]'
# With watermarking support
pip install -e '.[wm]'
Parameters
| Install Option | Description |
|---|---|
| pip install -e . | Editable install from the repository root. Links the package to the source directory for development. |
| pip install audiocraft | Standard install from PyPI. Installs a fixed release version. |
| [dev] | Development extras: coverage, flake8, mypy, pdoc3, pytest. |
| [wm] | Watermarking extras: audioseal. |
Source Location
- Setup file: setup.py, lines 1-63
- Requirements: requirements.txt, lines 1-29
Setup Configuration
The setup.py file defines the following package metadata:
Core Dependencies
The following dependencies are specified in requirements.txt and automatically installed:
Deep Learning Framework
| Package | Version Constraint | Purpose |
|---|---|---|
| torch | ==2.1.0 | Core deep learning framework. Must have CUDA support for GPU inference. |
| torchaudio | >=2.0.0,<2.1.2 | Audio transforms (spectrogram, resampling). Used by ChromaExtractor. |
| torchvision | ==0.16.0 | Vision utilities (dependency of some training code). |
| torchtext | ==0.16.0 | Text utilities (dependency of some training code). |
| xformers | <0.0.23 | Memory-efficient attention for transformer inference. |
| einops | (any) | Flexible tensor rearrangement operations. |
Audio Processing
| Package | Version Constraint | Purpose |
|---|---|---|
| av | ==11.0.0 | FFmpeg Python bindings for reading/writing compressed audio formats. |
| soundfile | (any) | Reading/writing WAV and FLAC files via libsndfile. |
| librosa | (any) | Audio analysis (chroma extraction for melody conditioning). |
| demucs | (any) | Source separation for melody stem extraction. |
| encodec | (any) | Reference EnCodec implementation. |
| julius | (any) | Fast audio resampling. |
Text and NLP
| Package | Version Constraint | Purpose |
|---|---|---|
| transformers | >=4.31.0 | HuggingFace library providing T5 encoder for text conditioning and EnCodec models. |
| sentencepiece | (any) | Tokenizer required by T5. |
| spacy | ==3.7.6 | NLP processing for text conditioning augmentation. |
| num2words | (any) | Number-to-word conversion for text augmentation. |
| protobuf | (any) | Protocol buffers, required by sentencepiece. |
Configuration and Utilities
| Package | Version Constraint | Purpose |
|---|---|---|
| hydra-core | >=1.1 | Configuration management for training and model specification. |
| hydra_colorlog | (any) | Colorized logging for Hydra. |
| omegaconf | (any) | YAML-based config (implicit dependency via Hydra, also used in loaders). |
| flashy | >=0.0.1 | Training utilities and metrics. |
| huggingface_hub | (any) | Downloading pretrained models from HuggingFace Hub. |
| tqdm | (any) | Progress bars for generation and training. |
| numpy | <2.0.0 | Numerical arrays (transitive dependency of many packages). |
Evaluation (Training/Evaluation Only)
| Package | Version Constraint | Purpose |
|---|---|---|
| torchmetrics | (any) | Metric computation for evaluation. |
| pesq | (any) | Perceptual Evaluation of Speech Quality metric. |
| pystoi | (any) | Short-Time Objective Intelligibility metric. |
| torchdiffeq | (any) | ODE solvers for flow matching models. |
| gradio | (any) | Web UI for demos. |
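The version constraints listed in the tables above can be compared against the active environment before running anything heavy. The following is a minimal sketch using only the standard library. The names PINS, release_tuple, satisfies, and check_environment are illustrative helpers, not part of Audiocraft, and the dictionary covers only a handful of the pins from the tables. The version parser is deliberately simple: it ignores local tags such as +cu118 and pre-release suffixes, so treat its verdict as a hint rather than a full PEP 440 check.

```python
from importlib import metadata

# A few exact pins and upper bounds taken from the tables above.
# Illustrative subset only; this is not a copy of requirements.txt.
PINS = {
    "torch": ("==", "2.1.0"),
    "torchvision": ("==", "0.16.0"),
    "torchtext": ("==", "0.16.0"),
    "av": ("==", "11.0.0"),
    "spacy": ("==", "3.7.6"),
    "numpy": ("<", "2.0.0"),
    "xformers": ("<", "0.0.23"),
}


def release_tuple(version: str) -> tuple:
    """Turn '2.1.0+cu118' into (2, 1, 0); local and pre-release tags are ignored."""
    public = version.split("+", 1)[0]
    parts = []
    for piece in public.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric segment, e.g. 'post1'
        parts.append(int(digits))
    return tuple(parts)


def satisfies(installed: str, op: str, required: str) -> bool:
    """Compare two versions with one of the operators '==', '<', '>='."""
    have, want = release_tuple(installed), release_tuple(required)
    # Pad to equal length so that (2, 1) compares equal to (2, 1, 0).
    width = max(len(have), len(want))
    have += (0,) * (width - len(have))
    want += (0,) * (width - len(want))
    if op == "==":
        return have == want
    if op == "<":
        return have < want
    if op == ">=":
        return have >= want
    raise ValueError(f"unsupported operator: {op}")


def check_environment(pins=PINS) -> dict:
    """Map each package name to 'ok', 'missing', or a mismatch description."""
    report = {}
    for name, (op, required) in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = satisfies(installed, op, required)
        report[name] = "ok" if ok else f"found {installed}, expected {op}{required}"
    return report


if __name__ == "__main__":
    for package, status in check_environment().items():
        print(f"{package:12s} {status}")
```

Running the script inside the target virtual environment prints one line per package, which makes a mismatched torch or numpy build easy to spot before a model is loaded.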
System Requirements
The following system-level dependencies must be available before installing Audiocraft:
| Dependency | Required For | Installation |
|---|---|---|
| CUDA Toolkit | GPU-accelerated inference with PyTorch | Install from NVIDIA (must match PyTorch CUDA version) |
| FFmpeg | Audio encoding/decoding in audio_write and _av_read | apt install ffmpeg (Ubuntu) or conda install ffmpeg |
| libsndfile | Required by soundfile for WAV/FLAC I/O | apt install libsndfile1 (Ubuntu) |
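A quick way to see whether these system-level dependencies are discoverable is to probe for them from Python. The sketch below uses only the standard library. The function name check_system_dependencies is hypothetical, and each probe is a rough proxy: a False result suggests something to investigate rather than a definite failure, since some Python wheels may bundle their own copies of these libraries.

```python
import ctypes.util
import shutil


def check_system_dependencies() -> dict:
    """Report which system-level dependencies are discoverable from Python."""
    return {
        # The ffmpeg executable on PATH; a rough proxy for an FFmpeg install.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # The shared library that the soundfile package loads.
        "libsndfile": ctypes.util.find_library("sndfile") is not None,
        # nvidia-smi ships with the NVIDIA driver, so this indicates a usable
        # GPU driver rather than the CUDA Toolkit itself.
        "nvidia-smi": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, found in check_system_dependencies().items():
        print(f"{name:12s} {'found' if found else 'not found'}")
```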
Installation Procedure
# 1. Create a virtual environment (recommended)
python -m venv audiocraft_env
source audiocraft_env/bin/activate
# 2. Install PyTorch with CUDA support (check https://pytorch.org for your CUDA version)
pip install torch==2.1.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118
# 3. Clone the repository
git clone https://github.com/facebookresearch/audiocraft.git
cd audiocraft
# 4. Install audiocraft with all dependencies
pip install -e .
# 5. Verify installation
python -c "from audiocraft.models import MusicGen; print('Audiocraft installed successfully')"
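The one-line check in step 5 confirms that the package imports, but it does not say whether PyTorch can see a GPU. A slightly fuller check might look like the sketch below. The helper name describe_install is an assumption made for illustration, and the code falls back gracefully when either package is absent instead of raising.

```python
import importlib


def describe_install() -> dict:
    """Collect basic facts about the torch and audiocraft installation."""
    info = {}
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        info["torch"] = None
    else:
        info["torch"] = torch.__version__
        # False here means generation will run on the CPU, which is much slower.
        info["cuda_available"] = torch.cuda.is_available()
    try:
        audiocraft = importlib.import_module("audiocraft")
    except ImportError:
        info["audiocraft"] = None
    else:
        # Fall back to 'unknown' if the package does not expose a version string.
        info["audiocraft"] = getattr(audiocraft, "__version__", "unknown")
    return info


if __name__ == "__main__":
    for key, value in describe_install().items():
        print(f"{key}: {value}")
```

A value of None for either package means the import failed, in which case steps 2 and 4 above are the first things to re-run.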
Minimal Inference Example
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write
# Load model (downloads from HuggingFace Hub on first use)
model = MusicGen.get_pretrained('facebook/musicgen-small')
model.set_generation_params(duration=8.0)
# Generate
wav = model.generate(['happy rock song with electric guitar'])
# Save
audio_write('output', wav[0].cpu(), model.sample_rate)
Environment Variables
| Variable | Default | Purpose |
|---|---|---|
| AUDIOCRAFT_CACHE_DIR | None (uses HuggingFace default cache) | Override the directory where downloaded model checkpoints are cached. |
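The variable can be exported in the shell, or set from Python before any model is loaded. The following sketch shows the second option. The helper set_audiocraft_cache_dir and the example path are hypothetical; only the variable name AUDIOCRAFT_CACHE_DIR comes from the table above.

```python
import os
from pathlib import Path


def set_audiocraft_cache_dir(path) -> Path:
    """Create the directory if needed and point AUDIOCRAFT_CACHE_DIR at it."""
    cache_dir = Path(path).expanduser().resolve()
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Set this before importing audiocraft or loading a model, so the value is
    # in place whenever the library resolves its cache location.
    os.environ["AUDIOCRAFT_CACHE_DIR"] = str(cache_dir)
    return cache_dir


# Example call with a hypothetical location:
# set_audiocraft_cache_dir("~/models/audiocraft")
```

Setting the variable inside the process only affects that process; for a persistent override, exporting it in the shell profile is the more usual choice.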