Principle:Facebookresearch Audiocraft Export Directory Organization
Overview
Export Directory Organization defines the standardized directory layout that AudioCraft expects when loading models for inference. Whether models are stored locally or on HuggingFace Hub, they must follow a specific file naming and structure convention so that MusicGen.get_pretrained() and related loading APIs can locate and assemble all required components automatically. This is a pattern-level principle rather than a single API -- it governs how multiple exported files are arranged to form a complete, loadable model package.
Theoretical Background
AudioCraft's model loading infrastructure (audiocraft/models/loaders.py) uses a filename-based convention to discover model components within a directory or HuggingFace repository. The loader function _get_state_dict() supports multiple resolution strategies -- local files, local directories, HTTPS URLs, and HuggingFace Hub repository IDs -- but all share the same expected filenames.
The two-file convention reflects AudioCraft's architectural separation between the language model (which generates tokens) and the compression model (which maps between tokens and audio). Each component has its own export file with a fixed name, enabling the loader to fetch them independently.
Expected Directory Layout
| File | Purpose | Produced By |
|---|---|---|
| state_dict.bin | Language model weights, configuration, and version metadata | export_lm() |
| compression_state_dict.bin | Compression model weights or pretrained reference | export_pretrained_compression_model() |
Alternatively, a single all_in_one.pt file may contain both components, though the two-file layout is the standard convention.
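As an illustration of the two-file layout, the following minimal sketch checks whether a local export directory contains both expected files before it is handed to a loader. The helper name missing_export_files and the EXPECTED_FILES constant are hypothetical and are not part of AudioCraft. The sketch only inspects filenames and does not open or validate the checkpoints themselves.

```python
from pathlib import Path

# Fixed filenames from the layout table above.
EXPECTED_FILES = ("state_dict.bin", "compression_state_dict.bin")


def missing_export_files(export_dir):
    """Return the expected filenames that are absent from export_dir.

    Hypothetical helper for illustration; AudioCraft does not ship it.
    """
    root = Path(export_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list means both components are present under their fixed names. Any returned name points at the export step that still needs to be run.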
Loading Resolution Order
The _get_state_dict() function in audiocraft/models/loaders.py (lines 40-71) resolves model files using the following priority:
- Local file path: If file_or_url_or_id is an existing file, load it directly with torch.load().
- Local directory: If it is an existing directory, append the expected filename (e.g., state_dict.bin) and load from that path.
- HTTPS URL: If it starts with https://, use torch.hub.load_state_dict_from_url().
- HuggingFace Hub ID: Otherwise, treat it as a HuggingFace repository ID and use hf_hub_download() to fetch the specific filename.
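The priority order above can be sketched as a small classifier that reports which strategy would apply to a given reference. This is a simplified illustration that uses only the standard library: the function name classify_source and the returned labels are hypothetical, and no weights are loaded or downloaded. It mirrors the order of the checks as described here and is not the actual body of _get_state_dict().

```python
import os


def classify_source(file_or_url_or_id):
    """Name the resolution strategy for a reference, in the documented order.

    Hypothetical helper for illustration; it performs no loading.
    """
    ref = str(file_or_url_or_id)
    if os.path.isfile(ref):
        return "local_file"        # would be loaded directly
    if os.path.isdir(ref):
        return "local_directory"   # expected filename is appended
    if ref.startswith("https://"):
        return "https_url"         # fetched by URL
    return "hf_hub_id"             # anything else is treated as a Hub repo ID
```

Because the Hub ID case is the fallback, a mistyped local path is treated as a repository ID instead of raising a "file not found" error at this stage.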
HuggingFace Hub Compatibility
The directory layout is designed to be directly uploadable to HuggingFace Hub as a model repository. The fixed filenames (state_dict.bin, compression_state_dict.bin) serve as the filename parameter in hf_hub_download() calls. This means the same loading call works for a local path or a Hub ID; only the argument changes:
from audiocraft.models import MusicGen

# Loading from a local export directory that contains both expected files
model = MusicGen.get_pretrained('/local/path/to/my_model')
# Loading from HuggingFace Hub (same directory structure)
model = MusicGen.get_pretrained('facebook/musicgen-small')
Design Rationale
- Separation of concerns: Keeping language model and compression model in separate files allows updating one without re-exporting the other.
- Hub-native layout: The flat directory structure with fixed filenames maps directly to HuggingFace Hub repositories.
- Backward compatibility: The loader supports multiple resolution strategies, ensuring older model distributions (URLs, single files) continue to work.