Principle:Facebookresearch Audiocraft Export Directory Organization

From Leeroopedia

Overview

Export Directory Organization defines the standardized directory layout that AudioCraft expects when loading models for inference. Whether models are stored locally or on HuggingFace Hub, they must follow a specific file naming and structure convention so that MusicGen.get_pretrained() and related loading APIs can locate and assemble all required components automatically. This is a pattern-level principle rather than a single API -- it governs how multiple exported files are arranged to form a complete, loadable model package.

Theoretical Background

AudioCraft's model loading infrastructure (audiocraft/models/loaders.py) uses a filename-based convention to discover model components within a directory or HuggingFace repository. The loader function _get_state_dict() supports four resolution strategies: local files, local directories, HTTPS URLs, and HuggingFace Hub repository IDs. The directory and Hub strategies rely on the same expected filenames, whereas a direct file path or URL already points at a single file and is loaded as given.

The two-file convention reflects AudioCraft's architectural separation between the language model (which generates tokens) and the compression model (which maps between tokens and audio). Each component has its own export file with a fixed name, enabling the loader to fetch them independently.

Expected Directory Layout

| File | Purpose | Produced By |
|------|---------|-------------|
| state_dict.bin | Language model weights, configuration, and version metadata | export_lm() |
| compression_state_dict.bin | Compression model weights or pretrained reference | export_encodec() (weights) or export_pretrained_compression_model() (pretrained reference) |

Alternatively, a single all_in_one.pt file may contain both components, though the two-file layout is the standard convention.
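The two-file layout can be checked before a directory is handed to the loader. The following is a minimal sketch using only the standard library; the helper name missing_export_files is hypothetical and not part of AudioCraft, and it only tests for file presence, not for valid contents.

```python
from pathlib import Path

# Filenames taken from the layout described above.
EXPECTED_FILES = ("state_dict.bin", "compression_state_dict.bin")


def missing_export_files(export_dir):
    """Return the expected filenames that are absent from export_dir.

    Hypothetical helper for illustration; it does not open or validate
    the files, it only checks that they exist.
    """
    root = Path(export_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list means both files are present. A non-empty list names whichever file still has to be exported into the directory.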

Loading Resolution Order

The _get_state_dict() function in audiocraft/models/loaders.py (lines 40-71) resolves model files using the following priority:

  • Local file path: If file_or_url_or_id is an existing file, load it directly with torch.load().
  • Local directory: If it is an existing directory, append the expected filename (e.g., state_dict.bin) and load from that path.
  • HTTPS URL: If it starts with https://, use torch.hub.load_state_dict_from_url().
  • HuggingFace Hub ID: Otherwise, treat it as a HuggingFace repository ID and use hf_hub_download() to fetch the specific filename.
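The priority order above can be sketched as a small classifier. This is an illustration of the decision sequence only, not the AudioCraft implementation: the function name resolution_strategy and the strategy labels are assumptions made for this example, and nothing is actually loaded or downloaded.

```python
import os


def resolution_strategy(file_or_url_or_id, filename=None):
    """Return (strategy, location) following the priority order above.

    Hypothetical helper: it classifies the input and reports where the
    data would come from, without calling torch.load() or the Hub.
    """
    source = str(file_or_url_or_id)
    # 1. An existing file is used as given.
    if os.path.isfile(source):
        return "local_file", source
    # 2. An existing directory is combined with the expected filename.
    if os.path.isdir(source):
        if filename is None:
            raise ValueError("filename is required for a directory")
        return "local_dir", os.path.join(source, filename)
    # 3. An HTTPS URL is fetched directly.
    if source.startswith("https://"):
        return "https_url", source
    # 4. Anything else is treated as a Hub repository ID.
    if filename is None:
        raise ValueError("filename is required for a Hub repository ID")
    return "hf_hub", source
```

Because the file and directory checks run first, a string that happens to match an existing local path is never interpreted as a Hub ID.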

HuggingFace Hub Compatibility

The directory layout is designed to be directly uploadable to HuggingFace Hub as a model repository. The fixed filenames (state_dict.bin, compression_state_dict.bin) serve as the filename parameter in hf_hub_download() calls. As a result, the same model can be loaded from a local path or from a Hub ID by changing only the identifier string passed to the loader:

from audiocraft.models import MusicGen

# Loading from a local export directory
model = MusicGen.get_pretrained('/local/path/to/my_model')

# Loading from HuggingFace Hub (same directory structure)
model = MusicGen.get_pretrained('facebook/musicgen-small')
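To make the role of the fixed filenames concrete, the sketch below builds the keyword arguments that one hf_hub_download() call per component would need. The component labels "lm" and "compression" and the helper name hub_download_requests are assumptions for this example; the code prepares the arguments only and performs no download.

```python
# Fixed filenames from the export layout; the dictionary keys are
# labels chosen for this sketch, not AudioCraft identifiers.
COMPONENT_FILENAMES = {
    "lm": "state_dict.bin",
    "compression": "compression_state_dict.bin",
}


def hub_download_requests(repo_id):
    """Build repo_id/filename keyword arguments, one set per component."""
    return {
        component: {"repo_id": repo_id, "filename": filename}
        for component, filename in COMPONENT_FILENAMES.items()
    }
```

Each resulting dictionary could be passed as hf_hub_download(**kwargs) from the huggingface_hub package, which requires network access and is therefore left out of the sketch.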

Design Rationale

  • Separation of concerns: Keeping language model and compression model in separate files allows updating one without re-exporting the other.
  • Hub-native layout: The flat directory structure with fixed filenames maps directly to HuggingFace Hub repositories.
  • Backward compatibility: The loader supports multiple resolution strategies, ensuring older model distributions (URLs, single files) continue to work.
