
Workflow:Facebookresearch Audiocraft Model Export And Deployment

From Leeroopedia
Knowledge Sources
Domains Model_Deployment, Checkpoint_Management, Audio_Generation
Last Updated 2026-02-13 23:00 GMT

Overview

End-to-end process for exporting trained AudioCraft models (MusicGen, AudioGen, EnCodec) from training checkpoints to lightweight release-ready formats and loading them for inference.

Description

This workflow covers the complete model export pipeline for AudioCraft. After training a model using Dora, the training checkpoint contains optimizer state, EMA weights, FSDP sharding information, and other training artifacts that are unnecessary for inference. This workflow exports only the essential model weights and configuration, bundles the language model with its companion EnCodec tokenizer, and demonstrates how to load the exported model using the high-level generation API. It also covers fine-tuning checkpoint preparation, including the special case of converting mono models for stereo fine-tuning.

Usage

Execute this workflow after completing model training when you need to create a distributable, inference-ready model package. Also use this workflow when preparing a pretrained model for fine-tuning with modified architecture (e.g., mono to stereo conversion).

Execution Steps

Step 1: Locate Training Checkpoint

Identify the training checkpoint to export using the Dora experiment signature. Each experiment has a unique signature (hash) that maps to a specific folder containing checkpoints. Use the Dora API to resolve the signature to a filesystem path.

Key considerations:

  • Use train.main.get_xp_from_sig('SIG') to get the experiment object
  • The checkpoint file is at xp.folder / 'checkpoint.th'
  • For FSDP-trained models, the best state may be in fsdp_best_state instead of best_state
  • Verify the training completed successfully before exporting
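The resolution and inspection described above can be sketched as follows. The real calls (shown in comments) require audiocraft and Dora; the runnable part below simulates the checkpoint layout with plain dicts and stdlib `pickle`, whereas real checkpoints are torch files read with `torch.load`.

```python
# Real usage (requires audiocraft + dora):
#   from audiocraft import train
#   xp = train.main.get_xp_from_sig('SIG')
#   ckpt_path = xp.folder / 'checkpoint.th'
import pickle
import tempfile
from pathlib import Path

def locate_best_state(ckpt_path: Path) -> str:
    """Return the key holding the best weights ('fsdp_best_state' for FSDP runs)."""
    with open(ckpt_path, 'rb') as f:          # real checkpoints: torch.load(ckpt_path)
        ckpt = pickle.load(f)
    if ckpt.get('fsdp_best_state'):
        return 'fsdp_best_state'
    if ckpt.get('best_state'):
        return 'best_state'
    raise ValueError('no best state found; did training complete?')

# Simulate an FSDP training checkpoint on disk.
folder = Path(tempfile.mkdtemp())             # stands in for xp.folder
ckpt_file = folder / 'checkpoint.th'
with open(ckpt_file, 'wb') as f:
    pickle.dump({'fsdp_best_state': {'model': {}}, 'best_state': None,
                 'optimizer': {}}, f)

print(locate_best_state(ckpt_file))           # fsdp_best_state
```

Checking which best-state key is populated before exporting catches incomplete or FSDP-sharded runs early.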

Step 2: Export Language Model

Export the language model (MusicGen, AudioGen, MAGNeT, or JASCO) checkpoint to a lightweight format. The export function extracts only the best model state dictionary and the Hydra configuration, discarding optimizer state, training history, and FSDP metadata.

What gets exported:

  • best_state: the model weights achieving the best validation metric
  • xp.cfg: the Hydra configuration serialized as YAML
  • version: AudioCraft version for compatibility tracking
  • exported: flag marking this as an export checkpoint

Key considerations:

  • FSDP models store best state differently (fsdp_best_state.model)
  • The exported file is significantly smaller than the training checkpoint
  • Output is a standard torch.save dictionary
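The filtering this step performs can be sketched with plain dicts. In practice you would call `audiocraft.utils.export.export_lm`, which reads with `torch.load` and writes with `torch.save`; the exact key names and version string below are assumptions based on the export contents listed above.

```python
# Minimal sketch of what LM export keeps, assuming the key layout described above.
def export_lm_state(training_ckpt: dict, version: str = 'x.y.z') -> dict:
    """Strip a training checkpoint down to inference essentials."""
    if training_ckpt.get('fsdp_best_state'):
        best_state = training_ckpt['fsdp_best_state']['model']   # FSDP runs
    else:
        best_state = training_ckpt['best_state']['model']
    return {
        'best_state': best_state,            # model weights only
        'xp.cfg': training_ckpt['xp.cfg'],   # Hydra configuration
        'version': version,                  # placeholder version string
        'exported': True,                    # marks this as an export checkpoint
    }

training_ckpt = {
    'best_state': {'model': {'emb.0.weight': [[0.1]]}},
    'fsdp_best_state': None,
    'optimizer': {'step': 120000},           # dropped on export
    'xp.cfg': {'sample_rate': 32000},
}
exported = export_lm_state(training_ckpt)
print(sorted(exported))   # ['best_state', 'exported', 'version', 'xp.cfg']
```

Dropping the optimizer state and training history is what makes the exported file so much smaller than the training checkpoint.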

Step 3: Export Compression Model

Bundle the EnCodec compression model that was used during training. This step depends on whether you trained your own EnCodec or used a pretrained one.

Two cases:

  • Custom EnCodec: export the trained EnCodec checkpoint using export_encodec()
  • Pretrained EnCodec: create a reference pointer using export_pretrained_compression_model()

Key considerations:

  • When using a pretrained model, only a reference string is stored (not the actual weights)
  • The reference will trigger automatic download from HuggingFace at load time
  • Both files must be in the same directory for the loader to find them
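The two cases can be contrasted in a small sketch. The real functions are `export_encodec()` and `export_pretrained_compression_model()` from `audiocraft.utils.export`; the dict key names below are assumptions meant only to show that the pretrained path stores a reference string rather than weights.

```python
# Sketch of the two compression-model export cases (key names are assumptions).
def reference_pretrained_compression(name: str) -> dict:
    """Pointer-only export: no weights stored; the name triggers a
    HuggingFace download when the model is loaded."""
    return {'pretrained': name, 'exported': True}

def export_custom_encodec(encodec_ckpt: dict) -> dict:
    """Weight export for a self-trained EnCodec checkpoint."""
    return {'best_state': encodec_ckpt['best_state']['model'],
            'xp.cfg': encodec_ckpt['xp.cfg'],
            'exported': True}

ref = reference_pretrained_compression('facebook/encodec_32khz')
print(ref)   # {'pretrained': 'facebook/encodec_32khz', 'exported': True}
```

Note how the pretrained case carries no `best_state` at all: the file is tiny, and the actual weights are resolved at load time.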

Step 4: Organize Export Directory

Structure the exported files in a directory that the AudioCraft loader expects. The directory must contain both the language model state dict and the compression model state dict with specific filenames.

Required directory structure:

  • state_dict.bin - the exported language model weights
  • compression_state_dict.bin - the exported or referenced EnCodec weights
  • Both files must be in the same parent directory
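A quick sanity check on the layout above can be automated. This is a stdlib-only sketch; the directory name is hypothetical.

```python
import tempfile
from pathlib import Path

# Filenames the AudioCraft loader expects in the export directory.
REQUIRED = ('state_dict.bin', 'compression_state_dict.bin')

def missing_export_files(export_dir: Path) -> list:
    """Return the required files missing from an export directory."""
    return [name for name in REQUIRED if not (export_dir / name).exists()]

export_dir = Path(tempfile.mkdtemp()) / 'my_audio_lm'   # hypothetical path
export_dir.mkdir()
(export_dir / 'state_dict.bin').touch()                 # LM exported, EnCodec missing
print(missing_export_files(export_dir))   # ['compression_state_dict.bin']
```

Running such a check before distribution catches the common mistake of exporting the language model and compression model into different directories.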

Step 5: Validate Exported Model

Load the exported model using the high-level generation API and verify it produces valid output. This confirms that the export process preserved the model correctly and that inference works end-to-end.

Validation approach:

  • Load via MusicGen.get_pretrained('/path/to/export/dir/')
  • Run a short generation with a test description
  • Compare output quality to pre-export generation
  • Verify sample rate and duration match expectations
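The validation above can be scripted. The loading and generation calls (in comments) use the real `MusicGen` API and require audiocraft plus the exported model; the shape/rate checks below are exercised against a stand-in result, assuming a 32 kHz mono model and a hypothetical prompt.

```python
# Real loading and generation (requires audiocraft):
#   from audiocraft.models import MusicGen
#   model = MusicGen.get_pretrained('/checkpoints/my_audio_lm/')
#   model.set_generation_params(duration=4)
#   wav = model.generate(['warm analog synth arpeggio'])   # [batch, channels, samples]

def validate_output(wav_shape, sample_rate, expected_sr=32000,
                    duration_s=4, channels=1):
    """Check that a generated batch has the expected rate, channels, and length."""
    batch, chans, samples = wav_shape
    assert sample_rate == expected_sr, f'unexpected sample rate {sample_rate}'
    assert chans == channels, f'expected {channels} channel(s), got {chans}'
    assert samples == expected_sr * duration_s, 'duration mismatch'
    return True

# Stand-in for wav.shape / model.sample_rate: a 4 s mono clip at 32 kHz.
print(validate_output((1, 1, 128000), 32000))   # True
```

Beyond these mechanical checks, listen to the output and compare it against a generation made before export to confirm quality was preserved.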

Step 6: Prepare Fine-Tuning Checkpoints (Optional)

For special fine-tuning scenarios such as converting a mono model to stereo, manually modify the checkpoint structure. This involves duplicating embedding and linear layer weights to accommodate the doubled codebook count (left + right channels interleaved).

Mono to stereo conversion:

  • Load the exported state dict
  • Duplicate embedding and linear weights for paired codebooks
  • Save with the training checkpoint format: dict with best_state.model key
  • Use as continue_from target without the //pretrained/ prefix
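The duplication step can be sketched on a state dict of plain lists. Real code operates on torch tensors, and the per-codebook `emb.{k}.weight` / `linears.{k}.weight` naming is an assumption here; the point is the interleaving pattern, where stereo codebooks 2k and 2k+1 both start from mono codebook k.

```python
# Sketch of mono-to-stereo weight duplication (layer names are assumptions).
def mono_to_stereo(state: dict, n_q: int = 4) -> dict:
    """Duplicate per-codebook embedding and output-head weights so that
    interleaved left/right codebooks 2k and 2k+1 both reuse mono codebook k."""
    out = dict(state)
    for k in range(n_q):
        for prefix in ('emb', 'linears'):
            w = state[f'{prefix}.{k}.weight']
            out[f'{prefix}.{2 * k}.weight'] = w
            out[f'{prefix}.{2 * k + 1}.weight'] = w
    return out

# Fake mono state dict: 4 codebooks, list values standing in for tensors.
mono = {f'{p}.{k}.weight': [k] for p in ('emb', 'linears') for k in range(4)}
stereo = mono_to_stereo(mono)
print(len([name for name in stereo if name.startswith('emb.')]))   # 8
```

The converted state dict is then wrapped in the training-checkpoint format (a dict with a `best_state.model` entry, as noted above) so it can serve as a `continue_from` target.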
