Implementation:Facebookresearch Audiocraft MusicDataset init

From Leeroopedia

Overview

MusicDataset is the primary dataset class for MusicGen training. It inherits from InfoAudioDataset (which inherits from AudioDataset) and adds music-specific metadata loading, text augmentation, paraphrasing, and joint embedding support. The __init__ method configures augmentation parameters, while __getitem__ loads audio segments alongside structured MusicInfo metadata.

Source Location

Property | Value
Source file | audiocraft/data/music_dataset.py, lines 187-249
Base class source | audiocraft/data/audio_dataset.py, lines 244-559
Import | from audiocraft.data.music_dataset import MusicDataset
Module | audiocraft.data.music_dataset

API

Constructor

MusicDataset.__init__(
    *args,
    info_fields_required: bool = True,
    merge_text_p: float = 0.,
    drop_desc_p: float = 0.,
    drop_other_p: float = 0.,
    joint_embed_attributes: List[str] = [],
    paraphrase_source: Optional[str] = None,
    paraphrase_p: float = 0,
    **kwargs
)

Factory Method

AudioDataset.from_meta(
    root: Union[str, Path],
    **kwargs
) -> AudioDataset

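Instantiates the dataset from a directory containing a data.jsonl or data.jsonl.gz manifest file. The remaining keyword arguments are passed on to the constructor, so calling the inherited method as MusicDataset.from_meta(...) accepts the MusicDataset parameters listed below.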
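
As a rough illustration of the manifest lookup described above, the following sketch resolves and reads a manifest using only the standard library. The helpers resolve_manifest and read_manifest are hypothetical names, not audiocraft functions, and both the lookup order (plain file first, then gzip) and the error raised when no manifest exists are assumptions.

```python
import gzip
import json
from pathlib import Path


def resolve_manifest(root):
    """Return the manifest path for `root` (hypothetical helper, not audiocraft API)."""
    root = Path(root)
    if root.is_dir():
        # Assumed order: the plain manifest first, then the gzip-compressed one.
        for name in ("data.jsonl", "data.jsonl.gz"):
            candidate = root / name
            if candidate.exists():
                return candidate
        raise ValueError(f"no data.jsonl or data.jsonl.gz found in {root}")
    # A direct path to a manifest file is returned unchanged.
    return root


def read_manifest(path):
    """Parse one JSON object per non-empty line, handling .gz transparently."""
    path = Path(path)
    open_fn = gzip.open if path.suffix == ".gz" else open
    with open_fn(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Each parsed entry is a plain dict; in this sketch no validation of the path, duration, or sample_rate keys is attempted.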

Item Access

def __getitem__(self, index) -> Tuple[torch.Tensor, MusicInfo]

Returns a tuple of the audio waveform tensor and a MusicInfo dataclass populated from the sidecar JSON.

Parameters

Parameter | Type | Default | Description
info_fields_required | bool | True | Whether required metadata fields must be present in sidecar JSON
merge_text_p | float | 0.0 | Probability of merging structured metadata into the description
drop_desc_p | float | 0.0 | Probability of dropping the original description during merge
drop_other_p | float | 0.0 | Probability of dropping individual metadata fields during merge
joint_embed_attributes | List[str] | [] | Attribute names for which joint embedding conditions are created
paraphrase_source | Optional[str] | None | Path to JSON/JSON.GZ file with paraphrased descriptions
paraphrase_p | float | 0 | Probability of using a paraphrase instead of original description
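
To show how merge_text_p, drop_desc_p, and drop_other_p could interact, here is a minimal sketch that follows the parameter descriptions in the table above. It is a simplified stand-in, not the library's augment_music_info_description: the function name augment_description, the rng argument, the plain-dict metadata, and the "name: value" output format are all assumptions made for illustration.

```python
import random


def augment_description(description, metadata, merge_text_p=0.0,
                        drop_desc_p=0.0, drop_other_p=0.0, rng=None):
    """Simplified stand-in for the merge step; not the library's implementation."""
    rng = rng or random.Random()
    # With probability 1 - merge_text_p, leave the description untouched.
    if rng.random() >= merge_text_p:
        return description
    # Keep each non-empty metadata field with probability 1 - drop_other_p.
    pairs = []
    for name, value in metadata.items():
        if value is None or rng.random() < drop_other_p:
            continue
        text = ", ".join(map(str, value)) if isinstance(value, list) else str(value)
        pairs.append(f"{name}: {text}")
    rng.shuffle(pairs)
    metadata_text = ". ".join(pairs)
    # With probability drop_desc_p, keep only the metadata text.
    if description is None or rng.random() < drop_desc_p:
        return metadata_text or None
    if not metadata_text:
        return description
    return f"{description.rstrip('.')}. {metadata_text}"
```

With the default probabilities of 0.0 the description passes through unchanged, which matches the defaults listed in the table.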

Inherited Key Parameters (from AudioDataset)

Parameter | Type | Default | Description
segment_duration | Optional[float] | None | Duration of audio segments to sample (typically 30s for MusicGen)
sample_rate | int | 48000 | Target sample rate (32000 for MusicGen base)
channels | int | 2 | Target channels (1 for mono MusicGen)
num_samples | int | 10000 | Number of samples per epoch
sample_on_duration | bool | True | Sample files proportional to duration
sample_on_weight | bool | True | Sample files proportional to weight
min_segment_ratio | float | 0.5 | Minimum ratio of actual audio in a padded segment
shuffle | bool | True | Shuffle data each epoch
pad | bool | True | Pad short segments to target duration
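
The arithmetic behind segment_duration, sample_rate, min_segment_ratio, and pad can be sketched as follows. The helper names are hypothetical, the seek bound is an assumption derived from the min_segment_ratio description rather than code taken from audiocraft, and a Python list stands in for the audio tensor.

```python
def target_frames(segment_duration, sample_rate):
    """Number of samples per channel in one training segment."""
    return int(segment_duration * sample_rate)


def max_seek_time(file_duration, segment_duration, min_segment_ratio):
    """Latest start time (seconds) that still leaves min_segment_ratio of real audio."""
    return max(0.0, file_duration - segment_duration * min_segment_ratio)


def pad_to_length(samples, n_target, fill=0.0):
    """Right-pad a list of samples with `fill`; a stand-in for tensor padding."""
    return samples + [fill] * max(0, n_target - len(samples))
```

For the MusicGen values mentioned in the table (30 s segments at 32000 Hz), target_frames(30, 32000) gives 960000 samples per channel.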

Inputs and Outputs

Inputs:

  • JSONL manifest files with audio metadata (path, duration, sample_rate, and optionally title, artist, description, genre, key, bpm, moods, keywords, instrument, name)
  • Sidecar .json files alongside each audio file containing music-specific metadata
  • Optional paraphrase JSON file

Outputs:

  • Tuple[torch.Tensor, MusicInfo] per sample where:
    • torch.Tensor -- audio waveform of shape [C, T] (channels, time samples)
    • MusicInfo -- dataclass with all metadata fields and a to_condition_attributes() method that converts to ConditioningAttributes for the model

MusicInfo Dataclass

@dataclass
class MusicInfo(AudioInfo):
    title: Optional[str] = None
    artist: Optional[str] = None
    key: Optional[str] = None
    bpm: Optional[float] = None
    genre: Optional[str] = None
    moods: Optional[list] = None
    keywords: Optional[list] = None
    description: Optional[str] = None
    name: Optional[str] = None
    instrument: Optional[str] = None
    self_wav: Optional[WavCondition] = None
    joint_embed: Dict[str, JointEmbedCondition] = field(default_factory=dict)
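
To illustrate how a dataclass like this might be populated from a metadata dict while honoring info_fields_required, here is a small sketch. TrackInfo is a hypothetical, trimmed-down stand-in for MusicInfo, and the from_dict shown here (including the choice to treat keywords as optional and to ignore unknown keys) is an assumption, not the library's implementation.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class TrackInfo:
    """Hypothetical, trimmed-down stand-in for MusicInfo (no audio fields)."""
    title: Optional[str] = None
    genre: Optional[str] = None
    bpm: Optional[float] = None
    keywords: Optional[list] = None

    @classmethod
    def from_dict(cls, data, fields_required=False, optional=("keywords",)):
        kwargs = {}
        for f in fields(cls):
            if f.name in data:
                kwargs[f.name] = data[f.name]
            elif fields_required and f.name not in optional:
                # Strict mode: a missing non-optional field is an error.
                raise KeyError(f"missing required field: {f.name}")
        # Keys in `data` that are not dataclass fields are silently ignored.
        return cls(**kwargs)
```

In lenient mode (fields_required=False) missing fields simply keep their None defaults, which mirrors the all-optional field declarations above.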

Internal Flow

The __getitem__ method performs these steps:

  1. Calls parent InfoAudioDataset.__getitem__ to load audio segment and basic info
  2. Loads sidecar .json music metadata via MusicInfo.from_dict()
  3. Optionally applies paraphrasing to the description
  4. Applies text augmentation via augment_music_info_description() if merge_text_p > 0
  5. Attaches self_wav as a WavCondition (the audio itself, used for melody/style conditioning)
  6. Creates JointEmbedCondition entries for any requested joint embed attributes
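
Steps 2 and 3 above can be sketched with the standard library alone. This is an illustrative approximation, not the audiocraft code: load_sidecar, maybe_paraphrase, and build_info are hypothetical helpers, plain dicts stand in for the info objects, and both the precedence of base info over sidecar values and the use of the sidecar path as the paraphrase lookup key are assumptions.

```python
import json
import random
from pathlib import Path


def load_sidecar(audio_path):
    """Return the parsed sidecar dict next to `audio_path`, or None if absent."""
    sidecar = Path(audio_path).with_suffix(".json")
    if not sidecar.exists():
        return None
    with open(sidecar, "r", encoding="utf-8") as f:
        return json.load(f)


def maybe_paraphrase(description, candidates, paraphrase_p, rng=None):
    """Swap in a random paraphrase with probability `paraphrase_p`."""
    rng = rng or random.Random()
    if not candidates or rng.random() >= paraphrase_p:
        return description
    return rng.choice(candidates)


def build_info(audio_path, base_info, paraphrases=None, paraphrase_p=0.0, rng=None):
    """Steps 2-3 only: merge sidecar metadata into base info, then paraphrase."""
    info = dict(base_info)
    sidecar = load_sidecar(audio_path)
    if sidecar is not None:
        # Assumption: base info wins when a key appears in both dicts.
        info = {**sidecar, **base_info}
        key = str(Path(audio_path).with_suffix(".json"))
        candidates = (paraphrases or {}).get(key)
        info["description"] = maybe_paraphrase(
            info.get("description"), candidates, paraphrase_p, rng)
    return info
```

When no sidecar file exists, this sketch returns the base info unchanged and skips paraphrasing.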

Dependencies

  • torch -- tensor operations, padding
  • audiocraft.data.info_audio_dataset.InfoAudioDataset -- parent class
  • audiocraft.modules.conditioners -- ConditioningAttributes, WavCondition, JointEmbedCondition
