Implementation:Facebookresearch Audiocraft Audio write

Summary

audio_write is a utility function that saves audio tensors to disk in various formats (WAV, MP3, OGG, FLAC) with configurable normalization strategies. It normalizes the audio according to the specified strategy, then pipes the raw PCM data to FFmpeg for encoding into the target format. The function returns the path to the saved audio file.

API Signature

def audio_write(
    stem_name: Union[str, Path],
    wav: torch.Tensor,
    sample_rate: int,
    format: str = 'wav',
    mp3_rate: int = 320,
    ogg_rate: Optional[int] = None,
    normalize: bool = True,
    strategy: str = 'peak',
    peak_clip_headroom_db: float = 1,
    rms_headroom_db: float = 18,
    loudness_headroom_db: float = 14,
    loudness_compressor: bool = False,
    log_clipping: bool = True,
    make_parent_dir: bool = True,
    add_suffix: bool = True,
) -> Path

Parameters

Parameter	Type	Default	Description
`stem_name`	`Union[str, Path]`	(required)	Base filename without extension. The appropriate extension is appended automatically based on `format`.
`wav`	`torch.Tensor`	(required)	Audio data tensor of floating-point dtype. Expected shape `[C, T]`, where `C` is channels and `T` is samples. A 1D tensor is treated as mono and unsqueezed to `[1, T]`; tensors with more than two dimensions are rejected.
`sample_rate`	`int`	(required)	Sample rate of the audio data in Hz (e.g., 32000).
`format`	`str`	`'wav'`	Output format. Supported values: `'wav'`, `'mp3'`, `'ogg'`, `'flac'`.
`mp3_rate`	`int`	`320`	Bitrate in kbps for MP3 encoding.
`ogg_rate`	`Optional[int]`	`None`	Bitrate in kbps for OGG/Vorbis encoding. If `None`, FFmpeg chooses automatically.
`normalize`	`bool`	`True`	If `True`, actively normalizes the audio to the target level. If `False`, normalization is only applied if clipping would otherwise occur.
`strategy`	`str`	`'peak'`	Normalization strategy. Options: `'clip'`, `'peak'`, `'rms'`, `'loudness'`.
`peak_clip_headroom_db`	`float`	`1`	Headroom in dB below 0 dBFS for peak and clip strategies.
`rms_headroom_db`	`float`	`18`	Target RMS headroom in dB for the RMS normalization strategy.
`loudness_headroom_db`	`float`	`14`	Target loudness for the loudness normalization strategy, expressed as headroom in dB (the default of `14` targets roughly -14 dB LUFS).
`loudness_compressor`	`bool`	`False`	If `True`, uses tanh-based soft clipping for loudness strategy to avoid hard clipping.
`log_clipping`	`bool`	`True`	If `True`, logs a warning to stderr when clipping occurs despite the normalization strategy.
`make_parent_dir`	`bool`	`True`	If `True`, creates parent directories if they do not exist.
`add_suffix`	`bool`	`True`	If `True`, appends the format extension to the stem name.

Return Value

Type	Description
`Path`	Path to the saved audio file, including the extension (e.g., `Path('output/song.wav')`).
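
The returned path follows from `stem_name`, `format`, and `add_suffix`. A minimal sketch of that derivation, assuming the suffix is a dot followed by the format name and is appended to the string form of the stem. `resolve_output_path` and `SUPPORTED_FORMATS` are hypothetical names for illustration, not part of audiocraft, and the exception type is an assumption:

```python
from pathlib import Path
from typing import Union

SUPPORTED_FORMATS = ('wav', 'mp3', 'ogg', 'flac')


def resolve_output_path(stem_name: Union[str, Path], format: str = 'wav',
                        add_suffix: bool = True,
                        make_parent_dir: bool = True) -> Path:
    """Hypothetical helper: derive the output path as the tables describe."""
    if format not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported format: {format}")
    suffix = f'.{format}' if add_suffix else ''
    # Append to the string form so a stem such as 'take.1' keeps its dot.
    path = Path(str(stem_name) + suffix)
    if make_parent_dir:
        path.parent.mkdir(parents=True, exist_ok=True)
    return path
```

Appending to the string, rather than calling `Path.with_suffix`, avoids replacing part of a stem that already contains a dot.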

Source Location

File: audiocraft/data/audio.py, lines 159-231
Import: from audiocraft.data.audio import audio_write

Internal Workflow

The function proceeds through four stages:

Stage 1: Input Validation

Asserts the input tensor is floating-point.
Handles 1D input by unsqueezing to [1, T].
Rejects tensors with more than 2 dimensions.
Asserts all values are finite (no NaN or Inf).

Stage 2: Normalization

Calls normalize_audio() from audiocraft/data/audio_utils.py with the specified strategy and parameters. This function applies the selected normalization algorithm and returns the normalized tensor.
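
To make the headroom parameters concrete, here is a minimal sketch of peak normalization on a plain list of samples. It assumes the target peak is `10 ** (-headroom_db / 20)` and that, with `normalize=False`, gain is applied only when it would reduce the level. `peak_normalize` is a hypothetical stand-in and does not reproduce `normalize_audio()` itself:

```python
from typing import List


def peak_normalize(samples: List[float], headroom_db: float = 1.0,
                   normalize: bool = True) -> List[float]:
    """Hypothetical sketch: scale so the peak sits headroom_db below full scale."""
    target_peak = 10 ** (-headroom_db / 20)  # 1 dB of headroom -> about 0.891
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # silence: nothing to scale
    gain = target_peak / peak
    # With normalize=False, only attenuate (gain < 1) to prevent clipping.
    if normalize or gain < 1:
        return [s * gain for s in samples]
    return list(samples)
```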

Stage 3: Format Configuration

Selects the FFmpeg flags based on the output format:

# WAV: uncompressed 16-bit PCM
flags = ['-f', 'wav', '-c:a', 'pcm_s16le']

# MP3: LAME encoder at specified bitrate
flags = ['-f', 'mp3', '-c:a', 'libmp3lame', '-b:a', f'{mp3_rate}k']

# OGG: Vorbis encoder; a bitrate flag is added only when ogg_rate is set
flags = ['-f', 'ogg', '-c:a', 'libvorbis']
if ogg_rate is not None:
    flags += ['-b:a', f'{ogg_rate}k']

# FLAC: lossless compression
flags = ['-f', 'flac']

Stage 4: FFmpeg Encoding

Calls _piping_to_ffmpeg(), which:

Constructs an FFmpeg command with the appropriate input format flags (f32le, sample rate, channel count).
Converts the tensor to 32-bit float PCM bytes: f32_pcm(wav).t().detach().cpu().numpy().tobytes().
Pipes the raw bytes to FFmpeg's stdin.
FFmpeg encodes and writes the output file.
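
The steps above can be sketched without torch. The command layout below assumes FFmpeg reads headerless `f32le` samples from stdin (`-i -`) and that the `-y` and `-loglevel error` options are wanted. `build_ffmpeg_command` and `interleave_f32le` are hypothetical helpers for illustration, and the exact argument order inside `_piping_to_ffmpeg()` may differ:

```python
import struct
from typing import List, Sequence


def build_ffmpeg_command(out_path: str, sample_rate: int, channels: int,
                         flags: Sequence[str]) -> List[str]:
    """Hypothetical helper: raw f32le on stdin, encoded file at out_path."""
    return [
        'ffmpeg', '-loglevel', 'error', '-y',
        '-f', 'f32le',             # input is raw 32-bit little-endian float PCM
        '-ar', str(sample_rate),   # input sample rate
        '-ac', str(channels),      # input channel count
        '-i', '-',                 # read input from stdin
        *flags, out_path,
    ]


def interleave_f32le(wav: Sequence[Sequence[float]]) -> bytes:
    """Turn channel-major [C][T] samples into interleaved f32le bytes.

    Raw PCM expects frame order (L0, R0, L1, R1, ...), which is why the
    tensor is transposed to [T, C] before it is converted to bytes.
    """
    frames = zip(*wav)  # [C][T] -> T frames of C samples each
    flat = [sample for frame in frames for sample in frame]
    return struct.pack(f'<{len(flat)}f', *flat)
```

One way to pipe the result is `subprocess.run(command, input=data, check=True)`, which raises `CalledProcessError` if FFmpeg exits with a non-zero status.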

If encoding fails, any partially written file is cleaned up (deleted) before re-raising the exception.

Example Usage

from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained('facebook/musicgen-melody')
model.set_generation_params(duration=8.0)

wav = model.generate(['upbeat electronic dance music'])

# Save as WAV with peak normalization
path = audio_write(
    'output/generated_track',
    wav[0].cpu(),
    model.sample_rate,
    format='wav',
    strategy='peak',
)
print(f"Saved to: {path}")
# Output: Saved to: output/generated_track.wav

# Save as high-quality MP3
path = audio_write(
    'output/generated_track',
    wav[0].cpu(),
    model.sample_rate,
    format='mp3',
    mp3_rate=320,
    strategy='loudness',
    loudness_headroom_db=14,
)

Dependencies

soundfile - Audio file metadata (used elsewhere in the module; not directly used by audio_write)
av - PyAV library (used for reading; writing uses FFmpeg subprocess)
FFmpeg (system binary) - Required for all audio encoding. Must be installed and available on the system PATH.
torch - Tensor operations and CPU transfer

Error Handling

If the FFmpeg encoding process fails:

The function checks if a partial output file was created.
If so, it deletes the partial file to avoid leaving corrupt files on disk.
The original exception is re-raised.
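
The cleanup behavior described above amounts to a try/except around the encoding call. A minimal sketch, where `write_with_cleanup` and the `encode` callable are hypothetical stand-ins for `audio_write` and the FFmpeg step:

```python
from pathlib import Path
from typing import Callable


def write_with_cleanup(path: Path, encode: Callable[[Path], None]) -> Path:
    """Hypothetical sketch: run encode(path); on failure remove any partial file."""
    try:
        encode(path)
    except Exception:
        if path.exists():
            path.unlink()  # do not leave a half-written file behind
        raise  # re-raise the original exception unchanged
    return path
```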

Related Pages

Principle:Facebookresearch_Audiocraft_Audio_File_Writing
Implementation:Facebookresearch_Audiocraft_EncodecModel_decode - Produces the audio tensor that this function writes to disk.
Implementation:Facebookresearch_Audiocraft_Audiocraft_Installation - FFmpeg and other system dependencies must be installed.
Heuristic:Facebookresearch_Audiocraft_Audio_Normalization_Strategies
