Implementation:Facebookresearch Audiocraft Audio write

From Leeroopedia

Summary

audio_write is a utility function that saves audio tensors to disk in various formats (WAV, MP3, OGG, FLAC) with configurable normalization strategies. It normalizes the audio according to the specified strategy, then pipes the raw PCM data to FFmpeg for encoding into the target format. The function returns the path to the saved audio file.

API Signature

def audio_write(
    stem_name: Union[str, Path],
    wav: torch.Tensor,
    sample_rate: int,
    format: str = 'wav',
    mp3_rate: int = 320,
    ogg_rate: Optional[int] = None,
    normalize: bool = True,
    strategy: str = 'peak',
    peak_clip_headroom_db: float = 1,
    rms_headroom_db: float = 18,
    loudness_headroom_db: float = 14,
    loudness_compressor: bool = False,
    log_clipping: bool = True,
    make_parent_dir: bool = True,
    add_suffix: bool = True,
) -> Path

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| stem_name | Union[str, Path] | (required) | Base filename without extension. The appropriate extension is appended automatically based on format. |
| wav | torch.Tensor | (required) | Audio data tensor. Expected shape [C, T] where C is channels and T is samples. Also accepts 1D tensors (mono), which are automatically unsqueezed. |
| sample_rate | int | (required) | Sample rate of the audio data in Hz (e.g., 32000). |
| format | str | 'wav' | Output format. Supported values: 'wav', 'mp3', 'ogg', 'flac'. |
| mp3_rate | int | 320 | Bitrate in kbps for MP3 encoding. |
| ogg_rate | Optional[int] | None | Bitrate in kbps for OGG/Vorbis encoding. If None, FFmpeg chooses automatically. |
| normalize | bool | True | If True, actively normalizes the audio to the target level. If False, normalization is only applied if clipping would otherwise occur. |
| strategy | str | 'peak' | Normalization strategy. Options: 'clip', 'peak', 'rms', 'loudness'. |
| peak_clip_headroom_db | float | 1 | Headroom in dB below 0 dBFS for peak and clip strategies. |
| rms_headroom_db | float | 18 | Target RMS headroom in dB for the RMS normalization strategy. |
| loudness_headroom_db | float | 14 | Target loudness in dB for the loudness normalization strategy. |
| loudness_compressor | bool | False | If True, uses tanh-based soft clipping for loudness strategy to avoid hard clipping. |
| log_clipping | bool | True | If True, logs a warning to stderr when clipping occurs despite the normalization strategy. |
| make_parent_dir | bool | True | If True, creates parent directories if they do not exist. |
| add_suffix | bool | True | If True, appends the format extension to the stem name. |

Return Value

| Type | Description |
|---|---|
| Path | Path to the saved audio file, including the extension (e.g., Path('output/song.wav')). |
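
The returned path follows from the stem_name, format, and add_suffix parameters. The helper below is a hypothetical sketch (build_output_path and SUFFIXES are illustrative names, not part of audiocraft). It assumes the extension is appended to the string form of the stem, so a stem that already contains a dot keeps it.

```python
from pathlib import Path
from typing import Union

# Extensions for the four formats listed in the parameter table.
SUFFIXES = {'wav': '.wav', 'mp3': '.mp3', 'ogg': '.ogg', 'flac': '.flac'}


def build_output_path(stem_name: Union[str, Path], format: str = 'wav',
                      add_suffix: bool = True) -> Path:
    """Hypothetical helper: derive the output path from a stem and a format."""
    if format not in SUFFIXES:
        raise ValueError(f"Unsupported format: {format!r}")
    suffix = SUFFIXES[format] if add_suffix else ''
    # Assumption: the suffix is appended to the string form of the stem,
    # so 'take.v2' becomes 'take.v2.mp3' rather than 'take.mp3'.
    return Path(str(stem_name) + suffix)
```

With add_suffix=False the stem is used as the full filename, which may be useful when the caller has already chosen an extension.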

Source Location

  • File: audiocraft/data/audio.py, lines 159-231
  • Import: from audiocraft.data.audio import audio_write

Internal Workflow

The function proceeds through four stages:

Stage 1: Input Validation

  • Asserts the input tensor is floating-point.
  • Handles 1D input by unsqueezing to [1, T].
  • Rejects tensors with more than 2 dimensions.
  • Asserts all values are finite (no NaN or Inf).

Stage 2: Normalization

Calls normalize_audio() from audiocraft/data/audio_utils.py with the specified strategy and parameters. This function applies the selected normalization algorithm and returns the normalized tensor.
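
To illustrate what peak normalization with headroom means, the sketch below converts a headroom in dB to a linear amplitude and rescales a signal so that its largest absolute sample lands on that amplitude. It uses plain Python lists in place of tensors and a hypothetical function name, peak_normalize. It shows the general technique under those assumptions and is not a copy of normalize_audio(). The normalize flag behaves as described in the parameter table: when it is False, the gain is applied only if it would lower the level.

```python
from typing import List


def peak_normalize(samples: List[float], headroom_db: float = 1.0,
                   normalize: bool = True) -> List[float]:
    """Hypothetical sketch of peak normalization with headroom."""
    # Convert headroom in dB below full scale to a linear amplitude,
    # e.g. 1 dB -> about 0.891.
    target_peak = 10 ** (-headroom_db / 20)
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # silence: nothing to scale
    gain = target_peak / peak
    # With normalize=False, only attenuate signals that would exceed the target.
    if normalize or gain < 1:
        return [s * gain for s in samples]
    return list(samples)
```

For example, a signal peaking at 0.5 with headroom_db=1 is scaled up by roughly 1.78 when normalize is True and left untouched when it is False.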

Stage 3: Format Configuration

Selects the FFmpeg flags based on the output format:

# WAV: uncompressed 16-bit PCM
flags = ['-f', 'wav', '-c:a', 'pcm_s16le']

# MP3: LAME encoder at specified bitrate
flags = ['-f', 'mp3', '-c:a', 'libmp3lame', '-b:a', f'{mp3_rate}k']

# OGG: Vorbis encoder; a bitrate flag is added only when ogg_rate is set
flags = ['-f', 'ogg', '-c:a', 'libvorbis']
if ogg_rate is not None:
    flags += ['-b:a', f'{ogg_rate}k']

# FLAC: lossless compression
flags = ['-f', 'flac']

Stage 4: FFmpeg Encoding

Calls _piping_to_ffmpeg() which:

  1. Constructs an FFmpeg command with the appropriate input format flags (f32le, sample rate, channel count).
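  2. Converts the tensor to 32-bit float PCM bytes: f32_pcm(wav).t().detach().cpu().numpy().tobytes(). The transpose from [C, T] to [T, C] means the bytes are interleaved by channel, one frame at a time.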
  3. Pipes the raw bytes to FFmpeg's stdin.
  4. FFmpeg encodes and writes the output file.

If encoding fails, any partially written file is cleaned up (deleted) before re-raising the exception.
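
The command construction and byte conversion in the steps above can be sketched with the standard library alone. This is a minimal illustration, not the body of _piping_to_ffmpeg(): the names build_ffmpeg_command, to_f32le_bytes, and encode are hypothetical, nested lists stand in for a [C, T] tensor, and the -y overwrite flag is an assumption.

```python
import struct
import subprocess
from pathlib import Path
from typing import List, Sequence


def build_ffmpeg_command(out_path: Path, sample_rate: int, channels: int,
                         flags: Sequence[str]) -> List[str]:
    """Hypothetical helper: describe raw float32 PCM on stdin, then the output."""
    return [
        'ffmpeg',
        '-y',                     # assumption: overwrite an existing output file
        '-f', 'f32le',            # input is raw 32-bit little-endian float PCM
        '-ar', str(sample_rate),  # input sample rate
        '-ac', str(channels),     # input channel count
        '-i', '-',                # read the input from stdin
        *flags,                   # output format flags chosen in Stage 3
        str(out_path),
    ]


def to_f32le_bytes(wav: Sequence[Sequence[float]]) -> bytes:
    """Interleave a [C, T] nested sequence into frame-by-frame float32 bytes."""
    frames = zip(*wav)  # transpose [C, T] into T frames of C samples
    flat = [sample for frame in frames for sample in frame]
    return struct.pack(f'<{len(flat)}f', *flat)


def encode(out_path: Path, wav: Sequence[Sequence[float]], sample_rate: int,
           flags: Sequence[str]) -> None:
    """Pipe the PCM bytes to FFmpeg; requires the ffmpeg binary on PATH."""
    command = build_ffmpeg_command(out_path, sample_rate, len(wav), flags)
    subprocess.run(command, input=to_f32le_bytes(wav), check=True)
```

Only encode needs FFmpeg to be installed. The two helpers can be inspected on their own to see the argument order and the interleaved byte layout.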

Example Usage

from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained('facebook/musicgen-melody')
model.set_generation_params(duration=8.0)

wav = model.generate(['upbeat electronic dance music'])

# Save as WAV with peak normalization
path = audio_write(
    'output/generated_track',
    wav[0].cpu(),
    model.sample_rate,
    format='wav',
    strategy='peak',
)
print(f"Saved to: {path}")
# Output: Saved to: output/generated_track.wav

# Save as high-quality MP3
path = audio_write(
    'output/generated_track',
    wav[0].cpu(),
    model.sample_rate,
    format='mp3',
    mp3_rate=320,
    strategy='loudness',
    loudness_headroom_db=14,
)

Dependencies

  • soundfile - Audio file metadata (used elsewhere in the module; not directly used by audio_write)
  • av - PyAV library (used for reading; writing uses FFmpeg subprocess)
  • FFmpeg (system binary) - Required for all audio encoding. Must be installed and available on the system PATH.
  • torch - Tensor operations and CPU transfer
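
Because FFmpeg is an external binary rather than a Python package, a missing installation only surfaces when the subprocess is launched. A caller could check for it up front. The sketch below uses shutil.which from the standard library; find_ffmpeg and require_ffmpeg are hypothetical helper names, not part of audiocraft.

```python
import shutil
from typing import Optional


def find_ffmpeg(binary: str = 'ffmpeg') -> Optional[str]:
    """Return the resolved path of the FFmpeg binary, or None if not on PATH."""
    return shutil.which(binary)


def require_ffmpeg() -> str:
    """Raise a descriptive error early instead of failing inside a subprocess."""
    path = find_ffmpeg()
    if path is None:
        raise RuntimeError(
            "FFmpeg not found on PATH; install it before calling audio_write.")
    return path
```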

Error Handling

If the FFmpeg encoding process fails:

  1. The function checks if a partial output file was created.
  2. If so, it deletes the partial file to avoid leaving corrupt files on disk.
  3. The original exception is re-raised.
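
The three steps above amount to a try/except wrapper around the encoding call. The sketch below shows that pattern in isolation. write_with_cleanup is a hypothetical name, and the encoder is passed in as a callable so the example does not depend on FFmpeg.

```python
from pathlib import Path
from typing import Callable


def write_with_cleanup(path: Path, encode: Callable[[Path], None]) -> Path:
    """Hypothetical wrapper: run an encoder and remove partial output on failure."""
    try:
        encode(path)
    except Exception:
        # Do not leave a half-written file behind.
        if path.exists():
            path.unlink()
        raise  # re-raise the original exception unchanged
    return path
```

The bare raise keeps the original exception and traceback, so the caller sees the encoder's error rather than a wrapper.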
