Implementation:Facebookresearch Audiocraft Audio write
Summary
audio_write is a utility function that saves audio tensors to disk in various formats (WAV, MP3, OGG, FLAC) with configurable normalization strategies. It normalizes the audio according to the specified strategy, then pipes the raw PCM data to FFmpeg for encoding into the target format. The function returns the path to the saved audio file.
API Signature
def audio_write(
    stem_name: Union[str, Path],
    wav: torch.Tensor,
    sample_rate: int,
    format: str = 'wav',
    mp3_rate: int = 320,
    ogg_rate: Optional[int] = None,
    normalize: bool = True,
    strategy: str = 'peak',
    peak_clip_headroom_db: float = 1,
    rms_headroom_db: float = 18,
    loudness_headroom_db: float = 14,
    loudness_compressor: bool = False,
    log_clipping: bool = True,
    make_parent_dir: bool = True,
    add_suffix: bool = True,
) -> Path
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| stem_name | Union[str, Path] | (required) | Base filename without extension. The appropriate extension is appended automatically based on format. |
| wav | torch.Tensor | (required) | Audio data tensor. Expected shape [C, T] where C is channels and T is samples. Also accepts 1D tensors (mono), which are automatically unsqueezed. |
| sample_rate | int | (required) | Sample rate of the audio data in Hz (e.g., 32000). |
| format | str | 'wav' | Output format. Supported values: 'wav', 'mp3', 'ogg', 'flac'. |
| mp3_rate | int | 320 | Bitrate in kbps for MP3 encoding. |
| ogg_rate | Optional[int] | None | Bitrate in kbps for OGG/Vorbis encoding. If None, FFmpeg chooses automatically. |
| normalize | bool | True | If True, actively normalizes the audio to the target level. If False, normalization is only applied if clipping would otherwise occur. |
| strategy | str | 'peak' | Normalization strategy. Options: 'clip', 'peak', 'rms', 'loudness'. |
| peak_clip_headroom_db | float | 1 | Headroom in dB below 0 dBFS for peak and clip strategies. |
| rms_headroom_db | float | 18 | Target RMS headroom in dB for the RMS normalization strategy. |
| loudness_headroom_db | float | 14 | Target loudness in dB for the loudness normalization strategy. |
| loudness_compressor | bool | False | If True, uses tanh-based soft clipping for loudness strategy to avoid hard clipping. |
| log_clipping | bool | True | If True, logs a warning to stderr when clipping occurs despite the normalization strategy. |
| make_parent_dir | bool | True | If True, creates parent directories if they do not exist. |
| add_suffix | bool | True | If True, appends the format extension to the stem name. |
Return Value
| Type | Description |
|---|---|
| Path | Path to the saved audio file, including the extension (e.g., Path('output/song.wav')). |
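The relationship between stem_name, add_suffix, and the returned path can be sketched as follows. This is a minimal illustration, not the library's code: the helper name output_path is hypothetical, and it assumes the extension is appended to the stem by plain string concatenation, so a stem that already contains a dot keeps it.

```python
from pathlib import Path
from typing import Union


def output_path(stem_name: Union[str, Path], suffix: str,
                add_suffix: bool = True) -> Path:
    """Hypothetical helper: append the format suffix to the stem."""
    if not add_suffix:
        # Caller already supplied the full filename.
        suffix = ''
    # Assumption: concatenate rather than replace, so 'take.1' -> 'take.1.mp3'.
    return Path(str(stem_name) + suffix)


path = output_path('output/song', '.wav')
dotted = output_path('take.1', '.mp3')
as_given = output_path('song.wav', '.wav', add_suffix=False)
```

Under this assumption, passing add_suffix=False is the way to control the exact filename, for example when the stem already ends in the desired extension.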
Source Location
- File: audiocraft/data/audio.py, lines 159-231
- Import: from audiocraft.data.audio import audio_write
Internal Workflow
The function proceeds through four stages:
Stage 1: Input Validation
- Asserts the input tensor is floating-point.
- Handles 1D input by unsqueezing to [1, T].
- Rejects tensors with more than 2 dimensions.
- Asserts all values are finite (no NaN or Inf).
Stage 2: Normalization
Calls normalize_audio() from audiocraft/data/audio_utils.py with the specified strategy and parameters. This function applies the selected normalization algorithm and returns the normalized tensor.
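To make the effect of normalize and peak_clip_headroom_db concrete, here is a rough sketch of the 'peak' strategy on a plain Python list. It is an illustration under stated assumptions, not normalize_audio() itself: the function name peak_normalize is hypothetical, the real function operates on tensors, and the zero-peak guard is added here only to keep the example safe.

```python
import math
from typing import List


def peak_normalize(samples: List[float],
                   normalize: bool = True,
                   peak_clip_headroom_db: float = 1.0) -> List[float]:
    """Hypothetical list-based sketch of peak normalization."""
    # Linear amplitude that sits `peak_clip_headroom_db` below full scale.
    target_peak = 10 ** (-peak_clip_headroom_db / 20)
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # silence: nothing to rescale
    rescaling = target_peak / peak
    # normalize=False: only attenuate (rescaling < 1), never boost.
    if normalize or rescaling < 1:
        return [s * rescaling for s in samples]
    return list(samples)


loud = peak_normalize([0.5, -2.0, 1.0])                # attenuated to the target
quiet = peak_normalize([0.1, -0.2], normalize=False)   # left untouched
boosted = peak_normalize([0.1, -0.2])                  # raised to the target
```

With the default 1 dB headroom the target peak is about 0.891 of full scale, so a signal that already fits is left alone when normalize=False and is scaled up when normalize=True.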
Stage 3: Format Configuration
Selects the FFmpeg flags based on the output format:
# WAV: uncompressed 16-bit PCM
flags = ['-f', 'wav', '-c:a', 'pcm_s16le']
# MP3: LAME encoder at specified bitrate
flags = ['-f', 'mp3', '-c:a', 'libmp3lame', '-b:a', f'{mp3_rate}k']
# OGG: Vorbis encoder with optional bitrate
flags = ['-f', 'ogg', '-c:a', 'libvorbis']
# FLAC: lossless compression
flags = ['-f', 'flac']
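The per-format choices above could be gathered into a single dispatch function, sketched below. The name select_ffmpeg_flags is hypothetical, and two details are assumptions rather than facts from this page: that an explicit ogg_rate is passed to FFmpeg with -b:a in the same way as the MP3 bitrate, and that an unsupported format raises ValueError.

```python
from typing import List, Optional, Tuple


def select_ffmpeg_flags(format: str,
                        mp3_rate: int = 320,
                        ogg_rate: Optional[int] = None) -> Tuple[str, List[str]]:
    """Hypothetical helper: return (file suffix, FFmpeg output flags)."""
    if format == 'wav':
        return '.wav', ['-f', 'wav', '-c:a', 'pcm_s16le']
    if format == 'mp3':
        return '.mp3', ['-f', 'mp3', '-c:a', 'libmp3lame', '-b:a', f'{mp3_rate}k']
    if format == 'ogg':
        flags = ['-f', 'ogg', '-c:a', 'libvorbis']
        # Assumption: only add a bitrate flag when one was requested,
        # otherwise let FFmpeg pick the encoder default.
        if ogg_rate is not None:
            flags += ['-b:a', f'{ogg_rate}k']
        return '.ogg', flags
    if format == 'flac':
        return '.flac', ['-f', 'flac']
    # Assumption: the exception type used for unknown formats.
    raise ValueError(f'Unsupported format: {format}')


suffix, flags = select_ffmpeg_flags('ogg', ogg_rate=192)
```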
Stage 4: FFmpeg Encoding
Calls _piping_to_ffmpeg() which:
- Constructs an FFmpeg command with the appropriate input format flags (f32le, sample rate, channel count).
- Converts the tensor to 32-bit float PCM bytes: f32_pcm(wav).t().detach().cpu().numpy().tobytes().
- Pipes the raw bytes to FFmpeg's stdin.
- FFmpeg encodes and writes the output file.
If encoding fails, any partially written file is cleaned up (deleted) before re-raising the exception.
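The piping step might look roughly like the sketch below, which separates command construction from the subprocess call. The function names are hypothetical and the exact option list used by _piping_to_ffmpeg() is not confirmed by this page. The -loglevel error and -y options in particular are assumptions, although every option shown is a standard FFmpeg flag.

```python
import subprocess
from pathlib import Path
from typing import List, Union


def build_ffmpeg_command(out_path: Union[str, Path], sample_rate: int,
                         channels: int, flags: List[str]) -> List[str]:
    """Hypothetical helper: FFmpeg command reading raw f32le PCM from stdin."""
    return [
        'ffmpeg',
        '-loglevel', 'error',   # assumption: only report errors
        '-y',                   # assumption: overwrite an existing file
        '-f', 'f32le',          # input is raw 32-bit little-endian float
        '-ar', str(sample_rate),
        '-ac', str(channels),
        '-i', '-',              # '-' makes FFmpeg read from stdin
    ] + flags + [str(out_path)]


def pipe_pcm_to_ffmpeg(command: List[str], pcm_bytes: bytes) -> None:
    """Send interleaved PCM bytes to FFmpeg's stdin.

    check=True raises subprocess.CalledProcessError on a non-zero exit code.
    """
    subprocess.run(command, input=pcm_bytes, check=True)


# Building the command does not require FFmpeg to be installed.
command = build_ffmpeg_command('out.wav', 32000, 2,
                               ['-f', 'wav', '-c:a', 'pcm_s16le'])
```

The transpose in f32_pcm(wav).t() matters here: raw PCM input is expected to be interleaved sample by sample across channels, so a [C, T] tensor has to become [T, C] before its bytes are written.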
Example Usage
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained('facebook/musicgen-melody')
model.set_generation_params(duration=8.0)
wav = model.generate(['upbeat electronic dance music'])

# Save as WAV with peak normalization
path = audio_write(
    'output/generated_track',
    wav[0].cpu(),
    model.sample_rate,
    format='wav',
    strategy='peak',
)
print(f"Saved to: {path}")
# Output: Saved to: output/generated_track.wav

# Save as high-quality MP3
path = audio_write(
    'output/generated_track',
    wav[0].cpu(),
    model.sample_rate,
    format='mp3',
    mp3_rate=320,
    strategy='loudness',
    loudness_headroom_db=14,
)
Dependencies
- soundfile - Audio file metadata (used elsewhere in the module; not directly used by audio_write)
- av - PyAV library (used for reading; writing uses FFmpeg subprocess)
- FFmpeg (system binary) - Required for all audio encoding. Must be installed and available on the system PATH.
- torch - Tensor operations and CPU transfer
Error Handling
If the FFmpeg encoding process fails:
- The function checks if a partial output file was created.
- If so, it deletes the partial file to avoid leaving corrupt files on disk.
- The original exception is re-raised.
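The cleanup behavior described above follows a common try/except pattern, sketched here with a stand-in encoder so that it runs without FFmpeg. The names write_with_cleanup and failing_encoder are hypothetical, and the sketch catches every Exception, which may be broader than what audio_write does.

```python
import tempfile
from pathlib import Path
from typing import Callable


def write_with_cleanup(path: Path, encode: Callable[[Path], None]) -> Path:
    """Hypothetical helper: run encode(path), removing partial output on failure."""
    try:
        encode(path)
    except Exception:
        if path.exists():
            # Do not leave a truncated or corrupt file behind.
            path.unlink()
        raise  # re-raise the original exception unchanged
    return path


def failing_encoder(path: Path) -> None:
    """Stand-in for an encoder that dies after writing part of the file."""
    path.write_bytes(b'RIFF')
    raise RuntimeError('encoder failed')


target = Path(tempfile.mkdtemp()) / 'partial.wav'
try:
    write_with_cleanup(target, failing_encoder)
except RuntimeError:
    leftover = target.exists()  # False: the partial file was removed
```

Re-raising with a bare raise keeps the original traceback, so the caller still sees the encoder's error rather than one produced by the cleanup step.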
Related Pages
- Principle:Facebookresearch_Audiocraft_Audio_File_Writing
- Implementation:Facebookresearch_Audiocraft_EncodecModel_decode - Produces the audio tensor that this function writes to disk.
- Implementation:Facebookresearch_Audiocraft_Audiocraft_Installation - FFmpeg and other system dependencies must be installed.
- Heuristic:Facebookresearch_Audiocraft_Audio_Normalization_Strategies