Implementation:Datajuicer Data juicer AudioFFmpegWrappedMapper
| Knowledge Sources | |
|---|---|
| Domains | Data_Processing, Mapping |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
Concrete tool, provided by Data-Juicer, for applying FFmpeg audio filters to the audio files in a dataset.
Description
AudioFFmpegWrappedMapper is a mapper operator that wraps FFmpeg audio filters for flexible audio processing. It uses the ffmpeg-python library to apply a specified FFmpeg filter with custom keyword arguments and global arguments to each audio file in a sample. Processed audio files are saved to a configurable output directory, and the sample's source file paths are updated accordingly. If no filter name is provided, the audio files remain unmodified. It extends the Mapper base class.
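The per-sample flow described above can be sketched in plain Python. This is a minimal, hypothetical re-creation, not Data-Juicer's code. `process_sample_sketch`, `make_output_path` and `run_filter` are illustrative names. The default `audio_key` of `"audios"` is an assumption. The source-file bookkeeping is omitted. The FFmpeg call is injected as `run_filter` so the control flow can be exercised without FFmpeg installed.

```python
from typing import Callable, Dict, List, Optional


def process_sample_sketch(
    sample: Dict,
    filter_name: Optional[str],
    filter_kwargs: Optional[Dict],
    make_output_path: Callable[[str], str],
    run_filter: Callable[[str, str, str, Dict], None],
    audio_key: str = "audios",  # assumed default key holding audio paths
) -> Dict:
    """Hypothetical sketch of the per-sample logic; not Data-Juicer's API.

    make_output_path and run_filter are stand-ins: the real operator derives
    output paths from its save directory and runs FFmpeg via ffmpeg-python.
    Source-file path bookkeeping is intentionally left out.
    """
    audio_paths: List[str] = sample.get(audio_key) or []
    # No audio, or no filter configured: return the sample unchanged.
    if not audio_paths or filter_name is None:
        return sample

    processed: Dict[str, str] = {}
    for path in audio_paths:
        if path in processed:  # a file listed twice is filtered only once
            continue
        out_path = make_output_path(path)
        run_filter(path, out_path, filter_name, filter_kwargs or {})
        processed[path] = out_path

    # Point the sample at the produced files, preserving the original order.
    # The sample dict is modified in place and also returned.
    sample[audio_key] = [processed[p] for p in audio_paths]
    return sample
```

Injecting `run_filter` keeps the sketch testable. In the real operator that role is played by ffmpeg-python, and output paths are derived from the configured save directory.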
Usage
Use this operator when you need to apply an arbitrary FFmpeg audio filter to audio files without writing custom operator code.
Code Reference
Source Location
- Repository: Datajuicer_Data_juicer
- File: data_juicer/ops/mapper/audio_ffmpeg_wrapped_mapper.py
Signature
```python
@OPERATORS.register_module("audio_ffmpeg_wrapped_mapper")
class AudioFFmpegWrappedMapper(Mapper):
    def __init__(self,
                 filter_name: Optional[str] = None,
                 filter_kwargs: Optional[Dict] = None,
                 global_args: Optional[List[str]] = None,
                 capture_stderr: bool = True,
                 overwrite_output: bool = True,
                 save_dir: str = None,
                 *args, **kwargs):
```
Import
```python
from data_juicer.ops.mapper.audio_ffmpeg_wrapped_mapper import AudioFFmpegWrappedMapper
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| filter_name | Optional[str] | No | Name of the FFmpeg audio filter to apply, e.g. afade. Default: None (no-op: audio files are left unmodified) |
| filter_kwargs | Optional[Dict] | No | Keyword arguments passed to the FFmpeg filter as its options. Default: None |
| global_args | Optional[List[str]] | No | Global arguments passed to the FFmpeg command line. Default: None |
| capture_stderr | bool | No | Whether to capture FFmpeg's stderr output. Default: True |
| overwrite_output | bool | No | Whether to overwrite existing output files. Default: True |
| save_dir | str | No | Directory in which to store generated audio files. If not specified, outputs are saved alongside the inputs. It can also be set via the DJ_PRODUCED_DATA_DIR environment variable. Default: None |
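To see how these parameters map onto FFmpeg, the hypothetical helper below assembles a roughly equivalent FFmpeg command line for one file. It is an illustration under stated assumptions, not what ffmpeg-python actually emits: ffmpeg-python builds a `-filter_complex` graph and orders arguments its own way. The helper also does not escape `:` or `=` inside option values. `capture_stderr` has no command-line counterpart. In ffmpeg-python it only controls whether FFmpeg's stderr is piped back to Python.

```python
from typing import Dict, List, Optional


def build_ffmpeg_argv(
    input_path: str,
    output_path: str,
    filter_name: str,
    filter_kwargs: Optional[Dict] = None,
    global_args: Optional[List[str]] = None,
    overwrite_output: bool = True,
) -> List[str]:
    """Hypothetical helper: an FFmpeg argv roughly equivalent to one filter call.

    Not the exact command ffmpeg-python produces; option values are not escaped.
    """
    argv = ["ffmpeg"]
    # Global options conventionally precede the inputs.
    argv += list(global_args or [])
    if overwrite_output:
        argv.append("-y")  # overwrite an existing output file without asking
    # filter_kwargs become "key=value" pairs joined by ':' after "name=".
    options = ":".join(f"{k}={v}" for k, v in (filter_kwargs or {}).items())
    filtergraph = f"{filter_name}={options}" if options else filter_name
    argv += ["-i", input_path, "-af", filtergraph, output_path]
    return argv
```

For the afade settings in the YAML example, this sketch produces the filter string `afade=type=in:duration=3`. A filter with no options, such as `areverse`, is passed by name alone.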
Outputs
| Name | Type | Description |
|---|---|---|
| samples | Dict | Transformed sample whose audio file paths are updated to point to the processed files |
Usage Examples
YAML Configuration
```yaml
process:
  - audio_ffmpeg_wrapped_mapper:
      filter_name: afade
      filter_kwargs:
        type: in
        duration: 3
```
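The configuration above asks FFmpeg's `afade` filter for a 3-second fade-in. With afade's defaults (`start_time` 0 and `curve` set to `tri`, a linear ramp), the gain rises linearly from 0 to 1 over the first three seconds, after which the audio passes through unchanged. The pure-Python sketch below only illustrates that gain curve on a list of samples. `linear_fade_in` is a hypothetical name, and this is an approximation of the effect, not FFmpeg's implementation.

```python
from typing import List


def linear_fade_in(
    samples: List[float], sample_rate: int, duration: float, start_time: float = 0.0
) -> List[float]:
    """Hypothetical illustration of a linear fade-in, like afade type=in, curve=tri."""
    fade_start = int(round(start_time * sample_rate))
    fade_len = int(round(duration * sample_rate))
    out = []
    for i, x in enumerate(samples):
        if i < fade_start:
            gain = 0.0  # a fade-in keeps audio before the start point silent
        elif i < fade_start + fade_len:
            gain = (i - fade_start) / fade_len  # linear ramp from 0 toward 1
        else:
            gain = 1.0  # past the fade, audio is unchanged
        out.append(x * gain)
    return out
```

With `sample_rate=4` and `duration=1.0`, a constant signal of eight samples is scaled by 0, 0.25, 0.5 and 0.75 and then left at full level.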