Implementation:NVIDIA NeMo Curator Audio Common Stages

Knowledge Sources	NVIDIA NeMo Curator
Domains	Audio Processing, Data Curation
Last Updated	2026-02-14 00:00 GMT

Overview

Provides a legacy base class for audio processing stages and two common audio curation stages: duration computation (GetAudioDurationStage) and value-based filtering (PreserveByValueStage).

Description

This module contains the foundation classes for the audio curation pipeline:

LegacySpeechStage -- An abstract base class that extends ProcessingStage[Task, Task] and adapts the older SDP (Speech Data Processor) BaseParallelProcessor pattern. Its process() method iterates over each entry in an AudioBatch.data list, calling the abstract process_dataset_entry() method on each entry individually. It also propagates _stage_perf metadata from the batch to each result entry.

GetAudioDurationStage -- A dataclass-based stage that reads each audio file via soundfile.read(), computes the duration in seconds as sample_count / sample_rate, and stores the result under a configurable duration_key. If soundfile raises a SoundFileError, the duration is set to -1.0 and a warning is logged.

PreserveByValueStage -- A filtering stage that compares a field value in each dataset entry against a target value using a configurable comparison operator (lt, le, eq, ne, ge, gt). Entries that satisfy the condition are preserved; all others are dropped by returning an empty list.

Usage

Use LegacySpeechStage as a base class when building audio processing stages that operate on individual entries within an AudioBatch. Use GetAudioDurationStage to compute audio file durations in a pipeline. Use PreserveByValueStage to filter audio entries based on field values (for example, filtering out entries shorter than a minimum duration).

Code Reference

Source Location

Repository: NeMo-Curator
File: nemo_curator/stages/audio/common.py
Lines: 1-121

Signature

class LegacySpeechStage(ProcessingStage[Task, Task]):
    def process(self, task: AudioBatch) -> list[Task]: ...
    @abstractmethod
    def process_dataset_entry(self, data_entry: AudioBatch) -> list[AudioBatch]: ...


@dataclass
class GetAudioDurationStage(LegacySpeechStage):
    name = "GetAudioDurationStage"
    audio_filepath_key: str
    duration_key: str
    def process_dataset_entry(self, data_entry: dict) -> list[AudioBatch]: ...


class PreserveByValueStage(LegacySpeechStage):
    name = "PreserveByValueStage"
    def __init__(self, input_value_key: str, target_value: int | str, operator: str = "eq"): ...
    def process_dataset_entry(self, data_entry: AudioBatch) -> list[AudioBatch]: ...

Import

from nemo_curator.stages.audio.common import (
    LegacySpeechStage,
    GetAudioDurationStage,
    PreserveByValueStage,
)

I/O Contract

LegacySpeechStage

Inputs

Name	Type	Required	Description
task	AudioBatch	Yes	An AudioBatch containing a list of data entries to process individually

Outputs

Name	Type	Description
result	list[Task]	Aggregated list of Task objects from all individual entry processing calls
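
The per-entry dispatch in this contract can be illustrated with a small, library-free sketch. SketchBatch, SketchLegacyStage, and UppercaseTextStage are hypothetical stand-ins invented for this example (they are not NeMo Curator classes), and the metadata handling is simplified to a plain assignment; the real base class may differ in detail.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SketchBatch:
    """Hypothetical stand-in for AudioBatch: entries plus perf metadata."""
    data: list[Any] = field(default_factory=list)
    _stage_perf: list[Any] = field(default_factory=list)


class SketchLegacyStage(ABC):
    """Hypothetical stand-in for LegacySpeechStage."""

    def process(self, task: SketchBatch) -> list[SketchBatch]:
        results: list[SketchBatch] = []
        for entry in task.data:
            # Each entry is handled on its own and may yield 0..n results.
            outputs = self.process_dataset_entry(entry)
            for out in outputs:
                # Carry the batch-level perf metadata over to each result.
                out._stage_perf = task._stage_perf
            results.extend(outputs)
        return results

    @abstractmethod
    def process_dataset_entry(self, data_entry: dict) -> list[SketchBatch]:
        ...


class UppercaseTextStage(SketchLegacyStage):
    """Toy subclass: upper-cases the 'text' field of each entry."""

    def process_dataset_entry(self, data_entry: dict) -> list[SketchBatch]:
        updated = {**data_entry, "text": data_entry["text"].upper()}
        return [SketchBatch(data=[updated])]


batch = SketchBatch(
    data=[{"text": "hello"}, {"text": "world"}],
    _stage_perf=["load: 0.1s"],
)
results = UppercaseTextStage().process(batch)
```

Because each call returns a list, a subclass can drop an entry (empty list), pass it through (one element), or split it into several results without changing the base class.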

GetAudioDurationStage

Inputs

Name	Type	Required	Description
audio_filepath_key	str	Yes	Key to retrieve the path to the audio file from the data entry
duration_key	str	Yes	Key under which the computed duration will be stored

Outputs

Name	Type	Description
result	list[AudioBatch]	Single-element list containing the input entry augmented with the duration value under duration_key (or -1.0 on error)
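
The duration rule above (sample_count / sample_rate, with -1.0 as the error sentinel) can be sketched without the soundfile dependency. This example swaps in Python's standard-library wave module purely so it runs anywhere; the stage itself is documented as using soundfile.read(). The add_duration helper and the file names are hypothetical.

```python
import os
import tempfile
import wave


def add_duration(entry: dict, audio_filepath_key: str, duration_key: str) -> dict:
    """Return a copy of entry with duration = sample_count / sample_rate."""
    try:
        with wave.open(entry[audio_filepath_key], "rb") as wav:
            duration = wav.getnframes() / wav.getframerate()
    except (wave.Error, EOFError):
        # Unreadable audio: mirror the documented -1.0 sentinel.
        duration = -1.0
    return {**entry, duration_key: duration}


tmp_dir = tempfile.mkdtemp()

# A valid mono, 16-bit, 16 kHz file holding 8000 samples of silence.
good_path = os.path.join(tmp_dir, "good.wav")
with wave.open(good_path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 8000)

# A file that is not valid WAV data.
bad_path = os.path.join(tmp_dir, "bad.wav")
with open(bad_path, "wb") as f:
    f.write(b"not a wav file")

good = add_duration({"audio_filepath": good_path}, "audio_filepath", "duration")
bad = add_duration({"audio_filepath": bad_path}, "audio_filepath", "duration")
```

Here good["duration"] is 8000 / 16000 = 0.5 seconds, while bad["duration"] is -1.0. A sentinel rather than an exception keeps the pipeline running, so a downstream filter can remove the failed entries.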

PreserveByValueStage

Inputs

Name	Type	Required	Description
input_value_key	str	Yes	The field in data entries to evaluate
target_value	int or str	Yes	The value to compare against
operator	str	No	Comparison operator: "lt", "le", "eq" (default), "ne", "ge", "gt"

Outputs

Name	Type	Description
result	list[AudioBatch]	Single-element list with the entry if condition is met, empty list otherwise
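
One plausible way to realize the operator contract above is a lookup table over Python's standard operator module. This is a sketch, not the library's code: preserve_by_value is a hypothetical helper, it returns the plain entry instead of an AudioBatch, and raising ValueError for an unknown operator name is an assumption.

```python
import operator

# Map the documented operator names onto standard comparison functions.
_COMPARATORS = {
    "lt": operator.lt,
    "le": operator.le,
    "eq": operator.eq,
    "ne": operator.ne,
    "ge": operator.ge,
    "gt": operator.gt,
}


def preserve_by_value(entry: dict, input_value_key: str, target_value, op: str = "eq") -> list:
    """Return [entry] if entry[input_value_key] <op> target_value, else []."""
    if op not in _COMPARATORS:
        # Assumed behavior for unsupported operator names.
        raise ValueError(f"Unsupported operator: {op!r}")
    compare = _COMPARATORS[op]
    return [entry] if compare(entry[input_value_key], target_value) else []


entries = [{"duration": 0.4}, {"duration": 1.0}, {"duration": 2.5}]

# Keep entries whose duration is at least 1.0 second.
kept = []
for entry in entries:
    kept.extend(preserve_by_value(entry, "duration", 1.0, op="ge"))
```

With these inputs, kept holds the 1.0 and 2.5 second entries. Filtering with operator "ge" and a target of 0 would also remove entries carrying the -1.0 error sentinel written by the duration stage.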

Usage Examples

Computing Audio Duration

from nemo_curator.stages.audio.common import GetAudioDurationStage

duration_stage = GetAudioDurationStage(
    audio_filepath_key="audio_filepath",
    duration_key="duration",
)

Filtering by Value

from nemo_curator.stages.audio.common import PreserveByValueStage

# Keep only entries where duration is greater than or equal to 1.0
filter_stage = PreserveByValueStage(
    input_value_key="duration",
    target_value=1.0,
    operator="ge",
)

Related Pages

Environment:NVIDIA_NeMo_Curator_Python_Linux_Base
