Implementation:NVIDIA NeMo Curator Audio Common Stages

From Leeroopedia
Knowledge Sources
Domains: Audio Processing, Data Curation
Last Updated: 2026-02-14 00:00 GMT

Overview

Provides a legacy base class for audio processing stages and two common audio curation stages: duration computation (GetAudioDurationStage) and value-based filtering (PreserveByValueStage).

Description

This module contains the foundation classes for the audio curation pipeline:

  • LegacySpeechStage -- An abstract base class that extends ProcessingStage[Task, Task] and adapts the older SDP (Speech Data Processor) BaseParallelProcessor pattern. Its process() method iterates over each entry in an AudioBatch.data list, calling the abstract process_dataset_entry() method on each entry individually. It also propagates _stage_perf metadata from the batch to each result entry.
  • GetAudioDurationStage -- A dataclass-based stage that reads audio files via soundfile.read(), computes the duration as sample_count / sample_rate, and stores the result in a configurable duration_key. If a SoundFileError occurs, the duration is set to -1.0 and a warning is logged.
  • PreserveByValueStage -- A filtering stage that compares a field value in each dataset entry against a target value using a configurable comparison operator (lt, le, eq, ne, ge, gt). Entries satisfying the condition are preserved; others are dropped (empty list returned).
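
The per-entry dispatch described for LegacySpeechStage can be sketched without NeMo Curator installed. Every class below (SketchBatch, SketchLegacyStage, UppercaseTextStage) is a hypothetical stand-in written for illustration, not the library's own type. The sketch assumes only what the description states: process() walks the batch's data list, calls process_dataset_entry() once per entry, copies _stage_perf onto each result, and returns the aggregated list.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SketchBatch:
    """Hypothetical stand-in for AudioBatch: a payload plus perf metadata."""
    data: Any
    _stage_perf: list = field(default_factory=list)


class SketchLegacyStage(ABC):
    """Hypothetical stand-in for LegacySpeechStage."""

    def process(self, task: SketchBatch) -> list:
        results = []
        for entry in task.data:  # one call per entry in the batch
            produced = self.process_dataset_entry(entry)
            for item in produced:
                # Carry the batch-level perf metadata over to each result.
                item._stage_perf = task._stage_perf
            results.extend(produced)
        return results

    @abstractmethod
    def process_dataset_entry(self, data_entry: dict) -> list:
        ...


class UppercaseTextStage(SketchLegacyStage):
    """Hypothetical example stage: upper-cases a 'text' field."""

    def process_dataset_entry(self, data_entry: dict) -> list:
        updated = {**data_entry, "text": data_entry["text"].upper()}
        return [SketchBatch(data=updated)]


batch = SketchBatch(
    data=[{"text": "hello"}, {"text": "world"}],
    _stage_perf=["perf-record"],
)
results = UppercaseTextStage().process(batch)
print([r.data["text"] for r in results])
```

Because each call returns a list, a subclass can emit zero, one, or several results per entry. A filtering stage relies on this by returning an empty list for entries it drops.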

Usage

Use LegacySpeechStage as a base class when building audio processing stages that operate on individual entries within an AudioBatch. Use GetAudioDurationStage to compute audio file durations in a pipeline. Use PreserveByValueStage to filter audio entries based on field values (for example, filtering out entries shorter than a minimum duration).

Code Reference

Source Location

  • Repository: NeMo-Curator
  • File: nemo_curator/stages/audio/common.py
  • Lines: 1-121

Signature

class LegacySpeechStage(ProcessingStage[Task, Task]):
    def process(self, task: AudioBatch) -> list[Task]: ...
    @abstractmethod
    def process_dataset_entry(self, data_entry: AudioBatch) -> list[AudioBatch]: ...


@dataclass
class GetAudioDurationStage(LegacySpeechStage):
    name = "GetAudioDurationStage"
    audio_filepath_key: str
    duration_key: str
    def process_dataset_entry(self, data_entry: dict) -> list[AudioBatch]: ...


class PreserveByValueStage(LegacySpeechStage):
    name = "PreserveByValueStage"
    def __init__(self, input_value_key: str, target_value: int | str, operator: str = "eq"): ...
    def process_dataset_entry(self, data_entry: AudioBatch) -> list[AudioBatch]: ...

Import

from nemo_curator.stages.audio.common import (
    LegacySpeechStage,
    GetAudioDurationStage,
    PreserveByValueStage,
)

I/O Contract

LegacySpeechStage

Inputs

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| task | AudioBatch | Yes | An AudioBatch containing a list of data entries to process individually |

Outputs

| Name | Type | Description |
| --- | --- | --- |
| result | list[Task] | Aggregated list of Task objects from all individual entry processing calls |

GetAudioDurationStage

Inputs

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| audio_filepath_key | str | Yes | Key to retrieve the path to the audio file from the data entry |
| duration_key | str | Yes | Key under which the computed duration will be stored |

Outputs

| Name | Type | Description |
| --- | --- | --- |
| result | list[AudioBatch] | Single-element list containing the input entry augmented with the duration value (-1.0 if the audio file could not be read) |
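
The duration rule (sample_count / sample_rate, with -1.0 recorded on a read failure) can be illustrated with the standard library alone. This is a sketch under stated assumptions, not the stage's code: it swaps soundfile.read() for Python's wave module so that it runs without extra dependencies, and the helper names wav_duration_seconds and add_duration are hypothetical.

```python
import logging
import os
import tempfile
import wave


def wav_duration_seconds(path: str) -> float:
    """Return sample_count / sample_rate, or -1.0 if the file cannot be read."""
    try:
        with wave.open(path, "rb") as wav:
            return wav.getnframes() / wav.getframerate()
    except (wave.Error, EOFError, OSError) as exc:
        # Mirrors the documented behaviour: warn and fall back to -1.0.
        logging.warning("Could not read %s: %s", path, exc)
        return -1.0


def add_duration(entry: dict, audio_filepath_key: str, duration_key: str) -> dict:
    """Hypothetical helper: store the duration under a configurable key."""
    entry[duration_key] = wav_duration_seconds(entry[audio_filepath_key])
    return entry


tmp_dir = tempfile.mkdtemp()

# A valid mono 16-bit file: 8000 silent frames at 16 kHz.
good_path = os.path.join(tmp_dir, "tone.wav")
with wave.open(good_path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 8000)

# A file with a .wav name but no valid header.
bad_path = os.path.join(tmp_dir, "broken.wav")
with open(bad_path, "wb") as handle:
    handle.write(b"not a wav file")

good = add_duration({"audio_filepath": good_path}, "audio_filepath", "duration")
bad = add_duration({"audio_filepath": bad_path}, "audio_filepath", "duration")
print(good["duration"], bad["duration"])
```

Since failed reads are stored as -1.0 rather than raised, a later filter with a positive minimum duration also removes the entries whose audio could not be opened.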

PreserveByValueStage

Inputs

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| input_value_key | str | Yes | The field in data entries to evaluate |
| target_value | int or str | Yes | The value to compare against |
| operator | str | No | Comparison operator: "lt", "le", "eq" (default), "ne", "ge", "gt" |

Outputs

| Name | Type | Description |
| --- | --- | --- |
| result | list[AudioBatch] | Single-element list with the entry if condition is met, empty list otherwise |
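
The keep-or-drop contract can be sketched with Python's operator module, whose function names match the six operator strings listed above. The COMPARATORS table and the preserve_by_value function are hypothetical names for illustration. Raising ValueError for an unknown operator string is an assumption of this sketch, not documented behaviour of the stage.

```python
import operator

# Map each documented operator string to the matching stdlib comparison.
COMPARATORS = {
    "lt": operator.lt,
    "le": operator.le,
    "eq": operator.eq,
    "ne": operator.ne,
    "ge": operator.ge,
    "gt": operator.gt,
}


def preserve_by_value(entry: dict, input_value_key: str, target_value, op_name: str = "eq") -> list:
    """Return [entry] when the comparison holds, otherwise an empty list."""
    if op_name not in COMPARATORS:
        # Assumption of this sketch: reject unknown operator names early.
        raise ValueError(f"Unsupported operator: {op_name!r}")
    compare = COMPARATORS[op_name]
    return [entry] if compare(entry[input_value_key], target_value) else []


entries = [
    {"audio_filepath": "a.wav", "duration": 0.4},
    {"audio_filepath": "b.wav", "duration": 1.0},
    {"audio_filepath": "c.wav", "duration": 2.5},
    {"audio_filepath": "d.wav", "duration": -1.0},  # sentinel for a failed read
]

kept = []
for entry in entries:
    kept.extend(preserve_by_value(entry, "duration", 1.0, "ge"))

print([e["audio_filepath"] for e in kept])
```

The field value is always the left operand, so "ge" with a target of 1.0 reads as "duration >= 1.0".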

Usage Examples

Computing Audio Duration

from nemo_curator.stages.audio.common import GetAudioDurationStage

duration_stage = GetAudioDurationStage(
    audio_filepath_key="audio_filepath",
    duration_key="duration",
)

Filtering by Value

from nemo_curator.stages.audio.common import PreserveByValueStage

# Keep only entries where duration is greater than or equal to 1.0
filter_stage = PreserveByValueStage(
    input_value_key="duration",
    target_value=1.0,
    operator="ge",
)
