Overview
Provides a legacy base class for audio processing stages and two common audio curation stages: duration computation (GetAudioDurationStage) and value-based filtering (PreserveByValueStage).
Description
This module contains the foundation classes for the audio curation pipeline:
- LegacySpeechStage -- An abstract base class that extends ProcessingStage[Task, Task] and adapts the older SDP (Speech Data Processor) BaseParallelProcessor pattern. Its process() method iterates over each entry in an AudioBatch.data list, calling the abstract process_dataset_entry() method on each entry individually. It also propagates _stage_perf metadata from the batch to each result entry.
- GetAudioDurationStage -- A dataclass-based stage that reads audio files via soundfile.read(), computes the duration as sample_count / sample_rate, and stores the result in a configurable duration_key. If a SoundFileError occurs, the duration is set to -1.0 and a warning is logged.
- PreserveByValueStage -- A filtering stage that compares a field value in each dataset entry against a target value using a configurable comparison operator (lt, le, eq, ne, ge, gt). Entries satisfying the condition are preserved; others are dropped (empty list returned).
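The per-entry dispatch described for LegacySpeechStage.process() can be pictured with a small stand-alone sketch. It does not import NeMo Curator: Batch, EntryStage, and UppercaseTextStage are hypothetical stand-ins for AudioBatch, LegacySpeechStage, and a concrete subclass, and the _stage_perf propagation is left out.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Batch:
    """Hypothetical stand-in for AudioBatch: a container holding a list of entries."""
    data: list = field(default_factory=list)


class EntryStage(ABC):
    """Hypothetical stand-in for LegacySpeechStage."""

    def process(self, task: Batch) -> list:
        results = []
        for entry in task.data:
            # Each call returns a list, so a stage can keep, drop, or split an entry.
            results.extend(self.process_dataset_entry(entry))
        return results

    @abstractmethod
    def process_dataset_entry(self, data_entry: dict) -> list:
        ...


class UppercaseTextStage(EntryStage):
    """Illustrative subclass: upper-cases "text" and drops entries that lack it."""

    def process_dataset_entry(self, data_entry: dict) -> list:
        if "text" not in data_entry:
            return []  # an empty list drops the entry
        return [Batch(data=[{**data_entry, "text": data_entry["text"].upper()}])]


batch = Batch(data=[{"text": "hello"}, {"audio_filepath": "a.wav"}, {"text": "world"}])
outputs = UppercaseTextStage().process(batch)
texts = [out.data[0]["text"] for out in outputs]
```

Because every per-entry call returns a list, the same pattern supports both transformation (one result per entry) and filtering (an empty list), which is how the two concrete stages on this page differ.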
Usage
Use LegacySpeechStage as a base class when building audio processing stages that operate on individual entries within an AudioBatch. Use GetAudioDurationStage to compute audio file durations in a pipeline. Use PreserveByValueStage to filter audio entries based on field values (for example, filtering out entries shorter than a minimum duration).
Code Reference
Source Location
- Repository: NeMo-Curator
- File: nemo_curator/stages/audio/common.py
- Lines: 1-121
Signature
class LegacySpeechStage(ProcessingStage[Task, Task]):
    def process(self, task: AudioBatch) -> list[Task]: ...
    @abstractmethod
    def process_dataset_entry(self, data_entry: AudioBatch) -> list[AudioBatch]: ...

@dataclass
class GetAudioDurationStage(LegacySpeechStage):
    name = "GetAudioDurationStage"
    audio_filepath_key: str
    duration_key: str
    def process_dataset_entry(self, data_entry: dict) -> list[AudioBatch]: ...

class PreserveByValueStage(LegacySpeechStage):
    name = "PreserveByValueStage"
    def __init__(self, input_value_key: str, target_value: int | str, operator: str = "eq"): ...
    def process_dataset_entry(self, data_entry: AudioBatch) -> list[AudioBatch]: ...
Import
from nemo_curator.stages.audio.common import (
    LegacySpeechStage,
    GetAudioDurationStage,
    PreserveByValueStage,
)
I/O Contract
LegacySpeechStage
Inputs
| Name | Type | Required | Description |
| task | AudioBatch | Yes | An AudioBatch containing a list of data entries to process individually |
Outputs
| Name | Type | Description |
| result | list[Task] | Aggregated list of Task objects from all individual entry processing calls |
GetAudioDurationStage
Inputs
| Name | Type | Required | Description |
| audio_filepath_key | str | Yes | Key to retrieve the path to the audio file from the data entry |
| duration_key | str | Yes | Key under which the computed duration will be stored |
Outputs
| Name | Type | Description |
| data_entry | AudioBatch | The input entry augmented with the duration value (or -1.0 on error) |
PreserveByValueStage
Inputs
| Name | Type | Required | Description |
| input_value_key | str | Yes | The field in data entries to evaluate |
| target_value | int or str | Yes | The value to compare against |
| operator | str | No | Comparison operator: "lt", "le", "eq" (default), "ne", "ge", "gt" |
Outputs
| Name | Type | Description |
| result | list[AudioBatch] | Single-element list with the entry if condition is met, empty list otherwise |
Usage Examples
Computing Audio Duration
from nemo_curator.stages.audio.common import GetAudioDurationStage
duration_stage = GetAudioDurationStage(
    audio_filepath_key="audio_filepath",
    duration_key="duration",
)
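The duration rule itself (sample_count / sample_rate, with -1.0 as the failure marker) can be tried without NeMo Curator or soundfile installed. The sketch below swaps in Python's standard-library wave module in place of soundfile.read(), so it only handles WAV files; wav_duration is a hypothetical helper, not part of the module, and the warning log is omitted.

```python
import os
import tempfile
import wave


def wav_duration(path: str) -> float:
    """Return the duration in seconds, or -1.0 if the file cannot be read as WAV."""
    try:
        with wave.open(path, "rb") as wav_file:
            sample_count = wav_file.getnframes()
            sample_rate = wav_file.getframerate()
    except (wave.Error, OSError, EOFError):
        # Mirrors the documented fallback: unreadable audio is marked with -1.0.
        return -1.0
    return sample_count / sample_rate


# Write one second of 16-bit mono silence at 8 kHz to a temporary file.
tmp_dir = tempfile.mkdtemp()
good_path = os.path.join(tmp_dir, "silence.wav")
with wave.open(good_path, "wb") as wav_file:
    wav_file.setnchannels(1)
    wav_file.setsampwidth(2)
    wav_file.setframerate(8000)
    wav_file.writeframes(b"\x00\x00" * 8000)

# Store the result under a configurable key, as the stage does with duration_key.
entry = {"audio_filepath": good_path}
entry["duration"] = wav_duration(entry["audio_filepath"])

# A path that does not exist falls back to the -1.0 marker.
missing = wav_duration(os.path.join(tmp_dir, "missing.wav"))
```

Using -1.0 rather than raising lets a pipeline keep running on a partially corrupt dataset; a later filtering step can then drop the marked entries.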
Filtering by Value
from nemo_curator.stages.audio.common import PreserveByValueStage
# Keep only entries where duration is greater than or equal to 1.0
filter_stage = PreserveByValueStage(
    input_value_key="duration",
    target_value=1.0,
    operator="ge",
)
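What this filter does to a list of entries can be illustrated with a stand-alone sketch. It assumes the six operator names map onto the same-named functions in Python's operator module; preserve_by_value and the sample entries are hypothetical and are not taken from the NeMo Curator source.

```python
import operator

# Operator names listed for PreserveByValueStage, mapped to stdlib comparisons.
COMPARATORS = {
    "lt": operator.lt,
    "le": operator.le,
    "eq": operator.eq,
    "ne": operator.ne,
    "ge": operator.ge,
    "gt": operator.gt,
}


def preserve_by_value(entries, input_value_key, target_value, op_name="eq"):
    """Hypothetical helper: keep entries where entry[key] <op> target_value holds."""
    if op_name not in COMPARATORS:
        raise ValueError(f"Unsupported operator: {op_name}")
    compare = COMPARATORS[op_name]
    return [e for e in entries if compare(e[input_value_key], target_value)]


entries = [
    {"audio_filepath": "a.wav", "duration": 0.4},
    {"audio_filepath": "b.wav", "duration": 1.0},
    {"audio_filepath": "c.wav", "duration": 3.2},
    {"audio_filepath": "d.wav", "duration": -1.0},  # failed read marker
]
kept = preserve_by_value(entries, "duration", 1.0, "ge")
kept_paths = [e["audio_filepath"] for e in kept]
```

With "ge" and a positive threshold, entries carrying the -1.0 error marker from the duration step are dropped along with the short clips.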