Implementation:NVIDIA NeMo Curator Audio Convert Stage

Knowledge Sources	NVIDIA NeMo Curator
Domains	Audio Processing, Data Curation, Format Conversion
Last Updated	2026-02-14 00:00 GMT

Overview

Implements AudioToDocumentStage, a processing stage that converts an AudioBatch into a DocumentBatch, bridging the audio and document processing pipelines.

Description

AudioToDocumentStage extends ProcessingStage[AudioBatch, DocumentBatch] and bridges audio-oriented and document-oriented pipeline stages. Its process() method reads AudioBatch.data (a list of dictionaries, one per audio entry), constructs a pandas DataFrame from it, and returns that DataFrame wrapped in a DocumentBatch inside a single-element list. The conversion carries over the task ID, dataset name, and _stage_perf metadata from the input AudioBatch.
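The conversion described above can be sketched without NeMo Curator installed. The block below is a minimal illustration, not the library's source: SketchAudioBatch, SketchDocumentBatch, and audio_to_document are hypothetical stand-ins that carry only the fields named in this page, and the real task classes likely hold more.

```python
from dataclasses import dataclass, field
from typing import Any

import pandas as pd


# Hypothetical stand-ins for AudioBatch and DocumentBatch. They model only
# the fields this page mentions; the real classes are defined by NeMo Curator.
@dataclass
class SketchAudioBatch:
    task_id: str
    dataset_name: str
    data: list[dict[str, Any]]
    _stage_perf: list[Any] = field(default_factory=list)


@dataclass
class SketchDocumentBatch:
    task_id: str
    dataset_name: str
    data: pd.DataFrame
    _stage_perf: list[Any] = field(default_factory=list)


def audio_to_document(task: SketchAudioBatch) -> list[SketchDocumentBatch]:
    """Build a DataFrame from the entry dicts and carry the metadata over."""
    frame = pd.DataFrame(task.data)  # one row per entry, one column per key
    return [
        SketchDocumentBatch(
            task_id=task.task_id,
            dataset_name=task.dataset_name,
            data=frame,
            _stage_perf=task._stage_perf,
        )
    ]


batch = SketchAudioBatch(
    task_id="task_1",
    dataset_name="my_dataset",
    data=[
        {"audio_filepath": "/data/audio1.wav", "text": "hello world", "duration": 1.5},
        {"audio_filepath": "/data/audio2.wav", "text": "goodbye", "duration": 0.8},
    ],
)
out = audio_to_document(batch)
print(list(out[0].data.columns))
```

Returning a list rather than a bare object mirrors the process() signature shown in this page, which is declared to return list[DocumentBatch].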

This stage enables audio pipeline outputs (such as transcriptions, WER metrics, and duration values) to flow into document-oriented stages such as Parquet writers or text-based filters that expect DocumentBatch inputs.
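As a rough illustration of what a document-oriented stage can do once the data is tabular, the sketch below filters rows of a DataFrame by WER and duration. The column names (pred_text, wer, duration), the thresholds, and the helper keep_clean_rows are assumptions made for this example; the actual columns depend on which audio stages ran upstream.

```python
import pandas as pd

# Hypothetical contents of DocumentBatch.data after conversion.
frame = pd.DataFrame(
    [
        {"audio_filepath": "/data/a.wav", "pred_text": "hello world", "wer": 0.0, "duration": 1.5},
        {"audio_filepath": "/data/b.wav", "pred_text": "good by", "wer": 0.5, "duration": 2.0},
        {"audio_filepath": "/data/c.wav", "pred_text": "yes", "wer": 0.1, "duration": 0.4},
    ]
)

# Illustrative thresholds, not values taken from NeMo Curator.
MAX_WER = 0.25
MIN_DURATION = 1.0


def keep_clean_rows(df: pd.DataFrame, max_wer: float, min_duration: float) -> pd.DataFrame:
    """Keep rows whose WER is low enough and whose audio is long enough."""
    mask = (df["wer"] <= max_wer) & (df["duration"] >= min_duration)
    return df[mask].reset_index(drop=True)


kept = keep_clean_rows(frame, MAX_WER, MIN_DURATION)
print(kept["audio_filepath"].tolist())
```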

Usage

Use this stage in an audio curation pipeline when you need to transition from audio-specific stages (which produce AudioBatch objects) to document-processing stages (which consume DocumentBatch objects). A typical placement is after ASR inference and metrics computation, before writing results to Parquet.

Code Reference

Source Location

Repository: NeMo-Curator
File: nemo_curator/stages/audio/io/convert.py
Lines: 1-38

Signature

class AudioToDocumentStage(ProcessingStage[AudioBatch, DocumentBatch]):
    name = "AudioToDocumentStage"

    def process(self, task: AudioBatch) -> list[DocumentBatch]: ...

Import

from nemo_curator.stages.audio.io.convert import AudioToDocumentStage

I/O Contract

Inputs

Name	Type	Required	Description
task	AudioBatch	Yes	An AudioBatch whose `data` attribute is a list of dictionaries, one per audio entry, holding fields such as audio_filepath, text, and duration

Outputs

Name	Type	Description
result	list[DocumentBatch]	A single-element list containing a DocumentBatch with a pandas DataFrame constructed from the AudioBatch data

Usage Examples

Basic Usage

from nemo_curator.stages.audio.io.convert import AudioToDocumentStage

# Create the conversion stage
convert_stage = AudioToDocumentStage()

# In a pipeline, place after audio stages and before document stages:
# ... -> ASR stage -> WER stage -> AudioToDocumentStage -> Parquet writer

Manual Conversion

from nemo_curator.stages.audio.io.convert import AudioToDocumentStage
from nemo_curator.tasks import AudioBatch

# Create an AudioBatch with sample data
audio_batch = AudioBatch(
    task_id="task_1",
    dataset_name="my_dataset",
    data=[
        {"audio_filepath": "/data/audio1.wav", "text": "hello world", "duration": 1.5},
        {"audio_filepath": "/data/audio2.wav", "text": "goodbye", "duration": 0.8},
    ],
)

convert_stage = AudioToDocumentStage()
doc_batches = convert_stage.process(audio_batch)
# doc_batches[0].data is a pandas DataFrame with columns:
# audio_filepath, text, duration
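One detail worth keeping in mind when preparing the input: if the entries do not all share the same keys, pandas builds the union of the keys as columns and fills the gaps with NaN. The sketch below shows that pandas behavior in isolation, assuming the stage passes the list of dictionaries to the DataFrame constructor unchanged; it does not call NeMo Curator.

```python
import pandas as pd

entries = [
    {"audio_filepath": "/data/audio1.wav", "text": "hello world", "duration": 1.5},
    {"audio_filepath": "/data/audio2.wav", "duration": 0.8},  # no "text" key
]

frame = pd.DataFrame(entries)

# The "text" column exists for every row; the second row holds NaN there.
missing_text = frame["text"].isna()
print(frame.columns.tolist())
print(missing_text.tolist())
```

Downstream text filters may need to handle these missing values explicitly, for example by dropping or filling them before applying string operations.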

Related Pages

Environment:NVIDIA_NeMo_Curator_Python_Linux_Base
