Implementation:NVIDIA NeMo Curator Audio Convert Stage
| Knowledge Sources | |
|---|---|
| Domains | Audio Processing, Data Curation, Format Conversion |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
Implements AudioToDocumentStage, a processing stage that converts an AudioBatch into a DocumentBatch, bridging the audio and document processing pipelines.
Description
AudioToDocumentStage extends ProcessingStage[AudioBatch, DocumentBatch] and bridges audio-oriented and document-oriented pipeline stages. Its process() method reads AudioBatch.data (a list of dictionaries), builds a pandas DataFrame from it, and wraps the result in a DocumentBatch. The conversion carries over the task ID, dataset name, and _stage_perf metadata from the input AudioBatch.
This stage enables audio pipeline outputs (such as transcriptions, WER metrics, and duration values) to flow into document-oriented stages such as Parquet writers or text-based filters that expect DocumentBatch inputs.
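The conversion described above can be sketched in a few lines. This is a minimal illustration, not the NeMo Curator source: SketchAudioBatch, SketchDocumentBatch, and audio_to_document are hypothetical stand-ins that mirror only the fields named in this section (task_id, dataset_name, data, _stage_perf), and the real classes may carry more.

```python
from dataclasses import dataclass, field
from typing import Any

import pandas as pd


@dataclass
class SketchAudioBatch:
    """Hypothetical stand-in for AudioBatch (not the NeMo Curator class)."""
    task_id: str
    dataset_name: str
    data: list[dict[str, Any]]
    _stage_perf: list[Any] = field(default_factory=list)


@dataclass
class SketchDocumentBatch:
    """Hypothetical stand-in for DocumentBatch (not the NeMo Curator class)."""
    task_id: str
    dataset_name: str
    data: pd.DataFrame
    _stage_perf: list[Any] = field(default_factory=list)


def audio_to_document(task: SketchAudioBatch) -> list[SketchDocumentBatch]:
    # Each dictionary becomes one row; dictionary keys become column names.
    frame = pd.DataFrame(task.data)
    # Carry the task metadata over unchanged and return a single-element list,
    # matching the list[DocumentBatch] return type of process().
    return [
        SketchDocumentBatch(
            task_id=task.task_id,
            dataset_name=task.dataset_name,
            data=frame,
            _stage_perf=task._stage_perf,
        )
    ]
```

In this sketch, a key that is missing from some entries still produces a column, and pandas fills the gaps with NaN, so downstream document stages see a rectangular table.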
Usage
Use this stage in an audio curation pipeline when you need to transition from audio-specific stages (which produce AudioBatch objects) to document-processing stages (which consume DocumentBatch objects). A typical placement is after ASR inference and metrics computation, before writing results to Parquet.
Code Reference
Source Location
- Repository: NeMo-Curator
- File: nemo_curator/stages/audio/io/convert.py
- Lines: 1-38
Signature
class AudioToDocumentStage(ProcessingStage[AudioBatch, DocumentBatch]):
    name = "AudioToDocumentStage"

    def process(self, task: AudioBatch) -> list[DocumentBatch]: ...
Import
from nemo_curator.stages.audio.io.convert import AudioToDocumentStage
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| task | AudioBatch | Yes | An AudioBatch whose data attribute is a list of dictionaries containing audio metadata |
Outputs
| Name | Type | Description |
|---|---|---|
| result | list[DocumentBatch] | A single-element list containing a DocumentBatch with a pandas DataFrame constructed from the AudioBatch data |
Usage Examples
Basic Usage
from nemo_curator.stages.audio.io.convert import AudioToDocumentStage
# Create the conversion stage
convert_stage = AudioToDocumentStage()
# In a pipeline, place after audio stages and before document stages:
# ... -> ASR stage -> WER stage -> AudioToDocumentStage -> Parquet writer
Manual Conversion
from nemo_curator.stages.audio.io.convert import AudioToDocumentStage
from nemo_curator.tasks import AudioBatch
# Create an AudioBatch with sample data
audio_batch = AudioBatch(
    task_id="task_1",
    dataset_name="my_dataset",
    data=[
        {"audio_filepath": "/data/audio1.wav", "text": "hello world", "duration": 1.5},
        {"audio_filepath": "/data/audio2.wav", "text": "goodbye", "duration": 0.8},
    ],
)
convert_stage = AudioToDocumentStage()
doc_batches = convert_stage.process(audio_batch)
# doc_batches[0].data is a pandas DataFrame with columns:
# audio_filepath, text, duration