
Implementation:NVIDIA NeMo Curator VideoReaderStage

From Leeroopedia
Revision as of 13:22, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/NVIDIA_NeMo_Curator_VideoReaderStage.md)
Knowledge Sources
Domains: Data_Curation, Video_Processing
Last Updated: 2026-02-14 17:00 GMT

Overview

A concrete tool, provided by NeMo Curator, for reading video files and extracting their metadata.

Description

The VideoReaderStage reads video files from the local filesystem and extracts technical metadata for each one, including dimensions, frame rate, duration, and codecs. It stores the result in a VideoTask object that holds the video's source bytes alongside the extracted metadata.
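As a rough illustration of the kind of metadata described above, the sketch below parses ffprobe-style JSON into a small record. It assumes the probe output follows the layout produced by `ffprobe -print_format json -show_streams -show_format`; `VideoMetadataSketch`, `parse_probe_output`, and the sample input are hypothetical names and data made up for this example, and none of it is NeMo Curator's actual implementation.

```python
import json
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class VideoMetadataSketch:
    """Hypothetical container; not NeMo Curator's metadata class."""
    width: int
    height: int
    fps: float
    duration: float
    video_codec: str


def parse_probe_output(probe_json: str) -> VideoMetadataSketch:
    """Build a metadata record from the first video stream in ffprobe-style JSON."""
    info = json.loads(probe_json)
    stream = next(s for s in info["streams"] if s["codec_type"] == "video")
    # ffprobe reports frame rates as rational strings such as "30000/1001".
    fps = float(Fraction(stream["avg_frame_rate"]))
    return VideoMetadataSketch(
        width=int(stream["width"]),
        height=int(stream["height"]),
        fps=fps,
        duration=float(info["format"]["duration"]),
        video_codec=stream["codec_name"],
    )


# Illustrative input only; a real run would capture this from ffprobe.
SAMPLE = json.dumps({
    "streams": [
        {"codec_type": "audio", "codec_name": "aac"},
        {"codec_type": "video", "codec_name": "h264",
         "width": 1920, "height": 1080, "avg_frame_rate": "30000/1001"},
    ],
    "format": {"duration": "12.5"},
})

meta = parse_probe_output(SAMPLE)
print(meta)
```

Keeping the parsing step separate from the file read makes it easy to test against canned probe output without any video files on disk.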

Usage

Import this stage when building a video curation pipeline that needs to read raw video files from storage. To handle file discovery as well, use the VideoReader composite stage, which combines this stage with FilePartitioningStage.
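The partition-then-read pattern behind that combination can be sketched with the standard library alone. Everything here is a stand-in written for illustration: `FileGroupSketch`, `VideoSketch`, `partition_files`, and `read_group` are hypothetical names, the one-file-per-group choice and the `.mp4` filter are assumptions, and the real stages may behave differently.

```python
from __future__ import annotations

import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class FileGroupSketch:
    """Stand-in for FileGroupTask: a group of file paths."""
    paths: list[str]


@dataclass
class VideoSketch:
    """Stand-in for VideoTask: the raw bytes of one video file."""
    path: str
    source_bytes: bytes


def partition_files(root: str, limit: int | None = None,
                    exts: tuple[str, ...] = (".mp4",)) -> list[FileGroupSketch]:
    """Step 1: discover video files under root and emit one group per file."""
    files = sorted(str(p) for p in Path(root).rglob("*") if p.suffix.lower() in exts)
    if limit is not None:
        files = files[:limit]
    return [FileGroupSketch(paths=[f]) for f in files]


def read_group(group: FileGroupSketch) -> list[VideoSketch]:
    """Step 2: load each file in the group into memory as bytes."""
    return [VideoSketch(path=p, source_bytes=Path(p).read_bytes()) for p in group.paths]


# Demo on throwaway files; the byte strings are placeholders, not real video data.
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "a.mp4").write_bytes(b"fake-a")
    (Path(root) / "b.mp4").write_bytes(b"fake-b")
    (Path(root) / "notes.txt").write_text("ignored")
    groups = partition_files(root, limit=100)
    videos = [v for g in groups for v in read_group(g)]

print([Path(v.path).name for v in videos])
```

Splitting discovery from reading lets a scheduler distribute the file groups across workers before any bytes are loaded.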

Code Reference

Source Location

  • Repository: NeMo Curator
  • File: nemo_curator/stages/video/io/video_reader.py
  • Lines: L79-290

Signature

@dataclass
class VideoReaderStage(ProcessingStage[FileGroupTask, VideoTask]):
    input_path: str | None = None
    verbose: bool = False
    name: str = "video_reader"

Import

from nemo_curator.stages.video.io.video_reader import VideoReaderStage

I/O Contract

Inputs

Name | Type | Required | Description
task | FileGroupTask | Yes | List of video file paths to read

Outputs

Name | Type | Description
task | VideoTask | Contains video.source_bytes and video.metadata (duration, fps, resolution, codec)

Usage Examples

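# VideoReader is the composite stage that pairs FilePartitioningStage with VideoReaderStage.
from nemo_curator.pipeline import Pipeline
from nemo_curator.stages.video.io.video_reader import VideoReader

reader = VideoReader(
    input_video_path="./data/videos",  # directory containing the source videos
    video_limit=100,                   # cap on the number of videos to read
    verbose=True,                      # enable verbose logging
)

# Assumption: Pipeline expects a name argument; check the signature in your installed version.
pipeline = Pipeline(name="video_reading")
pipeline.add_stage(reader)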

Related Pages

Implements Principle
