Implementation:NVIDIA NeMo Curator ClipWriterStage
| Knowledge Sources | |
|---|---|
| Domains | Data_Curation, Video_Processing, Data_Engineering |
| Last Updated | 2026-02-14 17:00 GMT |
Overview
Concrete tool, provided by NeMo Curator, for writing video clips, metadata, embeddings, and previews to structured storage.
Description
The ClipWriterStage writes all artifacts from a fully processed VideoTask to organized storage directories. It supports parallel writing via a thread pool (sized by max_workers), configurable output selection (clips, embeddings, previews, captions), and deterministic naming via content hashing.
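The combination of thread-pool writing and content-hash naming described above can be illustrated with a minimal, standalone sketch. This is not the ClipWriterStage implementation: the helpers `content_name`, `write_artifact`, and `write_all`, the choice of SHA-256, and the directory layout are all assumptions made for illustration, and only the Python standard library is used.

```python
import hashlib
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def content_name(data: bytes, suffix: str) -> str:
    """Hypothetical helper: derive a stable file name from the bytes themselves."""
    # SHA-256 is an assumption; the real stage may hash differently.
    return hashlib.sha256(data).hexdigest()[:16] + suffix


def write_artifact(root: Path, subdir: str, data: bytes, suffix: str) -> Path:
    """Write one artifact under root/subdir and return the path written."""
    target_dir = root / subdir
    target_dir.mkdir(parents=True, exist_ok=True)  # safe if several threads race
    path = target_dir / content_name(data, suffix)
    path.write_bytes(data)
    return path


def write_all(
    root: Path,
    artifacts: list[tuple[str, bytes, str]],
    max_workers: int = 6,
) -> list[Path]:
    """Write (subdir, data, suffix) artifacts in parallel, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(write_artifact, root, subdir, data, suffix)
            for subdir, data, suffix in artifacts
        ]
        # result() re-raises any exception from the worker thread.
        return [future.result() for future in futures]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        written = write_all(
            Path(tmp),
            [("clips", b"fake-mp4-bytes", ".mp4"), ("previews", b"fake-webp", ".webp")],
        )
        for path in written:
            print(path.relative_to(tmp))
```

Because the file name depends only on the bytes, rerunning the write over the same data targets the same paths, so repeated runs overwrite rather than duplicate output. Writes are I/O-bound, which is why threads (rather than processes) are a reasonable fit in this sketch.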
Usage
Import this stage and add it as the final step in a video curation pipeline, once all upstream processing of the VideoTask is complete.
Code Reference
Source Location
- Repository: NeMo Curator
- File: nemo_curator/stages/video/io/clip_writer.py
- Lines: L34-429
Signature
@dataclass
class ClipWriterStage(ProcessingStage[VideoTask, VideoTask]):
    output_path: str
    input_path: str
    upload_clips: bool
    dry_run: bool
    generate_embeddings: bool
    generate_previews: bool
    generate_captions: bool
    embedding_algorithm: str = "cosmos-embed1"
    caption_models: list[str] | None = None
    enhanced_caption_models: list[str] | None = None
    verbose: bool = False
    max_workers: int = 6
    log_stats: bool = False
    name: str = "clip_writer"
Import
from nemo_curator.stages.video.io.clip_writer import ClipWriterStage
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| task | VideoTask | Yes | Fully processed video with clips, embeddings, captions |
Outputs
| Name | Type | Description |
|---|---|---|
| task | VideoTask | Same task after writing artifacts to storage |
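The contract above (the same task comes back out, with storage writes as a side effect) can be sketched with stand-in types. Everything here is hypothetical: `FakeTask` stands in for VideoTask, `PassThroughWriter` for the stage, the method name `process` is assumed, and the reading of `dry_run` as "skip the writes" is an assumption rather than documented behavior.

```python
from dataclasses import dataclass, field


@dataclass
class FakeTask:
    """Hypothetical stand-in for VideoTask: an id plus raw clip payloads."""
    task_id: str
    clips: list[bytes] = field(default_factory=list)


@dataclass
class PassThroughWriter:
    """Hypothetical writer stage: records what it would write, returns the task."""
    dry_run: bool = False
    written: list[str] = field(default_factory=list)

    def process(self, task: FakeTask) -> FakeTask:
        # Assumption: dry_run means "do everything except the actual write".
        if not self.dry_run:
            for index, _clip in enumerate(task.clips):
                self.written.append(f"{task.task_id}/clip-{index}")
        # The task itself is returned unchanged, matching the Outputs table.
        return task


if __name__ == "__main__":
    task = FakeTask(task_id="video-0", clips=[b"a", b"b"])
    writer = PassThroughWriter()
    assert writer.process(task) is task
    print(writer.written)
```

Returning the input task keeps the stage composable: a later stage, or a test harness, can still inspect the task after the write has happened.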
Usage Examples
from nemo_curator.stages.video.io.clip_writer import ClipWriterStage

writer = ClipWriterStage(
    output_path="./output/curated_videos",
    input_path="./data/raw_videos",
    upload_clips=True,
    dry_run=False,
    generate_embeddings=True,
    generate_previews=True,
    generate_captions=True,
    max_workers=6,
)
Related Pages
Implements Principle