Principle:NVIDIA NeMo Curator Video Frame Extraction

Knowledge Sources	NeMo Curator
Domains	Data_Curation, Video_Processing, Computer_Vision
Last Updated	2026-02-14 17:00 GMT

Overview

Technique for extracting individual image frames from video clips at configurable resolutions and frame rates for downstream visual analysis.

Description

Video Frame Extraction converts temporal video data into discrete image frames that image-based models (such as CLIP, aesthetic scorers, and captioning models) can process. Extraction supports multiple policies (sequential sampling, keyframe extraction) and multiple decoders (FFmpeg on CPU, FFmpeg on GPU, and the PyNvCodec hardware decoder).
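To make the software decoding path concrete, the sketch below builds an FFmpeg command line that samples a clip at a fixed frame rate, resizes each frame, and writes raw RGB bytes to standard output. It is an illustration only and not NeMo Curator's API: the helper names build_ffmpeg_frame_cmd and frame_size_bytes, the file name clip.mp4, and the default values are assumptions. Only standard FFmpeg options are used (the fps and scale filters, the rawvideo output format, and the rgb24 pixel format).

```python
import shlex


def build_ffmpeg_frame_cmd(video_path, fps=2.0, width=224, height=224):
    """Build an FFmpeg command that emits raw RGB frames on stdout.

    Hypothetical helper for illustration; not part of NeMo Curator.
    """
    if fps <= 0 or width <= 0 or height <= 0:
        raise ValueError("fps, width and height must be positive")
    return [
        "ffmpeg",
        "-loglevel", "error",
        "-i", str(video_path),
        # fps: resample to a fixed frame rate; scale: resize every frame.
        "-vf", f"fps={fps},scale={width}:{height}",
        # Raw, headerless output with 3 bytes per pixel in R, G, B order.
        "-f", "rawvideo",
        "-pix_fmt", "rgb24",
        "pipe:1",
    ]


def frame_size_bytes(width, height):
    """Number of bytes one rgb24 frame occupies in the raw output stream."""
    return width * height * 3


cmd = build_ffmpeg_frame_cmd("clip.mp4", fps=2, width=224, height=224)
print(shlex.join(cmd))
```

A caller could run this command with subprocess and split the output stream into chunks of frame_size_bytes(width, height) to recover the individual frames. The command is only constructed here, not executed, so the sketch does not require FFmpeg to be installed.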

Usage

Use frame extraction after clipping and before any stage that requires image data (captioning, embedding, aesthetic filtering).

Theoretical Basis

Frame extraction involves:

Selecting extraction policy (fixed FPS sampling vs keyframe extraction)
Decoding video content using hardware (NVDEC) or software (FFmpeg) decoders
Resizing frames to target resolution for downstream model input
Storing frames as NumPy arrays of shape [N, H, W, C] (frame count, height, width, channels) in RGB channel order
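The fixed FPS sampling policy and the resulting array shape can be sketched as follows. This is a minimal illustration under stated assumptions, not the NeMo Curator implementation: the function names sample_frame_indices and output_shape are hypothetical, and the rule of taking the source frame at or just before each target timestamp is one reasonable choice among several.

```python
def sample_frame_indices(num_frames, source_fps, target_fps):
    """Pick source frame indices that approximate sampling at target_fps.

    Hypothetical helper: for each target timestamp k / target_fps it takes
    the source frame at or just before that time.
    """
    if num_frames < 0 or source_fps <= 0 or target_fps <= 0:
        raise ValueError("num_frames must be >= 0 and both rates positive")
    step = source_fps / target_fps  # source frames per sampled frame
    if step <= 1:
        # Target rate is at least the source rate: keep every frame once.
        return list(range(num_frames))
    indices = []
    k = 0
    while True:
        idx = int(k * step)
        if idx >= num_frames:
            break
        indices.append(idx)
        k += 1
    return indices


def output_shape(num_selected, height, width, channels=3):
    """Shape of the stacked frame array in [N, H, W, C] layout."""
    return (num_selected, height, width, channels)


# A 3-second clip at 30 FPS, sampled at 2 FPS and resized to 224x224.
indices = sample_frame_indices(num_frames=90, source_fps=30, target_fps=2)
print(indices)
print(output_shape(len(indices), 224, 224))
```

Keyframe extraction would replace the index selection step with the positions of the stream's keyframes, while the decode, resize, and stacking steps stay the same.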

Related Pages

Implemented By

Implementation:NVIDIA_NeMo_Curator_ClipFrameExtractionStage
