Principle:NVIDIA NeMo Curator Video Frame Extraction
| Knowledge Sources | |
|---|---|
| Domains | Data_Curation, Video_Processing, Computer_Vision |
| Last Updated | 2026-02-14 17:00 GMT |
Overview
Technique for extracting individual image frames from video clips at configurable resolutions and frame rates for downstream visual analysis.
Description
Video Frame Extraction converts temporal video data into discrete image frames that can be processed by image-based models (CLIP, aesthetic scorers, captioning models). The extraction supports multiple policies (sequential sampling, keyframe extraction) and decoders (FFmpeg CPU, FFmpeg GPU, PyNvCodec hardware decoder).
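The difference between the two policies can be sketched as a choice of which frame indices to decode. This is a minimal illustration, not NeMo Curator's API: `ExtractionPolicy`, `select_frame_indices`, and all parameter names are hypothetical, and the keyframe flags are assumed to have been obtained from the container or decoder beforehand.

```python
from enum import Enum


class ExtractionPolicy(Enum):
    """Hypothetical policy names used only for this sketch."""
    SEQUENCE = "sequence"   # sample at a fixed target frame rate
    KEYFRAME = "keyframe"   # keep only frames flagged as keyframes


def select_frame_indices(num_frames, source_fps, policy,
                         target_fps=None, keyframe_flags=None):
    """Return the indices of the frames that the chosen policy would keep."""
    if policy is ExtractionPolicy.KEYFRAME:
        if keyframe_flags is None:
            raise ValueError("keyframe_flags is required for KEYFRAME policy")
        return [i for i, is_key in enumerate(keyframe_flags) if is_key]

    if target_fps is None or target_fps <= 0:
        raise ValueError("target_fps must be positive for SEQUENCE policy")

    # Never step by less than one frame, so no index is emitted twice
    # when the target rate exceeds the source rate.
    step = max(source_fps / target_fps, 1.0)
    indices = []
    k = 0
    while int(k * step) < num_frames:
        indices.append(int(k * step))
        k += 1
    return indices


# A 3-second clip at 30 FPS sampled at 2 FPS keeps every 15th frame.
sampled = select_frame_indices(90, 30.0, ExtractionPolicy.SEQUENCE, target_fps=2.0)
keyed = select_frame_indices(5, 30.0, ExtractionPolicy.KEYFRAME,
                             keyframe_flags=[True, False, False, True, False])
```

Sequential sampling gives a predictable number of frames per second of video, while keyframe extraction yields a count that depends on how the clip was encoded.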
Usage
Use frame extraction after clipping and before any stage that requires image data (captioning, embedding, aesthetic filtering).
Theoretical Basis
Frame extraction involves:
- Selecting an extraction policy (fixed-FPS sampling vs. keyframe extraction)
- Decoding the video content with a hardware (NVDEC) or software (FFmpeg) decoder
- Resizing frames to the target resolution expected by the downstream model
- Storing frames as NumPy arrays of shape [N, H, W, C] in RGB format, where N is the number of frames, H and W are the frame height and width, and C is the number of color channels
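The resizing and storage steps above can be sketched with NumPy alone. This is an illustrative sketch under stated assumptions, not the NeMo Curator implementation: the helper names `resize_nearest` and `frames_to_batch` are hypothetical, the input frames are assumed to be already decoded into RGB `uint8` arrays, and nearest-neighbor resizing stands in for whatever interpolation the actual decoder applies.

```python
import numpy as np


def resize_nearest(frame, height, width):
    """Resize one [H, W, C] frame with nearest-neighbor sampling."""
    src_h, src_w = frame.shape[:2]
    # Map each output row/column back to a source row/column.
    rows = np.arange(height) * src_h // height
    cols = np.arange(width) * src_w // width
    return frame[rows[:, None], cols]


def frames_to_batch(frames, height, width):
    """Resize decoded RGB frames and stack them into an [N, H, W, C] array."""
    if len(frames) == 0:
        return np.empty((0, height, width, 3), dtype=np.uint8)
    resized = [resize_nearest(f, height, width) for f in frames]
    return np.stack(resized, axis=0).astype(np.uint8)


# Synthetic stand-ins for three decoded 48x64 RGB frames.
rng = np.random.default_rng(0)
decoded = [rng.integers(0, 256, size=(48, 64, 3), dtype=np.uint8) for _ in range(3)]
batch = frames_to_batch(decoded, height=24, width=32)
```

Keeping the frame axis first means a downstream image model can consume the whole clip as a single batch, or index `batch[i]` to get one [H, W, C] image.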