Environment:NVIDIA NeMo Curator Video Codec Stack
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Video_Processing, GPU_Computing |
| Last Updated | 2026-02-14 16:45 GMT |
Overview
The NVIDIA Video Codec SDK stack (PyNvVideoCodec, CV-CUDA, PyCUDA) for GPU-accelerated video decoding and frame processing, plus flash-attn for efficient attention in the vision-language models used for captioning.
Description
This environment provides GPU-accelerated video decoding and frame processing capabilities. PyNvVideoCodec uses NVIDIA's hardware NVDEC/NVENC units for video decode/encode, CV-CUDA provides GPU-accelerated computer vision operations (color conversion, resizing), and PyCUDA provides low-level CUDA stream management. Flash Attention is included for efficient attention computation in vision-language models used for video captioning.
Usage
Required for the Video Curation Pipeline when GPU acceleration is desired. The `VideoBatchDecoder` class uses these libraries for hardware-accelerated video frame extraction. Without this stack, video processing falls back to CPU-based decoding via `av` (PyAV) and OpenCV.
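The GPU-or-CPU choice described above can be pictured as a capability check at startup. The following is a minimal sketch, not NeMo Curator's actual selection logic: `select_decode_backend` and `GPU_STACK` are hypothetical names introduced for illustration, and `importlib.util.find_spec` only confirms that a package is installed, not that it imports cleanly against the local driver.

```python
from importlib.util import find_spec

# Top-level import names of the GPU codec stack listed on this page.
GPU_STACK = ("PyNvVideoCodec", "cvcuda", "nvcv", "pycuda")


def select_decode_backend(modules=GPU_STACK, finder=find_spec):
    """Return "gpu" if every module is installed, otherwise "cpu".

    Hypothetical helper for illustration. Nothing is imported here, so a
    driver problem that only appears at import time is not detected.
    """
    missing = [name for name in modules if finder(name) is None]
    return "cpu" if missing else "gpu"
```

Calling `select_decode_backend()` with no arguments checks the four packages above; the `finder` parameter exists only so the check can be exercised without the real packages installed.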
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux | All video codec libs are Linux-only |
| Architecture | x86_64 | PyNvVideoCodec and flash-attn are x86_64-only |
| Hardware | NVIDIA GPU with NVDEC/NVENC | Hardware video decode/encode units |
| VRAM | 10-20 GB per worker | TransNetV2 needs ~10 GB, Cosmos-Embed1 needs ~20 GB |
| CUDA | CUDA 12.x | Required by cvcuda_cu12 and PyNvVideoCodec |
| PyTorch | torch <= 2.9.1 | Upper-bounded for video_cuda12 compatibility |
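Several rows of this table can be checked mechanically before installing. The sketch below is an assumption-laden preflight helper, not part of NeMo Curator: all function names are hypothetical, the platform test mirrors the `platform_machine == 'x86_64' and platform_system != 'Darwin'` marker quoted under Code Evidence, the version comparison looks only at the numeric release (it does not implement full PEP 440 ordering), and the worker estimate ignores framework and driver overhead.

```python
import platform
import re

MAX_TORCH = (2, 9, 1)  # upper bound from the table above


def codec_extras_apply(machine=None, system=None):
    """Mirror the marker used for PyNvVideoCodec and flash-attn."""
    machine = platform.machine() if machine is None else machine
    system = platform.system() if system is None else system
    return machine == "x86_64" and system != "Darwin"


def parse_version(version):
    """Extract (major, minor, patch) from strings such as '2.9.1+cu128'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def torch_version_ok(version, upper=MAX_TORCH):
    """True when the numeric release is at or below the upper bound."""
    return parse_version(version) <= upper


def workers_per_gpu(total_vram_gb, per_worker_gb):
    """Rough count of workers that fit on one GPU, ignoring overhead."""
    if per_worker_gb <= 0:
        raise ValueError("per_worker_gb must be positive")
    return int(total_vram_gb // per_worker_gb)
```

For example, an 80 GB GPU would fit at most four ~20 GB Cosmos-Embed1 workers by this estimate; real headroom will be lower once CUDA context and allocator overhead are counted.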
Dependencies
Python Packages
- `PyNvVideoCodec` == 2.0.2 (x86_64 Linux only)
- `cvcuda_cu12` (CV-CUDA for CUDA 12)
- `pycuda`
- `flash-attn` <= 2.8.3 (x86_64 Linux only)
- `torch` <= 2.9.1
- `torchaudio`
- `av` == 13.1.0 (CPU fallback video I/O)
- `opencv-python`
- `torchvision`
- `einops`
- `easydict`
Credentials
No credentials required for the video codec stack itself. Vision-language models for captioning may require `HF_TOKEN` for gated model access.
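A small sketch of reading the token, assuming it is supplied through the `HF_TOKEN` environment variable as stated above. `get_hf_token` is a hypothetical helper; how the captioning stage actually consumes the token is not specified on this page.

```python
import os


def get_hf_token(env=None):
    """Return the Hugging Face token from the environment, or None if unset."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    return token or None
```

Returning `None` for a missing or empty value lets callers distinguish "no token configured" from a real token without special-casing the empty string.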
Quick Install

    # Install NeMo Curator with full video curation support
    pip install "nemo-curator[video_cuda12]"
Code Evidence
Optional import with graceful degradation from `nemo_curator/utils/nvcodec_utils.py:23-39`:

    try:
        import cvcuda
        import nvcv
        import pycuda.driver as cuda
        import PyNvVideoCodec as Nvc
        pixel_format_to_cvcuda_code = {
            Nvc.Pixel_Format.YUV444: cvcuda.ColorConversion.YUV2RGB,
            Nvc.Pixel_Format.NV12: cvcuda.ColorConversion.YUV2RGB_NV12,
        }
    except (ImportError, RuntimeError):
        logger.warning("PyNvVideoCodec is not installed, some features will be disabled.")
        Nvc = None
        cvcuda = None
        nvcv = None
        cuda = None
        pixel_format_to_cvcuda_code = {}
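Because the map is left empty when the imports fail, code that consumes it has to handle both an empty map and an unsupported pixel format. The sketch below shows one way a caller might do that; `conversion_code_for` is a hypothetical helper, and `demo_map` uses plain strings as stand-ins for the real `Nvc.Pixel_Format` keys and `cvcuda.ColorConversion` values.

```python
def conversion_code_for(pixel_format, mapping):
    """Look up the color-conversion code for a decoder pixel format."""
    if not mapping:
        # Corresponds to the except-branch above: the stack failed to import.
        raise RuntimeError("GPU codec stack unavailable: conversion map is empty")
    try:
        return mapping[pixel_format]
    except KeyError:
        raise ValueError(f"unsupported pixel format: {pixel_format!r}") from None


# Stand-in values; the real map is keyed by Nvc.Pixel_Format members.
demo_map = {"NV12": "YUV2RGB_NV12", "YUV444": "YUV2RGB"}
```

Separating the two failure modes makes it clear whether to fall back to CPU decoding (empty map) or to reject the input (format not in the map).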
Platform constraints from `pyproject.toml:150-153`:

    "flash-attn<=2.8.3; (platform_machine == 'x86_64' and platform_system != 'Darwin')",
    "pycuda",
    "PyNvVideoCodec==2.0.2; (platform_machine == 'x86_64' and platform_system != 'Darwin')",
    "torch<=2.9.1",
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
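| `PyNvVideoCodec is not installed, some features will be disabled.` | One of `cvcuda`, `nvcv`, `pycuda`, or `PyNvVideoCodec` failed to import; the warning names only PyNvVideoCodec | Install the full stack with `pip install "nemo-curator[video_cuda12]"`, or `pip install PyNvVideoCodec==2.0.2` if only that package is missing (x86_64 Linux only) |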
| `ImportError: pycuda` | PyCUDA not installed | `pip install pycuda` |
| `RuntimeError` during cvcuda import | CUDA driver mismatch | Ensure CUDA 12 driver and toolkit are installed |
| flash-attn build failure | Missing CUDA dev tools for compilation | Install CUDA toolkit dev packages; use `--no-build-isolation` |
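Since the warning in the first row is emitted for any failed import in the stack, it can help to try each module separately and see which one actually fails. A minimal diagnostic sketch, assuming the import names shown in the Code Evidence section; `diagnose_imports` is a hypothetical helper, not a NeMo Curator utility.

```python
import importlib

# Import names taken from the try-block quoted under Code Evidence.
STACK = ("PyNvVideoCodec", "cvcuda", "nvcv", "pycuda.driver")


def diagnose_imports(modules=STACK):
    """Try each import and map the module name to 'ok' or the error raised."""
    report = {}
    for name in modules:
        try:
            importlib.import_module(name)
        except (ImportError, RuntimeError) as exc:
            # Same exception types the library catches before degrading.
            report[name] = f"{type(exc).__name__}: {exc}"
        else:
            report[name] = "ok"
    return report
```

Printing the result of `diagnose_imports()` gives one line per module, which narrows the problem to a missing package (`ModuleNotFoundError`) versus a driver or runtime issue (`RuntimeError`).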
Compatibility Notes
- CPU fallback: When PyNvVideoCodec is unavailable, video processing falls back to CPU-based decoding via PyAV (`av` library) and OpenCV. This is significantly slower but functional.
- ARM (aarch64): PyNvVideoCodec and flash-attn are not available on ARM. CPU fallback is used automatically.
- macOS: Not supported. All video codec libraries are Linux-only.
- Build isolation: `flash-attn` must be built with build isolation disabled (configured in `pyproject.toml`); when installing it directly with pip, pass `--no-build-isolation`.
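For reference, CPU decoding with PyAV looks roughly like the sketch below. This illustrates the general PyAV decode loop under the assumption that `av` is installed; it is not the fallback code NeMo Curator ships, and `decode_frames_cpu` is a hypothetical name.

```python
from itertools import islice


def decode_frames_cpu(path, max_frames=None):
    """Decode the first video stream into a list of HxWx3 uint8 RGB arrays."""
    import av  # imported lazily so this module loads without PyAV

    with av.open(path) as container:
        frames = container.decode(video=0)
        # islice with None as the limit consumes every frame.
        return [f.to_ndarray(format="rgb24") for f in islice(frames, max_frames)]
```

Holding every frame in memory is only reasonable for short clips; a generator that yields frames one at a time would suit longer videos better.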
Related Pages
- Implementation:NVIDIA_NeMo_Curator_VideoReaderStage
- Implementation:NVIDIA_NeMo_Curator_TransNetV2ClipExtractionStage
- Implementation:NVIDIA_NeMo_Curator_ClipFrameExtractionStage
- Implementation:NVIDIA_NeMo_Curator_ClipWriterStage
- Implementation:NVIDIA_NeMo_Curator_CaptionGenerationStage
- Implementation:NVIDIA_NeMo_Curator_CosmosEmbed1EmbeddingStage
- Implementation:NVIDIA_NeMo_Curator_MotionFilterStage