Implementation:NVIDIA DALI FFmpeg Scene Processing
| Knowledge Sources | |
|---|---|
| Domains | Video_Processing, GPU_Computing, Data_Preparation |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
A concrete toolchain from the NVIDIA DALI example suite for preparing video super-resolution training data. It drives FFmpeg through Python subprocess calls to split a raw video into scenes and transcode each scene at multiple resolutions.
Description
The FFmpeg scene processing toolchain consists of three coordinated components:
prepare_data.sh is the top-level entry point -- a Bash script that accepts a raw video file path as its sole argument and orchestrates the full preparation pipeline. It first invokes split_scenes.py to segment the input video into discrete scenes, then invokes transcode_scenes.py four times to produce 540p, 720p, 1080p, and 4K resolution variants of every scene.
split_scenes.py reads a timestamps file (./data/timestamps) containing scene boundary times and uses FFmpeg's stream copy mode (-c:v copy) to extract each scene as an independent MP4 file. Scenes with index less than 53 are assigned to the training subset; the remainder become validation data. The stream copy approach preserves the original codec and quality without re-encoding overhead.
transcode_scenes.py (whose main function is downsample_scenes) takes the split original scenes and re-encodes them at a specified target resolution using configurable codec parameters. For sub-4K resolutions, it applies bilinear downscaling via FFmpeg's scale filter. The codec can be either H.264 (libx264) or HEVC (libx265), with a tunable CRF for quality control and a keyframe interval chosen for DALI-compatible random access.
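The splitting rule described above can be sketched as follows. This is a minimal illustration, not the code in split_scenes.py: the helper name build_split_commands is hypothetical, and it assumes each timestamp marks the end of one scene and the start of the next, with the first scene starting at 0 seconds. It only builds argument lists, so nothing runs until a list is passed to subprocess.run.

```python
import os

# Scenes with index below this value go to the training subset (from the text above).
TRAIN_SCENE_COUNT = 53


def build_split_commands(raw_data_path, out_data_path, boundaries):
    """Return one ffmpeg argument list per scene; nothing is executed here.

    boundaries: scene end times in seconds (assumed meaning of each timestamp).
    """
    scenes_root = os.path.join(out_data_path, "orig", "scenes")
    commands = []
    start = 0.0  # assumption: the first scene starts at the beginning of the video
    for index, end in enumerate(boundaries):
        subset = "train" if index < TRAIN_SCENE_COUNT else "val"
        target = os.path.join(scenes_root, subset, f"scene_{index}.mp4")
        commands.append([
            "ffmpeg", "-i", raw_data_path,
            "-ss", f"{start:.3f}", "-to", f"{end:.3f}",
            "-c:v", "copy",  # stream copy: no re-encoding of the video stream
            target,
        ])
        start = end  # the next scene begins where this one ended
    return commands
```

Each returned list could then be run with subprocess.run(cmd, check=True) after the train and val directories have been created.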
Usage
Invoke the preparation pipeline from the video_superres example directory:
./prepare_data.sh /path/to/raw_4k_video.mp4
This creates a data_dir directory containing the hierarchical scene structure at all four resolutions. For custom codec or quality settings, invoke transcode_scenes.py directly with additional arguments.
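As a rough sketch of what a direct transcode involves, the helper below builds an FFmpeg argument list from the parameters described above. The name build_transcode_command and the frame sizes assigned to each resolution label are assumptions for illustration (standard 16:9 dimensions); the command actually assembled by transcode_scenes.py may use different or additional options.

```python
# Assumed 16:9 frame sizes for the sub-4K labels; "4K" is left unscaled.
SCALE_TARGETS = {"540p": (960, 540), "720p": (1280, 720), "1080p": (1920, 1080)}

# Codec names from the text mapped to the FFmpeg encoder libraries it mentions.
ENCODERS = {"h264": "libx264", "hevc": "libx265"}


def build_transcode_command(in_file, out_file, resolution,
                            codec="h264", crf="18", keyint="4"):
    """Return an ffmpeg argument list for one scene; nothing is executed here."""
    if resolution != "4K" and resolution not in SCALE_TARGETS:
        raise ValueError(f"unsupported resolution: {resolution}")
    cmd = [
        "ffmpeg", "-i", in_file,
        "-c:v", ENCODERS[codec],
        "-crf", crf,    # lower CRF means higher quality and larger files
        "-g", keyint,   # GOP size: a keyframe at least every `keyint` frames
    ]
    if resolution != "4K":
        width, height = SCALE_TARGETS[resolution]
        # Bilinear downscaling through the scale filter.
        cmd += ["-vf", f"scale={width}:{height}:flags=bilinear"]
    cmd.append(out_file)
    return cmd
```

A short keyframe interval keeps every frame close to a keyframe, which is what makes random access into the encoded scenes cheap for a loader that seeks to arbitrary positions.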
Code Reference
Source Location
- Repository: NVIDIA DALI
- File: docs/examples/use_cases/video_superres/prepare_data.sh (lines 1-16)
- File: docs/examples/use_cases/video_superres/tools/split_scenes.py (lines 6-31)
- File: docs/examples/use_cases/video_superres/tools/transcode_scenes.py (lines 9-78)
Signature
Shell entry point:
./prepare_data.sh [FILE]
Scene splitting function:
def split_scenes(raw_data_path, out_data_path):
    """
    Split a raw video into scenes based on timestamps.

    Args:
        raw_data_path: Path to the raw source MP4 video file
        out_data_path: Root output directory for split scenes
    """
Transcoding function:
def downsample_scenes(main_data, resolution, codec, crf, keyint, quiet):
    """
    Transcode split scenes to a target resolution and codec.

    Args:
        main_data: Root data directory containing orig/scenes/
        resolution: Target resolution ("4K", "1080p", "720p", "540p")
        codec: Video codec ("h264" or "hevc")
        crf: Constant Rate Factor for quality (default "18")
        keyint: Keyframe interval (default "4")
        quiet: Suppress ffmpeg output
    """
Import
import argparse
import os
import subprocess
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| raw_data_path | str (file path) | Yes | Path to the raw 4K MP4 video file to be processed |
| timestamps file | text file (./data/timestamps) | Yes | File containing one scene boundary timestamp per line, in seconds |
| resolution | str ("4K", "1080p", "720p", "540p") | Yes (for transcode) | Target resolution for transcoding |
| codec | str ("h264" or "hevc") | No | Video codec for transcoding; defaults to "h264" |
| crf | str | No | Constant Rate Factor quality parameter; defaults to "18" |
| keyint | str | No | Keyframe interval for DALI-compatible random access; defaults to "4" |
Outputs
| Name | Type | Description |
|---|---|---|
| Scene MP4 files (orig) | MP4 video files | Stream-copied scene segments in data_dir/orig/scenes/{train,val}/scene_N.mp4 |
| Scene MP4 files (transcoded) | MP4 video files | Re-encoded scene segments in data_dir/{resolution}/scenes/{train,val}/scene_N.mp4 |
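The directory layout in the table above can be expressed as a small path helper. The function name scene_path is hypothetical and exists only to make the pattern concrete; it assumes the resolution directories are named with the same labels passed as the resolution argument.

```python
import os


def scene_path(data_dir, variant, subset, index):
    """Build the path of one scene file.

    variant: "orig" for stream-copied scenes, or a resolution label such as "720p".
    subset:  "train" or "val".
    """
    return os.path.join(data_dir, variant, "scenes", subset, f"scene_{index}.mp4")


# Example: the original and 720p versions of the same training scene.
original = scene_path("data_dir", "orig", "train", 0)
downscaled = scene_path("data_dir", "720p", "train", 0)
```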
Usage Examples
Full Pipeline via Shell Script
# Prepare all resolutions from a single 4K source video
./prepare_data.sh /data/raw/myanmar_4k.mp4
Scene Splitting Only
from tools.split_scenes import split_scenes
split_scenes(
    raw_data_path="/data/raw/myanmar_4k.mp4",
    out_data_path="data_dir",
)
Custom Transcoding
from tools.transcode_scenes import downsample_scenes
# Transcode to 720p with HEVC codec and higher quality (lower CRF)
downsample_scenes(
    main_data="data_dir",
    resolution="720p",
    codec="hevc",
    crf="15",
    keyint="4",
    quiet=True,
)
Command-Line Transcoding
# Transcode scenes to 540p with default settings
python ./tools/transcode_scenes.py --main_data data_dir --resolution 540p
# Transcode to 1080p with custom codec and CRF
python ./tools/transcode_scenes.py --main_data data_dir --resolution 1080p --codec hevc --crf 15
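A command-line front end accepting the flags used above could be declared with argparse roughly as follows. Only --main_data, --resolution, --codec, and --crf appear in the examples; the --keyint and --quiet flag names are assumptions that mirror the downsample_scenes parameters, and the defaults are taken from the I/O Contract table. The real script's parser may differ.

```python
import argparse


def build_parser():
    """Sketch of an argument parser for a transcoding script."""
    parser = argparse.ArgumentParser(
        description="Transcode split scenes to a target resolution")
    parser.add_argument("--main_data", required=True,
                        help="root data directory containing orig/scenes/")
    parser.add_argument("--resolution", required=True,
                        choices=["4K", "1080p", "720p", "540p"])
    parser.add_argument("--codec", default="h264", choices=["h264", "hevc"])
    parser.add_argument("--crf", default="18",
                        help="Constant Rate Factor, kept as a string")
    # Assumed flag names, not shown in the examples above.
    parser.add_argument("--keyint", default="4", help="keyframe interval")
    parser.add_argument("--quiet", action="store_true",
                        help="suppress ffmpeg output")
    return parser


# Parse the same arguments as the first command-line example.
args = build_parser().parse_args(["--main_data", "data_dir", "--resolution", "540p"])
```

Keeping crf and keyint as strings means they can be placed directly into a subprocess argument list without conversion.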