Jump to content

Connect Leeroopedia MCP: Equip your AI agents to search best practices, build plans, verify code, diagnose failures, and look up hyperparameter defaults.

Implementation:NVIDIA DALI FFmpeg Scene Processing

From Leeroopedia


Knowledge Sources
Domains Video_Processing, GPU_Computing, Data_Preparation
Last Updated 2026-02-08 00:00 GMT

Overview

Concrete toolchain for preparing video super-resolution training data, provided by the NVIDIA DALI example suite, orchestrating FFmpeg via Python subprocess calls to split raw video into scenes and transcode them at multiple resolutions.

Description

The FFmpeg scene processing toolchain consists of three coordinated components:

prepare_data.sh is the top-level entry point -- a Bash script that accepts a raw video file path as its sole argument and orchestrates the full preparation pipeline. It first invokes split_scenes.py to segment the input video into discrete scenes, then invokes transcode_scenes.py four times to produce 540p, 720p, 1080p, and 4K resolution variants of every scene.

split_scenes.py reads a timestamps file (./data/timestamps) containing scene boundary times and uses FFmpeg's stream copy mode (-c:v copy) to extract each scene as an independent MP4 file. Scenes with index less than 53 are assigned to the training subset; the remainder become validation data. The stream copy approach preserves the original codec and quality without re-encoding overhead.

transcode_scenes.py (referenced as transcode_scenes.py in the repository, performing the downsample_scenes function) takes the split original scenes and re-encodes them at a specified target resolution using configurable codec parameters. For sub-4K resolutions, it applies bilinear downscaling via FFmpeg's scale filter. The codec can be either H.264 (libx264) or HEVC (libx265), with tunable CRF for quality control and keyframe interval for DALI-compatible random access.

Usage

Invoke the preparation pipeline from the video_superres example directory:

./prepare_data.sh /path/to/raw_4k_video.mp4

This creates a data_dir directory containing the hierarchical scene structure at all four resolutions. For custom codec or quality settings, invoke transcode_scenes.py directly with additional arguments.

Code Reference

Source Location

  • Repository: NVIDIA DALI
  • File: docs/examples/use_cases/video_superres/prepare_data.sh (lines 1-16)
  • File: docs/examples/use_cases/video_superres/tools/split_scenes.py (lines 6-31)
  • File: docs/examples/use_cases/video_superres/tools/transcode_scenes.py (lines 9-78)

Signature

Shell entry point:

./prepare_data.sh [FILE]

Scene splitting function:

def split_scenes(raw_data_path, out_data_path):
    """
    Split a raw video into scenes based on timestamps.

    Args:
        raw_data_path: Path to the raw source MP4 video file
        out_data_path: Root output directory for split scenes
    """

Transcoding function:

def downsample_scenes(main_data, resolution, codec, crf, keyint, quiet):
    """
    Transcode split scenes to a target resolution and codec.

    Args:
        main_data: Root data directory containing orig/scenes/
        resolution: Target resolution ("4K", "1080p", "720p", "540p")
        codec: Video codec ("h264" or "hevc")
        crf: Constant Rate Factor for quality (default "18")
        keyint: Keyframe interval (default "4")
        quiet: Suppress ffmpeg output
    """

Import

import argparse
import os
import subprocess

I/O Contract

Inputs

Name Type Required Description
raw_data_path str (file path) Yes Path to the raw 4K MP4 video file to be processed
timestamps file text file (./data/timestamps) Yes File containing one scene boundary timestamp per line, in seconds
resolution str ("4K", "1080p", "720p", "540p") Yes (for transcode) Target resolution for transcoding
codec str ("h264" or "hevc") No Video codec for transcoding; defaults to "h264"
crf str No Constant Rate Factor quality parameter; defaults to "18"
keyint str No Keyframe interval for DALI-compatible random access; defaults to "4"

Outputs

Name Type Description
Scene MP4 files (orig) MP4 video files Stream-copied scene segments in data_dir/orig/scenes/{train,val}/scene_N.mp4
Scene MP4 files (transcoded) MP4 video files Re-encoded scene segments in data_dir/{resolution}/scenes/{train,val}/scene_N.mp4

Usage Examples

Full Pipeline via Shell Script

# Prepare all resolutions from a single 4K source video
./prepare_data.sh /data/raw/myanmar_4k.mp4

Scene Splitting Only

from tools.split_scenes import split_scenes

split_scenes(
    raw_data_path="/data/raw/myanmar_4k.mp4",
    out_data_path="data_dir"
)

Custom Transcoding

from tools.transcode_scenes import downsample_scenes

# Transcode to 720p with HEVC codec and higher quality (lower CRF)
downsample_scenes(
    main_data="data_dir",
    resolution="720p",
    codec="hevc",
    crf="15",
    keyint="4",
    quiet=True
)

Command-Line Transcoding

# Transcode scenes to 540p with default settings
python ./tools/transcode_scenes.py --main_data data_dir --resolution 540p

# Transcode to 1080p with custom codec and CRF
python ./tools/transcode_scenes.py --main_data data_dir --resolution 1080p --codec hevc --crf 15

Related Pages

Implements Principle

Requires Environment

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment