Implementation:Datajuicer Data juicer VideoSplitBySceneMapper

From Leeroopedia
Knowledge Sources
Domains Data_Processing, Mapping
Last Updated 2026-02-14 16:00 GMT

Overview

Concrete tool, provided by Data-Juicer, for splitting videos into scene clips at detected scene changes.

Description

VideoSplitBySceneMapper is a mapper operator that splits each video into individual scene clips at detected scene changes, so that clips follow natural visual boundaries. It uses the scenedetect library with a configurable detector (ContentDetector, ThresholdDetector, or AdaptiveDetector), a threshold, and a minimum scene length to locate scene boundaries. It then cuts the video at those boundaries with FFmpeg, saves the resulting clips, and updates the sample's video and text references accordingly.
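To make the roles of threshold and min_scene_len concrete, here is a minimal, library-free sketch of threshold-based cut detection. It is a simplified illustration, not the scenedetect implementation: the function name find_scenes and the per-frame change scores are hypothetical, and real detectors compute their scores from decoded frames.

```python
from typing import List, Tuple


def find_scenes(
    frame_scores: List[float],
    threshold: float = 27.0,
    min_scene_len: int = 15,
) -> List[Tuple[int, int]]:
    """Return scenes as (start_frame, end_frame) pairs, end exclusive.

    frame_scores[i] is a hypothetical change score between frame i-1 and
    frame i; frame_scores[0] is ignored. A cut is placed at frame i when
    the score reaches the threshold and the current scene already spans
    at least min_scene_len frames.
    """
    num_frames = len(frame_scores)
    if num_frames == 0:
        return []

    cuts = []
    last_cut = 0
    for i in range(1, num_frames):
        long_enough = i - last_cut >= min_scene_len
        if frame_scores[i] >= threshold and long_enough:
            cuts.append(i)
            last_cut = i

    # Turn cut positions into consecutive, non-overlapping frame ranges.
    bounds = [0] + cuts + [num_frames]
    return list(zip(bounds[:-1], bounds[1:]))


if __name__ == "__main__":
    scores = [0.0] * 100
    scores[30] = 40.0  # strong change: becomes a cut
    scores[35] = 50.0  # too close to the previous cut: suppressed
    scores[70] = 30.0  # strong change: becomes a cut
    print(find_scenes(scores))  # [(0, 30), (30, 70), (70, 100)]
```

In this sketch the spike at frame 35 is ignored because only 5 frames have passed since the previous cut, which is the effect a minimum scene length is meant to have.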

Usage

Use this operator when clips should follow actual scene changes rather than fixed durations or codec-level keyframes. Each resulting clip covers a single coherent scene, which makes the output well suited to use as training data.

Code Reference

Source Location

Signature

@OPERATORS.register_module("video_split_by_scene_mapper")
class VideoSplitBySceneMapper(Mapper):
    def __init__(
        self,
        detector: str = "ContentDetector",
        threshold: NonNegativeFloat = 27.0,
        min_scene_len: NonNegativeInt = 15,
        show_progress: bool = False,
        save_dir: str = None,
        save_field: str = None,
        ffmpeg_extra_args: str = "-movflags frag_keyframe+empty_moov",
        output_format: str = "path",
        *args,
        **kwargs,
    ):

Import

from data_juicer.ops.mapper.video_split_by_scene_mapper import VideoSplitBySceneMapper

I/O Contract

Inputs

Name | Type | Required | Description
detector | str | No | Scene detector algorithm: "ContentDetector", "ThresholdDetector", or "AdaptiveDetector" (default: "ContentDetector")
threshold | NonNegativeFloat | No | Threshold passed to the scene detector (default: 27.0)
min_scene_len | NonNegativeInt | No | Minimum length of any scene, in frames (default: 15)
show_progress | bool | No | Whether to show progress output from scenedetect (default: False)
save_dir | str | No | Directory for generated video files; if not specified, files are saved alongside the input files
save_field | str | No | New field name for generated video paths; if not specified, the original field is overwritten
ffmpeg_extra_args | str | No | Extra FFmpeg arguments used when splitting (default: "-movflags frag_keyframe+empty_moov")
output_format | str | No | Output format: "path" or "bytes" (default: "path")
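The constraints in the table above can be checked before a pipeline runs. The following sketch mirrors the documented parameters in a standalone dataclass; SceneSplitParams and its validation logic are hypothetical helpers for illustration and are not part of Data-Juicer, which performs its own validation.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values as listed in the Inputs table.
ALLOWED_DETECTORS = {"ContentDetector", "ThresholdDetector", "AdaptiveDetector"}
ALLOWED_OUTPUT_FORMATS = {"path", "bytes"}


@dataclass
class SceneSplitParams:
    """Hypothetical mirror of the operator's documented parameters."""

    detector: str = "ContentDetector"
    threshold: float = 27.0
    min_scene_len: int = 15
    show_progress: bool = False
    save_dir: Optional[str] = None
    save_field: Optional[str] = None
    ffmpeg_extra_args: str = "-movflags frag_keyframe+empty_moov"
    output_format: str = "path"

    def __post_init__(self) -> None:
        if self.detector not in ALLOWED_DETECTORS:
            raise ValueError(f"unknown detector: {self.detector!r}")
        if self.threshold < 0:
            raise ValueError("threshold must be non-negative")
        if self.min_scene_len < 0:
            raise ValueError("min_scene_len must be non-negative")
        if self.output_format not in ALLOWED_OUTPUT_FORMATS:
            raise ValueError(f"unknown output_format: {self.output_format!r}")
```

Constructing SceneSplitParams() with no arguments yields the documented defaults, while a value such as detector="HistogramDetector" raises ValueError because it is not among the three detectors listed here.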

Outputs

Name | Type | Description
samples | Dict | Transformed samples with scene-split video clip file paths or bytes

Usage Examples

process:
  - video_split_by_scene_mapper:
      detector: "ContentDetector"
      threshold: 27.0
      min_scene_len: 15
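The same settings can also be passed directly to the constructor from Python. This is a minimal sketch that uses only the import path and parameters shown on this page; build_scene_splitter is a hypothetical helper, and the import is deferred so the snippet still runs where Data-Juicer is not installed.

```python
from typing import Any, Dict, Optional, Tuple


def build_scene_splitter(**overrides: Any) -> Tuple[Optional[Any], Dict[str, Any]]:
    """Return (operator or None, kwargs used).

    Starts from the values in the YAML example and applies any overrides.
    The operator is None when data_juicer cannot be imported.
    """
    kwargs: Dict[str, Any] = {
        "detector": "ContentDetector",
        "threshold": 27.0,
        "min_scene_len": 15,
    }
    kwargs.update(overrides)

    try:
        # Import path taken from the Import section of this page.
        from data_juicer.ops.mapper.video_split_by_scene_mapper import (
            VideoSplitBySceneMapper,
        )
    except ImportError:
        return None, kwargs

    return VideoSplitBySceneMapper(**kwargs), kwargs


if __name__ == "__main__":
    op, used = build_scene_splitter(threshold=30.0)
    print("operator available:", op is not None)
    print("parameters:", used)
```

How the operator is then applied to samples depends on the Data-Juicer version and dataset setup, so that step is left out of this sketch.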
