Implementation:Datajuicer Data juicer VideoSplitBySceneMapper
| Knowledge Sources | |
|---|---|
| Domains | Data_Processing, Mapping |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
Concrete tool, provided by Data-Juicer, for splitting videos into scene clips at detected scene changes.
Description
VideoSplitBySceneMapper is a mapper operator that splits each video into individual scene clips at detected scene changes, using content-aware analysis to find natural visual boundaries. It relies on the scenedetect library with a configurable detector (ContentDetector, ThresholdDetector, or AdaptiveDetector), a threshold, and a minimum scene length to locate scene boundaries. It then cuts the video at those boundaries with FFmpeg, saves the resulting scene clips, and updates the sample's video and text references.
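The detect-then-split flow described above can be sketched directly against PySceneDetect. This is a minimal illustration under stated assumptions, not the operator's actual source: `check_detector_name` and `detect_and_split` are hypothetical helper names, the wiring of `ffmpeg_extra_args` into `arg_override` is an assumption, and the scenedetect package and an `ffmpeg` binary are assumed to be installed.

```python
from typing import List, Tuple

# Detector names listed in this page's parameter table.
SUPPORTED_DETECTORS = ("ContentDetector", "ThresholdDetector", "AdaptiveDetector")


def check_detector_name(name: str) -> str:
    """Return the name unchanged if it is a supported detector, else raise."""
    if name not in SUPPORTED_DETECTORS:
        raise ValueError(
            f"Unsupported detector {name!r}; expected one of {SUPPORTED_DETECTORS}"
        )
    return name


def detect_and_split(
    video_path: str,
    detector: str = "ContentDetector",
    threshold: float = 27.0,
    min_scene_len: int = 15,
    ffmpeg_extra_args: str = "-movflags frag_keyframe+empty_moov",
) -> List[Tuple[float, float]]:
    """Hypothetical helper: detect scenes, split with FFmpeg, return (start, end) seconds."""
    # Imported lazily so the validation helper above works without scenedetect.
    import scenedetect

    detector_cls = getattr(scenedetect, check_detector_name(detector))
    # Assumption: threshold and minimum scene length are the first two
    # positional arguments of the chosen detector class.
    scene_list = scenedetect.detect(video_path, detector_cls(threshold, min_scene_len))

    if scene_list:
        # Assumption: the extra FFmpeg arguments are passed as arg_override.
        scenedetect.split_video_ffmpeg(
            video_path, scene_list, arg_override=ffmpeg_extra_args
        )
    return [(start.get_seconds(), end.get_seconds()) for start, end in scene_list]
```

The meaning and scale of the threshold depend on the chosen detector, so the default of 27.0 is not necessarily appropriate for every detector.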
Usage
Use this operator when clips should follow actual scene changes rather than fixed durations or codec-level keyframes. It is the most semantically meaningful of these splitting approaches and produces coherent single-scene clips that are well suited to training.
Code Reference
Source Location
- Repository: Datajuicer_Data_juicer
- File: data_juicer/ops/mapper/video_split_by_scene_mapper.py
Signature
@OPERATORS.register_module("video_split_by_scene_mapper")
class VideoSplitBySceneMapper(Mapper):
    def __init__(
        self,
        detector: str = "ContentDetector",
        threshold: NonNegativeFloat = 27.0,
        min_scene_len: NonNegativeInt = 15,
        show_progress: bool = False,
        save_dir: str = None,
        save_field: str = None,
        ffmpeg_extra_args: str = "-movflags frag_keyframe+empty_moov",
        output_format: str = "path",
        *args,
        **kwargs,
    ):
Import
from data_juicer.ops.mapper.video_split_by_scene_mapper import VideoSplitBySceneMapper
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| detector | str | No | Scene detector algorithm: "ContentDetector", "ThresholdDetector", or "AdaptiveDetector" (default: "ContentDetector") |
| threshold | NonNegativeFloat | No | Threshold passed to the scene detector (default: 27.0) |
| min_scene_len | NonNegativeInt | No | Minimum length of any scene in frames (default: 15) |
| show_progress | bool | No | Whether to show progress from scenedetect (default: False) |
| save_dir | str | No | Directory for generated video files; if not specified, saves alongside input files |
| save_field | str | No | New field name for generated video paths; if not specified, overwrites original field |
| ffmpeg_extra_args | str | No | Extra FFmpeg args for splitting (default: "-movflags frag_keyframe+empty_moov") |
| output_format | str | No | Output format: "path" or "bytes" (default: "path") |
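To show how `threshold` and `min_scene_len` interact, the following toy sketch picks cut points from a list of per-frame change scores. It is a deliberate simplification and not PySceneDetect's algorithm: `find_cuts` is a hypothetical function, and the scores are assumed to be precomputed, one number per frame.

```python
from typing import List, Sequence


def find_cuts(
    scores: Sequence[float],
    threshold: float = 27.0,
    min_scene_len: int = 15,
) -> List[int]:
    """Return frame indices where a new scene starts (toy model).

    A cut is placed at a frame whose change score reaches the threshold,
    but only if the current scene is already at least min_scene_len frames long.
    """
    cuts: List[int] = []
    last_cut = 0  # the first scene starts at frame 0
    for frame_idx, score in enumerate(scores):
        if frame_idx == 0:
            continue  # frame 0 has no previous frame to differ from
        if score >= threshold and frame_idx - last_cut >= min_scene_len:
            cuts.append(frame_idx)
            last_cut = frame_idx
    return cuts


# Four strong changes, two of which arrive too soon after a scene start.
scores = [0.0] * 40
for idx in (10, 20, 25, 38):
    scores[idx] = 50.0

cuts_default = find_cuts(scores)                # min_scene_len=15
cuts_no_min = find_cuts(scores, min_scene_len=0)
```

In this toy model, raising `min_scene_len` suppresses cuts that follow a scene start too closely, while raising `threshold` ignores weaker changes.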
Outputs
| Name | Type | Description |
|---|---|---|
| samples | Dict | Transformed samples with scene-split video clip file paths or bytes |
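Updating the text reference when one video becomes several clips can be pictured as repeating the video placeholder once per clip. The sketch below is an illustration only: the token string `<__dj__video>` and the function `expand_video_tokens` are assumptions made for this example, not confirmed details of the operator.

```python
from typing import List

# Assumed placeholder that marks a video's position in the sample text.
VIDEO_TOKEN = "<__dj__video>"


def expand_video_tokens(text: str, clips_per_video: List[int]) -> str:
    """Repeat the i-th video placeholder clips_per_video[i] times."""
    parts = text.split(VIDEO_TOKEN)
    if len(parts) - 1 != len(clips_per_video):
        raise ValueError("Number of placeholders does not match number of videos")

    out = [parts[0]]
    for count, tail in zip(clips_per_video, parts[1:]):
        out.append(VIDEO_TOKEN * count)  # one placeholder per generated clip
        out.append(tail)
    return "".join(out)


original = f"{VIDEO_TOKEN} a cat. {VIDEO_TOKEN} a dog."
# First video split into 2 clips, second into 3 clips.
expanded = expand_video_tokens(original, [2, 3])
```

Keeping the number of placeholders equal to the number of clip paths keeps the text aligned with the sample's video list.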
Usage Examples
process:
  - video_split_by_scene_mapper:
      detector: "ContentDetector"
      threshold: 27.0
      min_scene_len: 15