Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Datajuicer Data juicer VideoSplitByDurationMapper

From Leeroopedia
Knowledge Sources
Domains Data_Processing, Mapping
Last Updated 2026-02-14 16:00 GMT

Overview

Concrete tool for splitting videos into fixed-duration segments provided by Data-Juicer.

Description

VideoSplitByDurationMapper is a mapper operator that splits videos into fixed-duration segments, creating multiple shorter clips from each input video to produce uniform-length training samples. It divides each video into segments of a configurable duration (default: 10 seconds), discards the last segment if shorter than a minimum threshold, saves split clips using either FFmpeg or PyAV backends, and updates the sample's video references and text placeholders to reflect the new segments, with optional original sample preservation.

Usage

Use when you need to create uniform-length video clips required by many video generation and understanding models that expect fixed-duration inputs.

Code Reference

Source Location

Signature

@OPERATORS.register_module("video_split_by_duration_mapper")
class VideoSplitByDurationMapper(Mapper):
    def __init__(self, split_duration: float = 10, min_last_split_duration: float = 0, keep_original_sample: bool = True, save_dir: str = None, video_backend: str = "ffmpeg", ffmpeg_extra_args: str = "", *args, **kwargs):

Import

from data_juicer.ops.mapper.video_split_by_duration_mapper import VideoSplitByDurationMapper

I/O Contract

Inputs

Name Type Required Description
split_duration float No Duration of each video split in seconds (default: 10)
min_last_split_duration float No Minimum allowable duration for the last split; shorter splits are discarded (default: 0)
keep_original_sample bool No Whether to keep the original sample (default: True)
save_dir str No Directory for generated video files; if not specified, saves alongside input files
video_backend str No Video backend: "ffmpeg" or "av" (default: "ffmpeg")
ffmpeg_extra_args str No Extra FFmpeg args for splitting video (default: "")

Outputs

Name Type Description
samples Dict Transformed samples with split video segment file paths

Usage Examples

process:
  - video_split_by_duration_mapper:
      split_duration: 10
      min_last_split_duration: 3
      keep_original_sample: false

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment