Implementation:Datajuicer Data juicer VideoSplitByDurationMapper
| Knowledge Sources | |
|---|---|
| Domains | Data_Processing, Mapping |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
A concrete tool, provided by Data-Juicer, for splitting videos into fixed-duration segments.
Description
VideoSplitByDurationMapper is a mapper operator that splits each input video into fixed-duration segments, producing several shorter clips of uniform length for use as training samples. For each video, the operator:
- divides the video into segments of a configurable duration (default: 10 seconds);
- discards the last segment if it is shorter than a minimum threshold (default: 0 seconds, so by default nothing is discarded);
- saves the split clips using either the FFmpeg or the PyAV backend;
- updates the sample's video references and text placeholders to reflect the new segments.
The original sample can optionally be preserved alongside the split output.
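The segmentation rule described above can be sketched in a few lines of plain Python. This is only an illustration of the arithmetic, not the operator's actual code: the function name `plan_segments` is hypothetical, and the sketch assumes that a final segment whose length equals the threshold exactly is kept.

```python
from typing import List, Tuple


def plan_segments(
    video_duration: float,
    split_duration: float = 10,
    min_last_split_duration: float = 0,
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds for each clip that would be kept.

    Hypothetical helper: it mirrors the rule in the description, not the
    operator's real implementation.
    """
    if split_duration <= 0:
        raise ValueError("split_duration must be positive")

    total = float(video_duration)
    step = float(split_duration)

    segments: List[Tuple[float, float]] = []
    index = 0
    start = 0.0
    while start < total:
        end = min(start + step, total)
        is_last = end >= total
        # Every full segment is kept; the final one is kept only if it is
        # long enough (assumption: a length equal to the threshold is kept).
        if not is_last or (end - start) >= min_last_split_duration:
            segments.append((start, end))
        index += 1
        # Multiply instead of accumulating to avoid floating-point drift.
        start = index * step
    return segments


# A 25-second video with a 3-second threshold keeps its 5-second tail.
print(plan_segments(25, 10, 3))  # [(0.0, 10.0), (10.0, 20.0), (20.0, 25.0)]
# A 22-second video loses its 2-second tail under the same threshold.
print(plan_segments(22, 10, 3))  # [(0.0, 10.0), (10.0, 20.0)]
```

With the default threshold of 0, the last segment is always kept, however short it is.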
Usage
Use this operator when you need uniform-length video clips, for example because a downstream video generation or understanding model expects fixed-duration inputs.
Code Reference
Source Location
- Repository: Datajuicer_Data_juicer
- File: data_juicer/ops/mapper/video_split_by_duration_mapper.py
Signature
@OPERATORS.register_module("video_split_by_duration_mapper")
class VideoSplitByDurationMapper(Mapper):
    def __init__(
        self,
        split_duration: float = 10,
        min_last_split_duration: float = 0,
        keep_original_sample: bool = True,
        save_dir: str = None,
        video_backend: str = "ffmpeg",
        ffmpeg_extra_args: str = "",
        *args,
        **kwargs,
    ):
Import
from data_juicer.ops.mapper.video_split_by_duration_mapper import VideoSplitByDurationMapper
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| split_duration | float | No | Duration of each video split in seconds (default: 10) |
| min_last_split_duration | float | No | Minimum allowable duration for the last split; shorter splits are discarded (default: 0) |
| keep_original_sample | bool | No | Whether to keep the original sample (default: True) |
| save_dir | str | No | Directory for the generated video files; if not specified, clips are saved alongside the input files (default: None) |
| video_backend | str | No | Video backend: "ffmpeg" or "av" (default: "ffmpeg") |
| ffmpeg_extra_args | str | No | Extra FFmpeg args for splitting video (default: "") |
Outputs
| Name | Type | Description |
|---|---|---|
| samples | Dict | Transformed samples with split video segment file paths |
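To make the output contract more concrete, the following sketch shows how a sample's video list and text placeholders might be rewritten once each video has been split. It is an illustration under stated assumptions only: the function name `expand_video_refs`, the `"<video>"` placeholder string, and the clip file names are all hypothetical, and this page does not state the actual placeholder token or naming scheme used by Data-Juicer.

```python
from typing import Dict, List, Tuple


def expand_video_refs(
    text: str,
    videos: List[str],
    splits: Dict[str, List[str]],
    token: str = "<video>",  # hypothetical placeholder, not the library's token
) -> Tuple[str, List[str]]:
    """Replace each video with its clips and repeat the placeholder to match.

    Assumes the text contains exactly one placeholder per entry in `videos`,
    in the same order.
    """
    parts = text.split(token)
    if len(parts) - 1 != len(videos):
        raise ValueError("placeholder count must match number of videos")

    new_videos: List[str] = []
    new_text = parts[0]
    for video, tail in zip(videos, parts[1:]):
        clips = splits.get(video, [])
        new_videos.extend(clips)
        # One placeholder per clip keeps text and video list aligned.
        new_text += token * len(clips) + tail
    return new_text, new_videos


text, videos = expand_video_refs(
    "<video> a cat playing",
    ["cat.mp4"],
    {"cat.mp4": ["cat_0.mp4", "cat_1.mp4"]},  # hypothetical clip names
)
print(text)    # <video><video> a cat playing
print(videos)  # ['cat_0.mp4', 'cat_1.mp4']
```

The point of the sketch is the invariant: after splitting, the number of placeholders in the text still matches the number of video paths in the sample.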
Usage Examples
process:
  - video_split_by_duration_mapper:
      split_duration: 10
      min_last_split_duration: 3
      keep_original_sample: false