Implementation:Datajuicer Data juicer VideoDurationFilter
| Knowledge Sources | |
|---|---|
| Domains | Data_Quality, Filtering |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
Concrete tool, provided by Data-Juicer, for filtering data samples by video duration.
Description
VideoDurationFilter is a filter operator that keeps samples whose video durations, in seconds, fall within a specified range. It extends Filter and follows the two-phase compute_stats/process pattern. In compute_stats_single, it opens each video with a video reader (ffmpeg or av backend) to obtain its duration and caches the results under the video_duration stat. The process_single method then checks each duration against the min_duration and max_duration thresholds. For samples with multiple videos, it combines the per-video results with either the 'any' strategy (keep the sample if at least one video is in range) or the 'all' strategy (keep it only if every video is in range). It is a fundamental video dataset filter for removing clips that are too short or too long.
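The keep decision described above can be sketched in plain Python. This is a minimal illustration, not the operator's actual source: the function name keep_sample is hypothetical, and both the inclusive range bounds and the choice to keep samples that contain no videos are assumptions.

```python
from typing import Sequence


def keep_sample(durations: Sequence[float],
                min_duration: float = 0.0,
                max_duration: float = float("inf"),
                any_or_all: str = "any") -> bool:
    """Decide whether to keep a sample, given its per-video durations in seconds.

    Hypothetical helper for illustration; not part of Data-Juicer.
    """
    if any_or_all not in ("any", "all"):
        raise ValueError(f"any_or_all must be 'any' or 'all', got {any_or_all!r}")

    # One boolean per video: is the duration inside the range?
    # Assumption: both bounds are inclusive.
    in_range = [min_duration <= d <= max_duration for d in durations]

    if not in_range:
        # Assumption: a sample without videos is kept.
        return True

    # 'any' keeps the sample if at least one video passes,
    # 'all' keeps it only if every video passes.
    return any(in_range) if any_or_all == "any" else all(in_range)


# A sample with one too-short clip (0.5 s) and one acceptable clip (10 s).
durations = [0.5, 10.0]
kept_any = keep_sample(durations, 1.0, 300.0, "any")
kept_all = keep_sample(durations, 1.0, 300.0, "all")
```

With these inputs the 'any' strategy keeps the sample because the 10-second clip is in range, while the 'all' strategy drops it because the 0.5-second clip is not.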
Usage
Import this operator when a pipeline needs to filter samples by video duration. It can be configured in YAML or instantiated directly in Python.
Code Reference
Source Location
- Repository: Datajuicer_Data_juicer
- File: data_juicer/ops/filter/video_duration_filter.py
Signature
@OPERATORS.register_module("video_duration_filter")
class VideoDurationFilter(Filter):
    def __init__(
        self,
        min_duration: float = 0,
        max_duration: float = sys.maxsize,
        any_or_all: str = "any",
        video_backend: str = "ffmpeg",
        *args,
        **kwargs,
    ):
Import
from data_juicer.ops.filter.video_duration_filter import VideoDurationFilter
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| min_duration | float | No | Minimum video duration in seconds (default: 0) |
| max_duration | float | No | Maximum video duration in seconds (default: sys.maxsize) |
| any_or_all | str | No | Keep strategy: "any" or "all" (default: "any") |
| video_backend | str | No | Video backend: "ffmpeg" or "av" (default: "ffmpeg") |
Outputs
| Name | Type | Description |
|---|---|---|
| samples | Dict | Filtered samples with video_duration stat computed |
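To make the output contract concrete, the following sketch shows how a stats phase might attach durations to a sample and reuse a cached value on a second pass. It is a stand-in written under stated assumptions, not Data-Juicer code: the field names "stats" and "videos", the get_duration callback, and the skip-if-cached check are all hypothetical, and a lookup table replaces the real ffmpeg or av reader.

```python
from typing import Callable, Dict

STATS_FIELD = "stats"            # hypothetical name for the per-sample stats dict
DURATION_KEY = "video_duration"  # stat key named in the description above


def compute_stats_single(sample: Dict,
                         get_duration: Callable[[str], float],
                         video_key: str = "videos") -> Dict:
    """Attach a list of per-video durations (seconds) to the sample's stats."""
    stats = sample.setdefault(STATS_FIELD, {})

    # Assumption: durations already cached on the sample are not recomputed.
    if DURATION_KEY in stats:
        return sample

    # One duration per video path, in the same order as the paths.
    stats[DURATION_KEY] = [get_duration(path) for path in sample.get(video_key, [])]
    return sample


# Stand-in for a real video reader: a lookup table of known durations.
fake_durations = {"a.mp4": 2.5, "b.mp4": 120.0}

sample = {"text": "two clips", "videos": ["a.mp4", "b.mp4"]}
sample = compute_stats_single(sample, fake_durations.__getitem__)
```

After this call the sample carries its durations under the video_duration stat, so the filtering phase can compare them against the thresholds without reopening the video files.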
Usage Examples
YAML Configuration
process:
  - video_duration_filter:
      min_duration: 1.0
      max_duration: 300.0
Python API
from data_juicer.ops.filter.video_duration_filter import VideoDurationFilter
op = VideoDurationFilter(min_duration=1.0, max_duration=300.0)