Implementation:Datajuicer Data juicer VideoMotionScoreFilter
| Knowledge Sources | |
|---|---|
| Domains | Data_Quality, Filtering |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
A concrete Data-Juicer operator that filters data samples by video motion score, computed with OpenCV.
Description
VideoMotionScoreFilter is a filter operator that keeps samples whose average video motion magnitude falls within a specified range. It extends Filter and follows the two-phase compute_stats/process pattern. To compute the score, it opens each video with OpenCV's VideoCapture, samples frames at a configurable rate (sampling_fps), computes dense optical flow between consecutive sampled frames using Farneback's algorithm with configurable parameters (pyr_scale, levels, winsize, etc.), and averages the flow magnitude across all frame pairs. When relative is enabled, the flow magnitude is normalized by the frame's diagonal length. The result is cached under video_motion_score, and the computed optical flow can optionally be output as well (if_output_optical_flow). The operator is marked as UNFORKABLE due to OpenCV constraints, and it serves as the parent class for the RAFT and ptlflow variants.
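The scoring steps described above can be illustrated with a minimal, dependency-free sketch. It assumes the optical flow for each frame pair is already available as a grid of (dx, dy) vectors (in the operator this comes from OpenCV's Farneback routine). The helper names sampling_step, pair_motion, and video_motion_score are hypothetical and not part of Data-Juicer, and the rounding rule for the sampling step is an assumption rather than the operator's confirmed behavior.

```python
import math
from statistics import fmean


def sampling_step(video_fps, sampling_fps):
    """Source frames to advance between two sampled frames (assumed rule, at least 1)."""
    return max(1, round(video_fps / sampling_fps))


def pair_motion(flow, relative=False):
    """Mean flow magnitude for one frame pair.

    `flow` is a list of rows, each row a list of (dx, dy) displacement vectors.
    """
    height, width = len(flow), len(flow[0])
    magnitudes = [math.hypot(dx, dy) for row in flow for dx, dy in row]
    score = fmean(magnitudes)
    if relative:
        # Normalize by the frame diagonal so scores are comparable across resolutions.
        score /= math.hypot(height, width)
    return score


def video_motion_score(flows, relative=False):
    """Average the per-pair motion over all consecutive frame pairs of one video."""
    return fmean([pair_motion(flow, relative) for flow in flows])


if __name__ == "__main__":
    # Two toy 3x4 flow fields: uniform motion of (3, 4) pixels, then no motion.
    moving = [[(3.0, 4.0)] * 4 for _ in range(3)]
    still = [[(0.0, 0.0)] * 4 for _ in range(3)]
    print(sampling_step(30, 2))
    print(video_motion_score([moving, still]))
    print(video_motion_score([moving, still], relative=True))
```

Dividing by the diagonal is what makes a relative score independent of resolution: the same physical motion yields a larger pixel displacement in a larger frame, and the normalization cancels that out.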
Usage
Import this operator when you need to filter samples by video motion score. It can be configured in YAML or instantiated directly in Python.
Code Reference
Source Location
- Repository: Datajuicer_Data_juicer
- File: data_juicer/ops/filter/video_motion_score_filter.py
Signature
@OPERATORS.register_module("video_motion_score_filter")
class VideoMotionScoreFilter(Filter):
    def __init__(
        self,
        min_score: float = 0.25,
        max_score: float = sys.float_info.max,
        frame_field: Optional[str] = None,
        sampling_fps: PositiveFloat = 2,
        size: Union[PositiveInt, Tuple[PositiveInt], Tuple[PositiveInt, PositiveInt], None] = None,
        max_size: Optional[PositiveInt] = None,
        divisible: PositiveInt = 1,
        relative: bool = False,
        any_or_all: str = "any",
        if_output_optical_flow: bool = False,
        optical_flow_key: str = MetaKeys.video_optical_flow,
        *args,
        **kwargs,
    ):
Import
from data_juicer.ops.filter.video_motion_score_filter import VideoMotionScoreFilter
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| min_score | float | No | Minimum motion score (default: 0.25) |
| max_score | float | No | Maximum motion score (default: sys.float_info.max) |
| sampling_fps | PositiveFloat | No | Sampling rate in frames per second (default: 2) |
| size | Union[int, tuple, None] | No | Resize frames before computing flow (default: None) |
| relative | bool | No | Normalize flow relative to frame diagonal (default: False) |
| any_or_all | str | No | Keep strategy: "any" or "all" (default: "any") |
| if_output_optical_flow | bool | No | Output computed optical flows to metas (default: False) |
Outputs
| Name | Type | Description |
|---|---|---|
| samples | Dict | Filtered samples with video_motion_score stat computed |
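How min_score, max_score, and any_or_all combine into a keep-or-drop decision can be sketched as follows. This is an illustration under stated assumptions, not the operator's code: the function names in_range and keep_sample are hypothetical, the range bounds are assumed to be inclusive, and a sample without videos is assumed to be kept.

```python
import sys


def in_range(score, min_score=0.25, max_score=sys.float_info.max):
    """True when one video's motion score lies within [min_score, max_score] (assumed inclusive)."""
    return min_score <= score <= max_score


def keep_sample(scores, min_score=0.25, max_score=sys.float_info.max, any_or_all="any"):
    """Decide whether to keep a sample given the motion scores of its videos."""
    if any_or_all not in ("any", "all"):
        raise ValueError(f"any_or_all must be 'any' or 'all', got {any_or_all!r}")
    if not scores:
        # Assumption: a sample with no videos has nothing to reject, so it is kept.
        return True
    flags = [in_range(score, min_score, max_score) for score in scores]
    # "any": one qualifying video is enough; "all": every video must qualify.
    return any(flags) if any_or_all == "any" else all(flags)


if __name__ == "__main__":
    scores = [0.1, 0.9]  # one nearly static video, one with clear motion
    print(keep_sample(scores, any_or_all="any"))
    print(keep_sample(scores, any_or_all="all"))
```

With the "any" strategy a multi-video sample survives as long as one video shows enough motion, while "all" is the stricter choice for datasets where every clip must qualify.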
Usage Examples
YAML Configuration
process:
  - video_motion_score_filter:
      min_score: 0.25
      sampling_fps: 2
Python API
from data_juicer.ops.filter.video_motion_score_filter import VideoMotionScoreFilter
op = VideoMotionScoreFilter(min_score=0.25, sampling_fps=2)