Implementation:Datajuicer Data juicer VideoAspectRatioFilter
| Knowledge Sources | |
|---|---|
| Domains | Data_Quality, Filtering |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
A concrete Data-Juicer tool for filtering data samples by video aspect ratio.
Description
VideoAspectRatioFilter is a filter operator that keeps samples whose video aspect ratio (width/height) falls within a specified range. It extends Filter and follows the two-phase compute_stats/process pattern: compute_stats loads each video file, reads the width and height from the video stream's codec context, and computes the aspect ratio; process then decides whether to keep the sample. The range bounds are given as fraction strings (e.g., "9:21" or "9/21") and parsed with the Fraction class from Python's fractions module. The computed ratios are cached in the sample's stats under the key video_aspect_ratios. When a sample contains multiple videos, the any_or_all strategy determines whether the sample is kept if any video passes ("any") or only if all videos pass ("all").
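The parsing and range check described above can be sketched in plain Python. This is a minimal illustration, not the operator's source: `parse_ratio` and `ratio_in_range` are hypothetical helper names, the conversion of ":" to "/" is an assumption about how the "9:21" form is handled (Fraction itself only accepts the "9/21" form), and the bounds are assumed to be inclusive.

```python
from fractions import Fraction


def parse_ratio(text: str) -> Fraction:
    """Parse "9/21" or "9:21" into an exact Fraction (hypothetical helper)."""
    # Fraction() accepts "a/b" strings but rejects "a:b", so the separator
    # is normalised first. This normalisation step is an assumption.
    return Fraction(str(text).replace(":", "/"))


def ratio_in_range(width: int, height: int,
                   min_ratio: str = "9/21", max_ratio: str = "21/9") -> bool:
    """Return True if width/height lies within [min_ratio, max_ratio]."""
    aspect = Fraction(width, height)
    # Inclusive bounds are assumed here.
    return parse_ratio(min_ratio) <= aspect <= parse_ratio(max_ratio)


print(parse_ratio("9:21"))          # exact rational, reduced to lowest terms
print(ratio_in_range(1920, 1080))   # 16:9 landscape video
print(ratio_in_range(1080, 3840))   # very tall video, narrower than 9/21
```

Using exact fractions for the bounds avoids floating-point rounding at the edges of the range, for example when a video's ratio is exactly 21/9.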
Usage
Import this operator when a pipeline needs to filter samples by video aspect ratio. It can be configured in YAML or instantiated directly in Python.
Code Reference
Source Location
- Repository: Datajuicer_Data_juicer
- File: data_juicer/ops/filter/video_aspect_ratio_filter.py
Signature
@OPERATORS.register_module("video_aspect_ratio_filter")
class VideoAspectRatioFilter(Filter):
    def __init__(self, min_ratio: str = "9/21", max_ratio: str = "21/9", any_or_all: str = "any", *args, **kwargs):
Import
from data_juicer.ops.filter.video_aspect_ratio_filter import VideoAspectRatioFilter
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| min_ratio | str | No | Minimum aspect ratio as fraction string (default: "9/21") |
| max_ratio | str | No | Maximum aspect ratio as fraction string (default: "21/9") |
| any_or_all | str | No | Keep strategy: "any" or "all" (default: "any") |
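To illustrate how the any_or_all input combines per-video results into one decision for the sample, here is a small sketch. The function name `keep_sample` and the list of per-video booleans are hypothetical; how the operator treats samples that contain no videos is not stated here, so that case is deliberately left out.

```python
from typing import Sequence


def keep_sample(per_video_ok: Sequence[bool], any_or_all: str = "any") -> bool:
    """Combine per-video pass/fail flags into one keep decision (hypothetical helper)."""
    if any_or_all not in ("any", "all"):
        raise ValueError(f"any_or_all must be 'any' or 'all', got {any_or_all!r}")
    if any_or_all == "any":
        # Keep the sample if at least one video is within the ratio range.
        return any(per_video_ok)
    # Keep the sample only if every video is within the ratio range.
    return all(per_video_ok)


# A sample with two videos, of which only the first is within range.
flags = [True, False]
print(keep_sample(flags, "any"))
print(keep_sample(flags, "all"))
```

With "any", a single in-range video is enough to keep the sample; "all" is the stricter choice for datasets where every attached video must conform.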
Outputs
| Name | Type | Description |
|---|---|---|
| samples | Dict | Filtered samples with video_aspect_ratios stat computed |
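The two-phase contract (compute the stat once, then filter on it) can be sketched with plain dictionaries. This is an assumption-laden illustration rather than the operator's code: the top-level "stats" key, the function signatures, and the `sizes` argument (standing in for the width and height that the real operator reads from each video stream) are all hypothetical. Only the stat name video_aspect_ratios comes from the description above.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

STATS_KEY = "video_aspect_ratios"  # stat name taken from the description


def compute_stats(sample: Dict, sizes: List[Tuple[int, int]]) -> Dict:
    """Phase 1: store one width/height ratio per video, unless already cached."""
    stats = sample.setdefault("stats", {})  # "stats" is a hypothetical field name
    if STATS_KEY in stats:
        return sample  # cached result is reused, nothing is recomputed
    stats[STATS_KEY] = [width / height for width, height in sizes]
    return sample


def process(sample: Dict,
            min_ratio: Fraction = Fraction(9, 21),
            max_ratio: Fraction = Fraction(21, 9)) -> bool:
    """Phase 2: decide from the cached ratios alone (the "any" strategy is shown)."""
    ratios = sample["stats"][STATS_KEY]
    return any(min_ratio <= ratio <= max_ratio for ratio in ratios)


sample = compute_stats({"videos": ["a.mp4"]}, sizes=[(1920, 1080)])
print(sample["stats"][STATS_KEY])
print(process(sample))
```

Separating the phases means the comparatively expensive step of opening video files happens once, while the thresholds can be changed and the decision re-evaluated from the cached ratios.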
Usage Examples
YAML Configuration
process:
  - video_aspect_ratio_filter:
      min_ratio: "9/21"
      max_ratio: "21/9"
Python API
from data_juicer.ops.filter.video_aspect_ratio_filter import VideoAspectRatioFilter
op = VideoAspectRatioFilter(min_ratio="9/16", max_ratio="16/9")