Implementation:Datajuicer Data juicer ImageAspectRatioFilter
| Knowledge Sources | |
|---|---|
| Domains | Data_Quality, Filtering |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
Concrete tool, provided by Data-Juicer, for filtering data samples by image aspect ratio.
Description
ImageAspectRatioFilter is a filter operator that keeps samples whose image aspect ratios fall within a specified range. The aspect ratio is computed as width divided by height (W/H), and the computed values are cached under the aspect_ratios stats key. The any_or_all parameter selects the keep strategy for samples with multiple images: 'any' keeps a sample if at least one image meets the criteria, and 'all' keeps it only if every image does. The operator extends the Filter base class and implements the two-phase compute_stats/process pattern.
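The keep decision described above can be illustrated with a minimal standalone sketch. This is not the Data-Juicer source: the helper names `aspect_ratio` and `keep_sample` are hypothetical, and both the inclusive range bounds and the choice to keep samples that contain no images are assumptions made for illustration.

```python
from typing import List, Tuple


def aspect_ratio(width: int, height: int) -> float:
    """Aspect ratio as width divided by height (W/H)."""
    return width / height


def keep_sample(
    sizes: List[Tuple[int, int]],
    min_ratio: float = 0.333,
    max_ratio: float = 3.0,
    any_or_all: str = "any",
) -> bool:
    """Decide whether a sample is kept, given its (width, height) pairs."""
    if any_or_all not in ("any", "all"):
        raise ValueError("any_or_all must be 'any' or 'all'")

    # One flag per image: is its ratio inside [min_ratio, max_ratio]?
    # Inclusive bounds are an assumption of this sketch.
    in_range = [min_ratio <= aspect_ratio(w, h) <= max_ratio for w, h in sizes]

    # Assumption: a sample without images is kept rather than dropped.
    if not in_range:
        return True

    return any(in_range) if any_or_all == "any" else all(in_range)
```

For a sample holding a 1920x1080 image (ratio about 1.78) and a 4000x1000 image (ratio 4.0), the 'any' strategy keeps the sample while the 'all' strategy drops it.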
Usage
Import this operator when you need to filter dataset samples based on the aspect ratio of images. Configure it in your Data-Juicer YAML config or instantiate directly.
Code Reference
Source Location
- Repository: Datajuicer_Data_juicer
- File: data_juicer/ops/filter/image_aspect_ratio_filter.py
- Lines: 1-87
Signature
@OPERATORS.register_module("image_aspect_ratio_filter")
@LOADED_IMAGES.register_module("image_aspect_ratio_filter")
class ImageAspectRatioFilter(Filter):
    def __init__(self, min_ratio: float = 0.333, max_ratio: float = 3.0, any_or_all: str = "any", *args, **kwargs):
        ...
Import
from data_juicer.ops.filter.image_aspect_ratio_filter import ImageAspectRatioFilter
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| min_ratio | float | No | The minimum aspect ratio (W/H) to keep samples. Default: 0.333 |
| max_ratio | float | No | The maximum aspect ratio (W/H) to keep samples. Default: 3.0 |
| any_or_all | str | No | Keep strategy: 'any' or 'all' across images. Default: "any" |
Outputs
| Name | Type | Description |
|---|---|---|
| samples | Dict | Filtered samples with stats field updated (aspect_ratios) |
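To make the output contract concrete, the following sketch shows how a stats-computation step might write aspect ratios into a sample and reuse them on a later call. It works on plain dictionaries only. The field name `stats`, the helper `compute_stats`, and the `sizes` argument are hypothetical stand-ins for this illustration; the actual field constants and image loading in Data-Juicer may differ.

```python
from typing import Any, Dict, List, Tuple

STATS_FIELD = "stats"        # hypothetical field name used only in this sketch
RATIO_KEY = "aspect_ratios"  # stats key named in the Outputs table


def compute_stats(
    sample: Dict[str, Any], sizes: List[Tuple[int, int]]
) -> Dict[str, Any]:
    """Store W/H ratios under the stats field, skipping work if already cached."""
    stats = sample.setdefault(STATS_FIELD, {})

    # Reuse previously computed ratios instead of recomputing them.
    if RATIO_KEY in stats:
        return sample

    stats[RATIO_KEY] = [w / h for w, h in sizes]
    return sample


sample = {"text": "two images"}
sample = compute_stats(sample, [(200, 100), (100, 200)])
# sample["stats"]["aspect_ratios"] now holds one ratio per image
```

A later filtering step can then read the cached list and apply the min_ratio/max_ratio check without touching the images again.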
Usage Examples
YAML Configuration
process:
  - image_aspect_ratio_filter:
      min_ratio: 0.333
      max_ratio: 3.0
      any_or_all: "any"
Python API
from data_juicer.ops.filter.image_aspect_ratio_filter import ImageAspectRatioFilter
op = ImageAspectRatioFilter(min_ratio=0.333, max_ratio=3.0)
# Apply to a dataset (assumes `dataset` is an already loaded Data-Juicer dataset)
result = dataset.process(op)