
Implementation:Datajuicer Data juicer ImageAspectRatioFilter

From Leeroopedia
Revision as of 12:21, 16 February 2026 by Admin (Auto-imported from implementations/Datajuicer_Data_juicer_ImageAspectRatioFilter.md)
Knowledge Sources
Domains: Data_Quality, Filtering
Last Updated: 2026-02-14 16:00 GMT

Overview

A concrete Data-Juicer tool for filtering data samples by the aspect ratio of their images.

Description

ImageAspectRatioFilter is a filter operator that keeps samples whose images have an aspect ratio between min_ratio and max_ratio. The aspect ratio is computed as width divided by height (W/H), so values above 1 are landscape and values below 1 are portrait. The computed ratios are cached in the sample's stats under the aspect_ratios key. For samples with several images, any_or_all selects the strategy: 'any' keeps the sample if at least one image meets the criterion, and 'all' keeps it only if every image does. The operator extends the Filter base class and follows its two-phase pattern: compute_stats computes and caches the ratios, and process decides whether to keep the sample.
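The two phases can be sketched in plain Python. This is a minimal illustration, not Data-Juicer's source: images are represented as hypothetical (width, height) tuples instead of loaded images, the helper names compute_stats and keep_sample and the "stats" dict layout are illustrative, and the sketch assumes inclusive bounds and that samples without images are kept.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a loaded image: (width, height) in pixels.
ImageSize = Tuple[int, int]


def compute_stats(sample: Dict) -> Dict:
    """Phase 1: compute W/H for every image and cache it under 'aspect_ratios'."""
    stats = sample.setdefault("stats", {})
    if "aspect_ratios" not in stats:  # reuse cached values if already computed
        images: List[ImageSize] = sample["images"]
        stats["aspect_ratios"] = [w / h for (w, h) in images]
    return sample


def keep_sample(sample: Dict, min_ratio: float = 0.333, max_ratio: float = 3.0,
                any_or_all: str = "any") -> bool:
    """Phase 2: decide whether to keep the sample using only the cached ratios."""
    if any_or_all not in ("any", "all"):
        # This sketch rejects any other strategy name.
        raise ValueError(f"any_or_all must be 'any' or 'all', got {any_or_all!r}")
    ratios: List[float] = sample["stats"]["aspect_ratios"]
    if not ratios:  # assumption: samples without images are kept
        return True
    # Assumption: both bounds are inclusive.
    in_range = [min_ratio <= r <= max_ratio for r in ratios]
    return any(in_range) if any_or_all == "any" else all(in_range)


# One wide banner (W/H = 4.0) and one 16:9 photo (W/H of about 1.78).
sample = compute_stats({"images": [(4000, 1000), (1920, 1080)]})
print(sample["stats"]["aspect_ratios"])
print(keep_sample(sample, any_or_all="any"))
print(keep_sample(sample, any_or_all="all"))
```

The same cached ratios give opposite decisions under the two strategies: the banner falls outside the default range while the 16:9 photo falls inside it, so 'any' keeps the sample and 'all' drops it. Separating the phases also means the ratios are computed once and can be reused if the thresholds change.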

Usage

Use this operator when you need to filter dataset samples by the aspect ratio of their images. Configure it in a Data-Juicer YAML config, or instantiate it directly in Python.

Code Reference

Source Location

Signature

@OPERATORS.register_module("image_aspect_ratio_filter")
@LOADED_IMAGES.register_module("image_aspect_ratio_filter")
class ImageAspectRatioFilter(Filter):
    def __init__(self, min_ratio: float = 0.333, max_ratio: float = 3.0, any_or_all: str = "any", *args, **kwargs):
        ...

Import

from data_juicer.ops.filter.image_aspect_ratio_filter import ImageAspectRatioFilter

I/O Contract

Inputs

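| Name | Type | Required | Description |
|---|---|---|---|
| min_ratio | float | No | Minimum aspect ratio (W/H) an image must have to count as in range. Default: 0.333 |
| max_ratio | float | No | Maximum aspect ratio (W/H) an image may have to count as in range. Default: 3.0 |
| any_or_all | str | No | Keep strategy across a sample's images: "any" keeps the sample if at least one image is in range; "all" keeps it only if every image is. Default: "any" |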
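To see what the default range admits, the following worked check computes W/H for a few image sizes (chosen purely for illustration) and tests them against the defaults, assuming inclusive bounds. Because 0.333 is slightly below 1/3, an exact 1:3 portrait still passes, so under that assumption the defaults accept everything from a 1:3 portrait to a 3:1 landscape.

```python
# Default bounds from the signature above.
MIN_RATIO, MAX_RATIO = 0.333, 3.0

# Illustrative (width, height) sizes; not taken from any dataset.
sizes = {
    "16:9 landscape": (1920, 1080),
    "3:1 panorama": (3000, 1000),
    "1:3 portrait": (1000, 3000),
    "4:1 banner": (4000, 1000),
    "1:4 strip": (500, 2000),
}

decisions = {}
for name, (w, h) in sizes.items():
    ratio = w / h
    # Assumption: both bounds are inclusive.
    decisions[name] = MIN_RATIO <= ratio <= MAX_RATIO
    print(f"{name:<15} W/H={ratio:.4f} keep={decisions[name]}")
```

The 1:3 portrait has W/H of about 0.3333, which clears the 0.333 lower bound, while the 4:1 banner (4.0) and the 1:4 strip (0.25) fall outside the range and would be filtered out.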

Outputs

| Name | Type | Description |
|---|---|---|
| samples | Dict | Samples that pass the filter, with the aspect_ratios entry of their stats field populated (one W/H value per image) |

Usage Examples

YAML Configuration

process:
  - image_aspect_ratio_filter:
      min_ratio: 0.333
      max_ratio: 3.0
      any_or_all: "any"

Python API

from data_juicer.ops.filter.image_aspect_ratio_filter import ImageAspectRatioFilter

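# Keep samples whose images have W/H between 0.333 and 3.0;
# any_or_all="any" keeps a sample if at least one of its images passes.
op = ImageAspectRatioFilter(min_ratio=0.333, max_ratio=3.0, any_or_all="any")

# Apply the filter; `dataset` is assumed to be a Data-Juicer NestedDataset loaded beforehand.
result = dataset.process(op)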

Related Pages
