
Implementation:Datajuicer Data juicer ImageAestheticsFilter

From Leeroopedia
Knowledge Sources
Domains Data_Quality, Filtering
Last Updated 2026-02-14 16:00 GMT

Overview

Concrete tool, provided by Data-Juicer, for filtering data samples based on image aesthetics scores.

Description

ImageAestheticsFilter is a filter operator that keeps samples whose image aesthetics scores fall within a specified range (min_score to max_score). It predicts the scores with a HuggingFace model (default: shunk031/aesthetics-predictor-v2-sac-logos-ava1-l14-linearMSE). When the model name contains 'shunk031/aesthetics-predictor', the raw scores are divided by 10 to normalize them. For samples with multiple images, the any_or_all strategy decides the outcome: 'any' keeps the sample if at least one image is in range, while 'all' keeps it only if every image is in range. The operator supports CUDA acceleration. It extends the Filter base class and implements the two-phase compute_stats/process pattern: the key metric image_aesthetics_scores is computed and cached in the stats field, and the keep-or-drop decision is then made from those cached scores.
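The normalization and keep rules described above can be sketched in plain Python. This is an illustrative sketch, not the operator's actual code: the function names normalize_score and keep_sample are hypothetical, the range check is assumed to be inclusive at both ends, and a sample with no images is assumed to be kept.

```python
from typing import List

def normalize_score(raw_score: float, model_name: str) -> float:
    """Divide by 10 only for the shunk031 aesthetics-predictor family."""
    if "shunk031/aesthetics-predictor" in model_name:
        return raw_score / 10.0
    return raw_score

def keep_sample(
    scores: List[float],
    min_score: float = 0.5,
    max_score: float = 1.0,
    any_or_all: str = "any",
) -> bool:
    """Decide whether a sample is kept, given its per-image normalized scores."""
    if any_or_all not in ("any", "all"):
        raise ValueError(f"any_or_all must be 'any' or 'all', got {any_or_all!r}")
    # Assumption: samples without images are not filtered out.
    if not scores:
        return True
    # Assumption: both bounds are inclusive.
    in_range = [min_score <= s <= max_score for s in scores]
    return any(in_range) if any_or_all == "any" else all(in_range)

# Two images in one sample: one passes the default range, one does not.
scores = [0.62, 0.31]
print(keep_sample(scores, any_or_all="any"))
print(keep_sample(scores, any_or_all="all"))
```

With 'any', one in-range image is enough to keep the sample; with 'all', a single out-of-range image drops it.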

Usage

Import this operator when you need to filter dataset samples based on the aesthetic quality of their images. Configure it in your Data-Juicer YAML config or instantiate it directly in Python.

Code Reference

Source Location

Signature

@OPERATORS.register_module("image_aesthetics_filter")
@LOADED_IMAGES.register_module("image_aesthetics_filter")
class ImageAestheticsFilter(Filter):
    def __init__(
        self,
        hf_scorer_model: str = "",
        trust_remote_code: bool = False,
        min_score: float = 0.5,
        max_score: float = 1.0,
        any_or_all: str = "any",
        *args,
        **kwargs,
    ):
        ...

Import

from data_juicer.ops.filter.image_aesthetics_filter import ImageAestheticsFilter

I/O Contract

Inputs

Name | Type | Required | Description
hf_scorer_model | str | No | HuggingFace model name for aesthetics prediction. Default: "" (empty string), which selects "shunk031/aesthetics-predictor-v2-sac-logos-ava1-l14-linearMSE"
trust_remote_code | bool | No | Whether to trust remote code of HF models. Default: False
min_score | float | No | Minimum aesthetics score to keep samples. Default: 0.5
max_score | float | No | Maximum aesthetics score to keep samples. Default: 1.0
any_or_all | str | No | Keep strategy across the images of a sample: 'any' or 'all'. Default: "any"

Outputs

Name | Type | Description
samples | Dict | Filtered samples with stats field updated (image_aesthetics_scores)
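To make the contract concrete, the following toy class mimics the two-phase compute_stats/process pattern with a stand-in scorer. It is a sketch under stated assumptions, not Data-Juicer code: ToyAestheticsFilter, the "stats" and "images" field names, and the scorer callable are all hypothetical, and the real operator runs a HuggingFace model instead.

```python
from typing import Callable, Dict, List

STATS_FIELD = "stats"                   # hypothetical field name for this sketch
SCORES_KEY = "image_aesthetics_scores"  # metric key named in the Outputs table

class ToyAestheticsFilter:
    """Toy stand-in that mirrors the two-phase filter pattern."""

    def __init__(self, scorer: Callable[[str], float], min_score: float = 0.5,
                 max_score: float = 1.0, any_or_all: str = "any"):
        self.scorer = scorer
        self.min_score = min_score
        self.max_score = max_score
        self.any_or_all = any_or_all

    def compute_stats(self, sample: Dict) -> Dict:
        """Phase 1: compute per-image scores once and cache them on the sample."""
        stats = sample.setdefault(STATS_FIELD, {})
        if SCORES_KEY in stats:  # already cached, skip the scorer
            return sample
        stats[SCORES_KEY] = [self.scorer(img) for img in sample.get("images", [])]
        return sample

    def process(self, sample: Dict) -> bool:
        """Phase 2: decide keep or drop from the cached scores only."""
        scores: List[float] = sample[STATS_FIELD][SCORES_KEY]
        if not scores:  # assumption: samples without images are kept
            return True
        flags = [self.min_score <= s <= self.max_score for s in scores]
        return any(flags) if self.any_or_all == "any" else all(flags)

# Stand-in scorer: looks up a fixed, already-normalized score per image path.
fake_scores = {"a.jpg": 0.8, "b.jpg": 0.2}
op = ToyAestheticsFilter(fake_scores.__getitem__, any_or_all="all")
sample = op.compute_stats({"images": ["a.jpg", "b.jpg"]})
print(sample[STATS_FIELD][SCORES_KEY], op.process(sample))
```

Separating the two phases means the scoring step runs once per sample, and the thresholds can be re-evaluated later against the cached scores without scoring the images again.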

Usage Examples

YAML Configuration

process:
  - image_aesthetics_filter:
      min_score: 0.5
      max_score: 1.0
      any_or_all: "any"

Python API

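from data_juicer.ops.filter.image_aesthetics_filter import ImageAestheticsFilter

# Keep samples whose normalized aesthetics score lies between 0.5 and 1.0
op = ImageAestheticsFilter(min_score=0.5, max_score=1.0)
# Apply to dataset; `dataset` is assumed to be an already-loaded Data-Juicer dataset
result = dataset.process(op)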
