Implementation:NVIDIA NeMo Curator AestheticScorer

Knowledge Sources	NVIDIA NeMo Curator
Domains	Machine Learning, Computer Vision, Content Scoring
Last Updated	2026-02-14 00:00 GMT

Overview

Provides an aesthetic quality scorer that predicts visual quality scores from CLIP embeddings using a pre-trained MLP model.

Description

The aesthetics module contains two classes:

MLP is a feedforward neural network with five linear layers (768 -> 1024 -> 128 -> 64 -> 16 -> 1) and a dropout layer after each of the first three linear layers (p = 0.2, 0.2, and 0.1, respectively). It takes 768-dimensional CLIP embedding vectors as input and outputs a single aesthetic score per sample. The forward pass runs under torch.no_grad(), so no gradients are tracked during inference.
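The layer sizes and dropout rates above can be sketched in PyTorch as follows. This is a minimal illustration, not the library's source: the class name SketchMLP and the attribute name layers are hypothetical, and the use of nn.Sequential is an assumption about layout.

```python
import torch
from torch import nn


class SketchMLP(nn.Module):
    """Illustrative stand-in for the MLP described above (not the library class)."""

    def __init__(self) -> None:
        super().__init__()
        # Five linear layers; dropout follows each of the first three.
        self.layers = nn.Sequential(
            nn.Linear(768, 1024),
            nn.Dropout(0.2),
            nn.Linear(1024, 128),
            nn.Dropout(0.2),
            nn.Linear(128, 64),
            nn.Dropout(0.1),
            nn.Linear(64, 16),
            nn.Linear(16, 1),
        )

    @torch.no_grad()  # inference only: no autograd graph is built
    def forward(self, embed: torch.Tensor) -> torch.Tensor:
        return self.layers(embed)


model = SketchMLP().eval()  # eval() disables dropout at inference time
out = model(torch.randn(4, 768))
print(out.shape)  # torch.Size([4, 1])
```

The sketch returns one column per sample; reducing that to the (batch_size,) shape documented in the I/O contract would take a squeeze on the last dimension, which is assumed here to happen in the caller.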

AestheticScorer implements ModelInterface and provides the public interface for aesthetic scoring. It loads pre-trained weights from HuggingFace (ttj/sac-logos-ava1-l14-linearMSE, revision 1e77fa0) in safetensors format. The model automatically selects CUDA if available, falling back to CPU. On __call__, it accepts embeddings as either a torch.Tensor or numpy.ndarray, converts numpy arrays to tensors, moves them to the appropriate device, and returns per-sample aesthetic scores.
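The input handling described above (device selection, NumPy-to-tensor conversion, device transfer) might look roughly like the sketch below. The helper name to_device_tensor is hypothetical and is not part of NeMo Curator; only standard torch and numpy calls are used.

```python
import numpy as np
import torch


def to_device_tensor(embeddings, device=None):
    """Hypothetical helper mirroring the conversion steps described above."""
    if device is None:
        # Prefer CUDA when present, otherwise fall back to CPU.
        device = "cuda" if torch.cuda.is_available() else "cpu"
    if isinstance(embeddings, np.ndarray):
        # from_numpy shares memory with the array and keeps its dtype.
        embeddings = torch.from_numpy(embeddings)
    return embeddings.to(device)


batch = to_device_tensor(np.zeros((5, 768), dtype=np.float32))
print(batch.shape, batch.dtype)
```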

Usage

Use AestheticScorer when you need to evaluate the visual quality of images or video frames in a curation pipeline. It is typically chained with CLIP embedding extraction (e.g., via CLIPAestheticScorer in clip.py) to enable automated filtering of content by visual quality.
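As an illustration of the filtering step, the sketch below keeps only samples whose score meets a cutoff. The helper keep_indices and the threshold of 5.0 are assumptions chosen for the example; the source does not prescribe a threshold, and the scores shown are made-up values standing in for scorer output.

```python
import torch


def keep_indices(scores: torch.Tensor, min_score: float) -> list[int]:
    """Return positions of samples whose score is at least min_score."""
    mask = scores >= min_score
    return torch.nonzero(mask).flatten().tolist()


# Placeholder scores; in a pipeline these would come from scorer(embeddings).
scores = torch.tensor([5.8, 3.1, 6.4, 4.9])
print(keep_indices(scores, min_score=5.0))  # [0, 2]
```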

Code Reference

Source Location

Repository: NeMo-Curator
File: nemo_curator/models/aesthetics.py
Lines: 1-139

Signature

class MLP(nn.Module):
    def __init__(self) -> None: ...
    def forward(self, embed: torch.Tensor) -> torch.Tensor: ...

class AestheticScorer(ModelInterface):
    def __init__(self, model_dir: str) -> None: ...
    @property
    def model_id_names(self) -> list[str]: ...
    def setup(self) -> None: ...
    def get_weights_path(self) -> str: ...
    def __call__(self, embeddings: torch.Tensor | npt.NDArray[np.float32]) -> torch.Tensor: ...
    @classmethod
    def download_weights_on_node(cls, model_dir: str) -> None: ...

Import

from nemo_curator.models.aesthetics import AestheticScorer

I/O Contract

Inputs (Constructor)

Name	Type	Required	Description
model_dir	str	Yes	Path to the directory where model weights are stored or will be downloaded

Inputs (__call__)

Name	Type	Required	Description
embeddings	torch.Tensor or numpy.ndarray	Yes	CLIP embeddings with shape (batch_size, 768)

Outputs

Name	Type	Description
scores	torch.Tensor	Per-sample aesthetic scores with shape (batch_size,)

Model Architecture

Layer	Configuration
Linear	768 -> 1024
Dropout	p=0.2
Linear	1024 -> 128
Dropout	p=0.2
Linear	128 -> 64
Dropout	p=0.1
Linear	64 -> 16
Linear	16 -> 1

Pre-trained model: ttj/sac-logos-ava1-l14-linearMSE (HuggingFace, safetensors format)
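For a sense of scale, the parameter count implied by the table can be worked out directly. The arithmetic below assumes each Linear layer carries a bias term (the PyTorch default); dropout layers add no parameters.

```python
# (in_features, out_features) for each Linear layer in the table above
layer_shapes = [(768, 1024), (1024, 128), (128, 64), (64, 16), (16, 1)]

# Weights plus one bias per output unit (assumes default bias=True).
param_counts = [n_in * n_out + n_out for n_in, n_out in layer_shapes]
total = sum(param_counts)

print(param_counts)  # [787456, 131200, 8256, 1040, 17]
print(total)         # 927969
```

Under that assumption the model has just under one million parameters, with the first layer accounting for most of them.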

Usage Examples

Basic Usage

from nemo_curator.models.aesthetics import AestheticScorer
import torch

# Download weights first
AestheticScorer.download_weights_on_node("/path/to/models")

# Initialize and setup
scorer = AestheticScorer(model_dir="/path/to/models")
scorer.setup()

# Score CLIP embeddings
embeddings = torch.randn(10, 768)  # batch of 10 CLIP embeddings
scores = scorer(embeddings)
print(scores.shape)  # torch.Size([10])

Usage with NumPy Arrays

import numpy as np
from nemo_curator.models.aesthetics import AestheticScorer

# Assumes the weights were already downloaded to this directory
scorer = AestheticScorer(model_dir="/path/to/models")
scorer.setup()

# NumPy arrays are converted to tensors internally
embeddings_np = np.random.randn(5, 768).astype(np.float32)
scores = scorer(embeddings_np)
print(scores.shape)  # torch.Size([5])

Related Pages

Environment:NVIDIA_NeMo_Curator_Python_Linux_Base
NVIDIA_NeMo_Curator_ModelInterface -- Base class that AestheticScorer implements
NVIDIA_NeMo_Curator_NSFWScorer -- Similar scoring model for NSFW content detection
