Implementation:NVIDIA NeMo Curator AestheticScorer
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning, Computer Vision, Content Scoring |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
Provides an aesthetic quality scorer that predicts visual quality scores from CLIP embeddings using a pre-trained MLP model.
Description
The aesthetics module contains two classes:
MLP is a feedforward neural network with five linear layers (768 -> 1024 -> 128 -> 64 -> 16 -> 1) and a dropout layer after each of the first three (p = 0.2, 0.2, 0.1). It takes 768-dimensional CLIP embedding vectors as input and outputs a single aesthetic score per sample. The forward pass runs under torch.no_grad(), so no gradients are tracked during inference.
AestheticScorer implements ModelInterface and provides the public interface for aesthetic scoring. It loads pre-trained weights from HuggingFace (ttj/sac-logos-ava1-l14-linearMSE, revision 1e77fa0) in safetensors format. The model automatically selects CUDA if available, falling back to CPU. On __call__, it accepts embeddings as either a torch.Tensor or numpy.ndarray, converts numpy arrays to tensors, moves them to the appropriate device, and returns per-sample aesthetic scores.
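The input handling described for __call__ can be sketched as follows. This is a minimal illustration rather than the code in aesthetics.py: the helper name to_device_tensor and its device argument are hypothetical, and only the behavior stated above (NumPy-to-tensor conversion, CUDA with CPU fallback) is modeled.

```python
import numpy as np
import torch


def to_device_tensor(embeddings, device=None):
    """Hypothetical helper: return the input as a torch.Tensor on the target device."""
    if device is None:
        # Prefer CUDA when it is available, otherwise fall back to CPU.
        device = "cuda" if torch.cuda.is_available() else "cpu"
    if isinstance(embeddings, np.ndarray):
        # from_numpy keeps the array's dtype (float32 here).
        embeddings = torch.from_numpy(embeddings)
    return embeddings.to(device)


batch = np.zeros((4, 768), dtype=np.float32)
tensor = to_device_tensor(batch, device="cpu")
print(type(tensor).__name__, tuple(tensor.shape))
```

Passing device explicitly, as in the last lines, keeps the example reproducible on machines without a GPU.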
Usage
Use AestheticScorer when you need to evaluate the visual quality of images or video frames in a curation pipeline. It is typically chained with CLIP embedding extraction (e.g., via CLIPAestheticScorer in clip.py) to enable automated filtering of content by visual quality.
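Filtering by visual quality usually comes down to comparing the returned scores against a cutoff. The sketch below assumes a hypothetical helper keep_indices and an arbitrary threshold of 5.0; neither is part of NeMo Curator, and the scores are hard-coded stand-ins for the output of scorer(embeddings).

```python
import torch


def keep_indices(scores: torch.Tensor, threshold: float) -> list[int]:
    """Hypothetical helper: indices of samples whose score is at least the threshold."""
    mask = scores >= threshold
    return torch.nonzero(mask, as_tuple=False).flatten().tolist()


# Stand-in scores; in a real pipeline these would come from scorer(embeddings).
scores = torch.tensor([3.1, 5.6, 4.9, 6.2])
kept = keep_indices(scores, threshold=5.0)
print(kept)  # [1, 3]
```

A suitable threshold depends on the dataset and should be chosen by inspecting scored samples.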
Code Reference
Source Location
- Repository: NeMo-Curator
- File: nemo_curator/models/aesthetics.py
- Lines: 1-139
Signature
class MLP(nn.Module):
    def __init__(self) -> None: ...
    def forward(self, embed: torch.Tensor) -> torch.Tensor: ...

class AestheticScorer(ModelInterface):
    def __init__(self, model_dir: str) -> None: ...
    @property
    def model_id_names(self) -> list[str]: ...
    def setup(self) -> None: ...
    def get_weights_path(self) -> str: ...
    def __call__(self, embeddings: torch.Tensor | npt.NDArray[np.float32]) -> torch.Tensor: ...
    @classmethod
    def download_weights_on_node(cls, model_dir: str) -> None: ...
Import
from nemo_curator.models.aesthetics import AestheticScorer
I/O Contract
Inputs (Constructor)
| Name | Type | Required | Description |
|---|---|---|---|
| model_dir | str | Yes | Path to the directory where model weights are stored or will be downloaded |
Inputs (__call__)
| Name | Type | Required | Description |
|---|---|---|---|
| embeddings | torch.Tensor or numpy.ndarray | Yes | CLIP embeddings with shape (batch_size, 768) |
Outputs
| Name | Type | Description |
|---|---|---|
| scores | torch.Tensor | Per-sample aesthetic scores with shape (batch_size,) |
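
The input side of this contract can be checked before calling the scorer. The following is a sketch under assumptions: check_embeddings and EMBEDDING_DIM are hypothetical names introduced here for illustration, and the source does not state whether AestheticScorer performs such validation itself.

```python
import numpy as np
import torch

EMBEDDING_DIM = 768  # CLIP embedding width stated in the I/O contract


def check_embeddings(embeddings) -> None:
    """Hypothetical pre-flight check; raises ValueError on a contract violation."""
    if not isinstance(embeddings, (torch.Tensor, np.ndarray)):
        raise ValueError(f"unsupported type: {type(embeddings).__name__}")
    if embeddings.ndim != 2 or embeddings.shape[1] != EMBEDDING_DIM:
        raise ValueError(
            f"expected (batch_size, {EMBEDDING_DIM}), got {tuple(embeddings.shape)}"
        )


check_embeddings(np.zeros((3, 768), dtype=np.float32))  # passes silently
check_embeddings(torch.zeros(3, 768))                   # passes silently
```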
Model Architecture
| Layer | Configuration |
|---|---|
| Linear | 768 -> 1024 |
| Dropout | p=0.2 |
| Linear | 1024 -> 128 |
| Dropout | p=0.2 |
| Linear | 128 -> 64 |
| Dropout | p=0.1 |
| Linear | 64 -> 16 |
| Linear | 16 -> 1 |
Pre-trained model: ttj/sac-logos-ava1-l14-linearMSE (HuggingFace, safetensors format)
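The layer stack in the table above can be written as a small PyTorch module. This is a sketch reconstructed from the table, not a copy of aesthetics.py: the class name MLPSketch and the attribute name layers are assumptions, and the module is randomly initialized rather than loaded with the pre-trained weights.

```python
import torch
from torch import nn


class MLPSketch(nn.Module):
    """Illustrative re-creation of the documented layer stack (random weights)."""

    def __init__(self) -> None:
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(768, 1024),
            nn.Dropout(0.2),
            nn.Linear(1024, 128),
            nn.Dropout(0.2),
            nn.Linear(128, 64),
            nn.Dropout(0.1),
            nn.Linear(64, 16),
            nn.Linear(16, 1),
        )

    @torch.no_grad()  # inference only: no gradients are tracked
    def forward(self, embed: torch.Tensor) -> torch.Tensor:
        return self.layers(embed)


model = MLPSketch().eval()  # eval() disables dropout at inference time
out = model(torch.randn(4, 768))
print(out.shape)  # torch.Size([4, 1])
```

The sketch returns shape (batch_size, 1); the (batch_size,) shape listed under Outputs would correspond to dropping the trailing dimension, for example with squeeze(-1).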
Usage Examples
Basic Usage
from nemo_curator.models.aesthetics import AestheticScorer
import torch
# Download weights first
AestheticScorer.download_weights_on_node("/path/to/models")
# Initialize and setup
scorer = AestheticScorer(model_dir="/path/to/models")
scorer.setup()
# Score CLIP embeddings
embeddings = torch.randn(10, 768)  # random stand-in for a batch of 10 CLIP embeddings
scores = scorer(embeddings)
print(scores.shape) # torch.Size([10])
Usage with NumPy Arrays
import numpy as np
from nemo_curator.models.aesthetics import AestheticScorer
# Assumes the weights were already downloaded to model_dir (see Basic Usage)
scorer = AestheticScorer(model_dir="/path/to/models")
scorer.setup()
# Also accepts numpy arrays
embeddings_np = np.random.randn(5, 768).astype(np.float32)
scores = scorer(embeddings_np)
Related Pages
- Environment:NVIDIA_NeMo_Curator_Python_Linux_Base
- NVIDIA_NeMo_Curator_ModelInterface -- Base class that AestheticScorer implements
- NVIDIA_NeMo_Curator_NSFWScorer -- Similar scoring model for NSFW content detection