Implementation:Open compass VLMEvalKit SArena Metrics
| Field | Value |
|---|---|
| source | VLMEvalKit |
| domain | Vision, Evaluation, Image Generation, Metrics Orchestration |
Overview
Orchestrates multiple image generation evaluation metrics for the SArena (InternSVG) benchmark through a configurable metrics registry.
Description
The `InternSVGMetrics` class uses a `MetricsConfig` dataclass to select which evaluation metrics to instantiate and manage: FID, FID-C, CLIP Score (T2I/I2I), DINO Score, LPIPS, SSIM, PSNR, and Token Length. The `calculate_metrics` method iterates over the active metrics, computes a score for each, and aggregates the results. The registry maps each metric to a lambda builder, so a metric is constructed only when it is enabled, which keeps memory use down.
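The registry pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in `metrics.py`: `ToyMetricsConfig`, `ToyMetric`, `ToyMetricsRunner`, the `calculate_score` method, and the exact flag names are all hypothetical stand-ins chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class ToyMetricsConfig:
    # Hypothetical flags; the real MetricsConfig may define different fields.
    use_SSIM: bool = False
    use_PSNR: bool = False
    use_token_length: bool = False


class ToyMetric:
    """Stand-in for a real metric. Records its name when constructed,
    so it is easy to see which metrics were actually built."""

    built: List[str] = []

    def __init__(self, name: str) -> None:
        self.name = name
        ToyMetric.built.append(name)

    def calculate_score(self, batch: dict) -> float:
        # Placeholder score: the number of predictions in the batch.
        return float(len(batch.get("pred_im", [])))


class ToyMetricsRunner:
    def __init__(self, config: ToyMetricsConfig) -> None:
        # Registry: config flag -> (metric name, zero-argument builder).
        # The lambdas defer construction until the flag is known to be set.
        registry: Dict[str, Tuple[str, Callable[[], ToyMetric]]] = {
            "use_SSIM": ("SSIM", lambda: ToyMetric("SSIM")),
            "use_PSNR": ("PSNR", lambda: ToyMetric("PSNR")),
            "use_token_length": ("TokenLength", lambda: ToyMetric("TokenLength")),
        }
        # Only enabled metrics are ever instantiated.
        self.active: Dict[str, ToyMetric] = {
            name: build()
            for flag, (name, build) in registry.items()
            if getattr(config, flag)
        }

    def calculate_metrics(self, batch: dict) -> Dict[str, float]:
        # Iterate over active metrics and collect one score per metric.
        return {name: m.calculate_score(batch) for name, m in self.active.items()}
```

With `ToyMetricsConfig(use_SSIM=True)`, only the `SSIM` builder runs; the other lambdas are never called, which is the property that matters when a real metric loads a large model at construction time.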
Usage
Called internally by the corresponding dataset class during evaluation.
Code Reference
- Source: `vlmeval/dataset/utils/SArena/metrics.py`, Lines: L1-82
- Import: `from vlmeval.dataset.utils.SArena.metrics import InternSVGMetrics, MetricsConfig`
Key Functions:
@dataclass
class MetricsConfig: ...

class InternSVGMetrics:
    def calculate_metrics(self, batch): ...
    def reset(self): ...
I/O Contract
| Direction | Description |
|---|---|
| Inputs | A batch dict containing image data ('pred_im', 'gt_im', 'caption' as applicable) and a MetricsConfig specifying which metrics to use |
| Outputs | Dictionary mapping metric names to their computed average scores |
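The source does not show how per-batch scores become the averages in the output dictionary, or what `reset` clears. One plausible shape, offered only as a sketch under that assumption, is a running sum and count per metric name. `RunningAverages` and its `update` and `averages` methods are hypothetical names, not part of VLMEvalKit.

```python
from collections import defaultdict
from typing import Dict


class RunningAverages:
    """Hypothetical accumulator: keeps a weighted running mean per metric."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        # Drop all accumulated state so a new evaluation run starts clean.
        self._sums: Dict[str, float] = defaultdict(float)
        self._counts: Dict[str, int] = defaultdict(int)

    def update(self, scores: Dict[str, float], n: int = 1) -> None:
        # `scores` holds one mean score per metric for a batch of `n` samples.
        for name, value in scores.items():
            self._sums[name] += value * n
            self._counts[name] += n

    def averages(self) -> Dict[str, float]:
        # Dictionary mapping metric names to their sample-weighted averages.
        return {name: self._sums[name] / self._counts[name] for name in self._sums}
```

Weighting each batch by its sample count keeps the final average correct when the last batch is smaller than the others.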
Usage Examples
from vlmeval.dataset.utils.SArena.metrics import InternSVGMetrics, MetricsConfig

# Enable only the metrics needed for this run.
config = MetricsConfig(use_FID=True, use_CLIP_Score_T2I=True)
metrics = InternSVGMetrics(config, tokenizer_path="path/to/tokenizer")

# `batch` is a dict with 'pred_im', 'gt_im', and 'caption' as applicable.
results = metrics.calculate_metrics(batch)