Principle: OpenCompass VLMEvalKit Results Summarization
| Field | Value |
|---|---|
| Source | Repo |
| Domain | Vision, Evaluation, Data_Processing |
Overview
An aggregation pattern that collects evaluation scores across multiple model-benchmark pairs into a unified comparison table.
Description
After evaluating multiple model × dataset combinations, VLMEvalKit provides utilities to aggregate the results into summary tables. The `get_score()` function reads per-benchmark result files (`_acc.csv`, `_score.csv`, `_score.json`) and extracts the relevant metric for each benchmark. The `gen_table()` function iterates over all model × dataset pairs, collects the scores, and produces a formatted DataFrame for cross-model comparison. This enables systematic benchmarking of VLMs across dozens of benchmarks.
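The aggregation pattern can be sketched as follows. This is a minimal, hypothetical illustration, not VLMEvalKit's actual implementation: the `Overall` column name and the in-memory `results` dict are assumptions for the example; the real toolkit reads result files from disk.

```python
# Hypothetical sketch of the score-aggregation pattern (not the real VLMEvalKit code).
import io

import pandas as pd


def get_score(result_csv: str) -> float:
    """Extract the primary metric from a per-benchmark *_acc.csv.

    Assumes the file has an 'Overall' column holding the headline number.
    """
    df = pd.read_csv(io.StringIO(result_csv))
    return float(df["Overall"].iloc[0])


def gen_table(results: dict) -> pd.DataFrame:
    """Build a models x benchmarks score matrix.

    `results` maps (model, dataset) -> raw CSV text of that result file.
    Missing pairs stay NaN, so partially evaluated runs still tabulate.
    """
    models = sorted({m for m, _ in results})
    datasets = sorted({d for _, d in results})
    table = pd.DataFrame(index=models, columns=datasets, dtype=float)
    for (model, dataset), csv_text in results.items():
        table.loc[model, dataset] = get_score(csv_text)
    return table


# Toy result files for two models on two benchmarks (illustrative values only).
results = {
    ("model-a", "MMBench"): "Overall\n64.3\n",
    ("model-a", "MME"): "Overall\n1510.7\n",
    ("model-b", "MMBench"): "Overall\n60.6\n",
    ("model-b", "MME"): "Overall\n1487.5\n",
}
print(gen_table(results))
```

The key design point is that rows and columns are derived from the keys actually present, so the same code handles any subset of models and benchmarks.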
Usage
Use after completing all evaluations. Run `python scripts/summarize.py --model model1 model2 --data dataset1 dataset2`, or call `get_score()` / `gen_table()` programmatically.
Theoretical Basis
Result aggregation: collect per-benchmark metrics into a cross-model comparison matrix. Each benchmark reports its own metric type (accuracy, score, F1, etc.), and the summarizer handles these format differences transparently. The output matrix has:
- Rows: Models under evaluation
- Columns: Benchmark datasets
- Cells: The primary metric for each model-benchmark pair
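Handling per-benchmark format differences typically comes down to dispatching on the result file's suffix. The sketch below is an assumed illustration of that idea; the column position and the `"score"` JSON key are placeholders, not VLMEvalKit's actual schema.

```python
# Hypothetical dispatch on result-file suffix (illustrative, not the real API).
import io
import json

import pandas as pd


def extract_metric(filename: str, content: str) -> float:
    """Pick the primary metric based on the result file's format."""
    if filename.endswith(("_acc.csv", "_score.csv")):
        df = pd.read_csv(io.StringIO(content))
        # Assumption: the last column holds the headline number.
        return float(df.iloc[0, -1])
    if filename.endswith("_score.json"):
        # Assumption: the JSON carries the metric under a "score" key.
        return float(json.loads(content)["score"])
    raise ValueError(f"unrecognized result file: {filename}")
```

Centralizing the dispatch in one function keeps the table-building loop agnostic to which metric each benchmark uses.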