Implementation:Open compass VLMEvalKit Get Score Gen Table

Field	Value
Source	VLMEvalKit
Domain	Vision, Evaluation, Data_Processing

Overview

A concrete tool, provided by VLMEvalKit, for aggregating evaluation scores across model-benchmark pairs into summary tables.

Description

get_score() and gen_table() in scripts/summarize.py handle result aggregation.

get_score(model, dataset) reads the result file for the given pair and extracts the overall metric. The file suffix depends on the benchmark type: _acc.csv for multiple-choice (MCQ) benchmarks, _score.csv for scoring-based benchmarks, and _score.json for benchmarks whose results are stored as JSON.

gen_table(models, datasets) iterates over all model/dataset combinations, builds a DataFrame with models as rows and benchmarks as columns, and prints a formatted table using tabulate.
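The file lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in scripts/summarize.py: the SUFFIX_BY_KIND mapping, the benchmark-kind labels, and the helpers resolve_result_file and read_overall are hypothetical names, and the sketch assumes each result file exposes an "Overall" column or key. The real script decides the suffix from the dataset name using its own rules.

```python
import csv
import json
from pathlib import Path

# Hypothetical mapping from benchmark kind to result-file suffix.
SUFFIX_BY_KIND = {
    "mcq": "_acc.csv",        # multiple-choice benchmarks
    "scoring": "_score.csv",  # scoring-based benchmarks
    "json": "_score.json",    # benchmarks with JSON results
}


def resolve_result_file(root, model, dataset, kind):
    """Build {root}/{model}/{model}_{dataset}{suffix} for the given kind."""
    suffix = SUFFIX_BY_KIND[kind]
    return Path(root) / model / f"{model}_{dataset}{suffix}"


def read_overall(path):
    """Return the 'Overall' value of a result file, or None if it is missing.

    Assumption: CSV files have an 'Overall' column and JSON files an
    'Overall' key. Real result files may be laid out differently.
    """
    path = Path(path)
    if not path.exists():
        return None
    if path.suffix == ".json":
        with path.open() as f:
            return float(json.load(f)["Overall"])
    with path.open(newline="") as f:
        first_row = next(csv.DictReader(f))  # first data row only
    return float(first_row["Overall"])
```

Keeping the suffix choice in a single mapping makes it easy to see which file a given benchmark type is expected to produce.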

Usage

Run scripts/summarize.py as a CLI script, or import get_score and gen_table from it programmatically.

Code Reference

Source: scripts/summarize.py, Lines: L4-71 (get_score), L80-114 (gen_table)
Signature:

def get_score(model: str, dataset: str) -> dict:
    """
    Reads result file for a model/dataset pair and returns metric dict.
    """

def gen_table(models: List[str], datasets: List[str]) -> pd.DataFrame:
    """
    Aggregates scores across all model/dataset pairs into a DataFrame.
    Saves to 'summ.csv' and prints formatted table.
    """

Import: (script) python scripts/summarize.py --model M1 M2 --data D1 D2

I/O Contract

Direction	Name	Type	Description
Input	model	str	Model name (e.g., "InternVL2-8B")
Input	dataset	str	Dataset name (e.g., "MMBench_DEV_EN_V11")
Input	(file system)	files	Reads {model}/{model}_{dataset} followed by _acc.csv, _score.csv, or _score.json
Output	get_score result	dict	Mapping from metric name to score
Output	gen_table result	DataFrame	Models as rows, benchmarks as columns
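
To illustrate the contract in the table above, the sketch below merges per-pair metric dicts into one row per model. It is not the gen_table implementation: build_rows is an illustrative helper, fake_get_score is a hypothetical stand-in for get_score, and the scores are made up. The stand-in returns an empty dict for a missing pair, which is an assumption of this sketch.

```python
def build_rows(models, datasets, get_score):
    """Return one dict per model, with a column for every metric found."""
    rows = []
    for model in models:
        row = {"Model": model}
        for dataset in datasets:
            # Merge the metric -> score mapping for this pair into the row.
            row.update(get_score(model, dataset))
        rows.append(row)
    return rows


# Hypothetical results; the numbers are made up for illustration only.
FAKE_RESULTS = {
    ("M1", "D1"): {"D1": 75.0},
    ("M1", "D2"): {"D2": 80.5},
    ("M2", "D1"): {"D1": 70.0},
}


def fake_get_score(model, dataset):
    """Stand-in for get_score: returns {} when no result is available."""
    return FAKE_RESULTS.get((model, dataset), {})


rows = build_rows(["M1", "M2"], ["D1", "D2"], fake_get_score)
for row in rows:
    print(row)
```

A list of row dicts in this shape can be passed directly to pandas.DataFrame; keys missing from a row become NaN in the resulting table.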

Usage Examples

# CLI usage
# python scripts/summarize.py --model InternVL2-8B GPT4o --data MMBench_DEV_EN_V11 AI2D_TEST

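# Programmatic usage
# Assumes the repository root is on sys.path so that scripts.summarize is importable.
from scripts.summarize import get_score, gen_table
scores = get_score("InternVL2-8B", "MMBench_DEV_EN_V11")
print(scores)  # e.g. {'MMBench_DEV_EN': 75.2} (illustrative; keys and values depend on the result file)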

table = gen_table(["InternVL2-8B", "GPT4o"], ["MMBench_DEV_EN_V11", "AI2D_TEST"])
print(table)
