Implementation:Open compass VLMEvalKit Get Score Gen Table
| Field | Value |
|---|---|
| Source | VLMEvalKit |
| Domain | Vision, Evaluation, Data_Processing |
Overview
A concrete tool, provided by VLMEvalKit, for aggregating evaluation scores across model-benchmark pairs into summary tables.
Description
get_score() and gen_table() in scripts/summarize.py handle result aggregation. get_score(model, dataset) reads the result file for that pair and extracts the overall metric. The file suffix depends on the benchmark type: _acc.csv for multiple-choice (MCQ) benchmarks, _score.csv for scoring-based benchmarks, and _score.json for benchmarks whose results are stored as JSON. gen_table(models, datasets) iterates over every model/dataset combination, builds a DataFrame with models as rows and benchmarks as columns, saves it to summ.csv, and prints a formatted table using tabulate.
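The suffix selection described above can be sketched as follows. This is a minimal illustration, not the code in scripts/summarize.py: the function name `result_path`, the `kind` labels ("mcq", "score", "json"), and the `root` parameter are hypothetical names introduced here. Only the three suffixes and the `{model}/{model}_{dataset}` layout come from this page.

```python
from pathlib import Path

# Suffixes named in the description; the benchmark-kind labels used as keys
# are hypothetical names chosen for this sketch.
SUFFIX_BY_KIND = {
    "mcq": "_acc.csv",
    "score": "_score.csv",
    "json": "_score.json",
}


def result_path(model: str, dataset: str, kind: str, root: str = ".") -> Path:
    """Build the expected path {model}/{model}_{dataset}{suffix} under root."""
    try:
        suffix = SUFFIX_BY_KIND[kind]
    except KeyError:
        raise ValueError(f"unknown benchmark kind: {kind!r}") from None
    return Path(root) / model / f"{model}_{dataset}{suffix}"


path = result_path("InternVL2-8B", "MMBench_DEV_EN_V11", "mcq")
print(path.as_posix())  # InternVL2-8B/InternVL2-8B_MMBench_DEV_EN_V11_acc.csv
```

Raising on an unknown kind, rather than silently guessing a suffix, is a design choice of this sketch; how the actual script decides which benchmark uses which suffix is not specified here.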
Usage
Run scripts/summarize.py as a CLI script, or import get_score and gen_table programmatically.
Code Reference
- Source: scripts/summarize.py, Lines: L4-71 (get_score), L80-114 (gen_table)
- Signature:
def get_score(model: str, dataset: str) -> dict:
    """
    Reads result file for a model/dataset pair and returns metric dict.
    """

def gen_table(models: List[str], datasets: List[str]) -> pd.DataFrame:
    """
    Aggregates scores across all model/dataset pairs into a DataFrame.
    Saves to 'summ.csv' and prints formatted table.
    """
- Import: (script)
python scripts/summarize.py --model M1 M2 --data D1 D2
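For readers unfamiliar with multi-value flags, the command line above can be parsed with argparse roughly as follows. This is a sketch under assumptions: the function name `parse_args`, the help strings, and the choice to make both flags required are illustrative, and the script's actual parser may differ. Only the `--model` and `--data` flag names come from this page.

```python
import argparse


def parse_args(argv=None):
    # Hypothetical parser mirroring the --model / --data flags shown above.
    parser = argparse.ArgumentParser(description="Summarize evaluation results")
    # nargs="+" collects one or more space-separated values into a list.
    parser.add_argument("--model", nargs="+", required=True, help="one or more model names")
    parser.add_argument("--data", nargs="+", required=True, help="one or more dataset names")
    return parser.parse_args(argv)


args = parse_args(["--model", "M1", "M2", "--data", "D1", "D2"])
print(args.model, args.data)  # ['M1', 'M2'] ['D1', 'D2']
```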
I/O Contract
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | model | str | Model name (e.g., "InternVL2-8B") |
| Input | dataset | str | Dataset name (e.g., "MMBench_DEV_EN_V11") |
| Input | (file system) | files | Reads from {model}/{model}_{dataset}_acc.csv, _score.csv, or _score.json |
| Output | get_score result | dict | metric_name to score mapping |
| Output | gen_table result | DataFrame | Models as rows, benchmarks as columns |
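The contract above (per-pair metric dicts in, one row per model and one column per benchmark out) can be illustrated with a standard-library-only sketch. It is not the gen_table implementation, which this page says builds a pandas DataFrame. The helper names `build_rows` and `to_csv`, the "Model" column label, the use of None for missing results, and the sample scores are all assumptions made for illustration.

```python
import csv
import io


def build_rows(scores_by_model, models):
    """Pivot {model: {metric: score}} into one row per model, one column per metric."""
    # Collect every metric seen for any model; sort for a stable column order.
    metrics = sorted({m for name in models for m in scores_by_model.get(name, {})})
    rows = []
    for name in models:
        scores = scores_by_model.get(name, {})
        row = {"Model": name}
        for metric in metrics:
            row[metric] = scores.get(metric)  # None marks a missing result
        rows.append(row)
    return rows, ["Model"] + metrics


def to_csv(rows, columns):
    """Serialize the pivoted rows as CSV text; None is written as an empty cell."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up scores, purely for illustration.
scores_by_model = {
    "model_a": {"bench_x": 70.0, "bench_y": 55.5},
    "model_b": {"bench_x": 64.0},
}
rows, columns = build_rows(scores_by_model, ["model_a", "model_b"])
print(to_csv(rows, columns))
```

With pandas available, `pd.DataFrame(rows)` would turn the same row dicts into a DataFrame with the layout described in the table.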
Usage Examples
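# CLI usage (relative result paths such as {model}/... resolve against the current working directory)
# python scripts/summarize.py --model InternVL2-8B GPT4o --data MMBench_DEV_EN_V11 AI2D_TEST

# Programmatic usage (assumes scripts/ is importable, e.g. when run from the repository root)
from scripts.summarize import get_score, gen_table

# Read the metric dict for a single model/dataset pair
scores = get_score("InternVL2-8B", "MMBench_DEV_EN_V11")
print(scores)  # e.g. {'MMBench_DEV_EN': 75.2}; actual values depend on your result files

# Aggregate every model/dataset pair into one table (also saved to 'summ.csv')
table = gen_table(["InternVL2-8B", "GPT4o"], ["MMBench_DEV_EN_V11", "AI2D_TEST"])
print(table)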