

Implementation:Open compass VLMEvalKit Get Score Gen Table

From Leeroopedia
Field | Value
Source | VLMEvalKit
Domain | Vision, Evaluation, Data_Processing

Overview

A concrete tool, provided by VLMEvalKit, for aggregating evaluation scores across model-benchmark pairs into summary tables.

Description

get_score() and gen_table() in scripts/summarize.py handle result aggregation. get_score(model, dataset) reads the result file for one model/dataset pair and extracts the overall metric. The file format depends on the benchmark type: _acc.csv for multiple-choice (MCQ) benchmarks, _score.csv for scoring-based benchmarks, and _score.json for JSON-based benchmarks. gen_table(models, datasets) calls get_score for every model/dataset combination, builds a DataFrame with models as rows and benchmarks as columns, saves it to summ.csv, and prints it as a formatted table using tabulate.
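The file-resolution step described above can be sketched as follows. This is a hypothetical, stdlib-only illustration, not the code in scripts/summarize.py: the helper names find_result_file and result_kind, the root parameter, and the order in which the three suffixes are tried are all assumptions.

```python
from pathlib import Path
from typing import Optional

# Assumption: suffixes are tried in this order; the real script's order may differ.
RESULT_SUFFIXES = ("_acc.csv", "_score.csv", "_score.json")


def find_result_file(model: str, dataset: str, root: str = ".") -> Optional[Path]:
    """Return the first existing {model}/{model}_{dataset}<suffix> file, or None.

    Hypothetical helper for illustration; not part of scripts/summarize.py.
    """
    model_dir = Path(root) / model
    for suffix in RESULT_SUFFIXES:
        candidate = model_dir / f"{model}_{dataset}{suffix}"
        if candidate.is_file():
            return candidate
    return None


def result_kind(path: Path) -> str:
    """Classify a result file by its suffix: 'mcq', 'score', or 'json'."""
    name = path.name
    if name.endswith("_acc.csv"):
        return "mcq"    # multiple-choice accuracy table
    if name.endswith("_score.csv"):
        return "score"  # scoring-based benchmark
    if name.endswith("_score.json"):
        return "json"   # JSON-based benchmark
    raise ValueError(f"unrecognised result file: {name}")
```

For example, find_result_file("InternVL2-8B", "AI2D_TEST") checks InternVL2-8B/InternVL2-8B_AI2D_TEST_acc.csv first, then the _score.csv and _score.json variants, and returns None if none exist.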

Usage

Run it as a command-line script, or import the functions from Python.

Code Reference

  • Source: scripts/summarize.py, lines 4-71 (get_score) and 80-114 (gen_table)
  • Signature:
from typing import List

import pandas as pd

def get_score(model: str, dataset: str) -> dict:
    """
    Reads the result file for a model/dataset pair and returns a metric dict.
    """

def gen_table(models: List[str], datasets: List[str]) -> pd.DataFrame:
    """
    Aggregates scores across all model/dataset pairs into a DataFrame.
    Saves it to 'summ.csv' and prints a formatted table.
    """
  • Invocation (CLI): python scripts/summarize.py --model M1 M2 --data D1 D2
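The CLI line above passes several names after each flag. A minimal argparse sketch of that interface, assuming the script collects both --model and --data with nargs="+"; the function name parse_summary_args and the required=True choice are hypothetical, not taken from scripts/summarize.py:

```python
import argparse
from typing import List, Optional


def parse_summary_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse --model/--data name lists (hypothetical reconstruction of the CLI)."""
    parser = argparse.ArgumentParser(description="Summarize evaluation results (sketch)")
    # Assumption: each flag takes one or more space-separated names.
    parser.add_argument("--model", nargs="+", required=True, help="model names, e.g. InternVL2-8B")
    parser.add_argument("--data", nargs="+", required=True, help="dataset names, e.g. AI2D_TEST")
    return parser.parse_args(argv)
```

With nargs="+", argparse gathers every token up to the next option string, so --model M1 M2 --data D1 D2 yields model == ["M1", "M2"] and data == ["D1", "D2"].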

I/O Contract

Direction | Name | Type | Description
Input | model | str | Model name (e.g., "InternVL2-8B")
Input | dataset | str | Dataset name (e.g., "MMBench_DEV_EN_V11")
Input | (file system) | files | Reads {model}/{model}_{dataset}_acc.csv, _score.csv, or _score.json
Output | get_score result | dict | Mapping from metric name to score
Output | gen_table result | DataFrame | Models as rows, benchmarks as columns
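To make the output shape concrete, here is a stdlib-only sketch of the aggregation pattern gen_table follows: one row per model, one column per benchmark. The names build_grid and write_grid, the "Model" header cell, and the rule that a missing result file becomes an empty cell are assumptions for illustration. The real function builds a pandas DataFrame, and get_score returns a dict, so score_fn here stands in for a hypothetical wrapper that picks out the single overall number.

```python
import csv
from typing import Callable, List, Optional


def build_grid(
    models: List[str],
    datasets: List[str],
    score_fn: Callable[[str, str], float],
) -> List[List[object]]:
    """Return a header row plus one row per model, one column per dataset.

    Hypothetical sketch of the gen_table pattern, not the script's code.
    """
    rows: List[List[object]] = [["Model", *datasets]]  # assumption: header label "Model"
    for model in models:
        row: List[object] = [model]
        for dataset in datasets:
            cell: Optional[float]
            try:
                cell = score_fn(model, dataset)
            except FileNotFoundError:
                cell = None  # assumption: a missing result file becomes an empty cell
            row.append(cell)
        rows.append(row)
    return rows


def write_grid(rows: List[List[object]], path: str = "summ.csv") -> None:
    """Write the grid as CSV; csv.writer emits None cells as empty fields."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
```

Keeping the grid as plain rows makes the models-as-rows, benchmarks-as-columns layout explicit; in pandas the same structure is a DataFrame indexed by model name.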

Usage Examples

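# CLI usage (run from the VLMEvalKit repository root)
# python scripts/summarize.py --model InternVL2-8B GPT4o --data MMBench_DEV_EN_V11 AI2D_TEST

# Programmatic usage: assumes the repository root is on sys.path;
# result files are looked up under {model}/ relative to the working directory
from scripts.summarize import get_score, gen_table

scores = get_score("InternVL2-8B", "MMBench_DEV_EN_V11")
print(scores)  # dict mapping metric name(s) to score; exact keys depend on the benchmark

table = gen_table(["InternVL2-8B", "GPT4o"], ["MMBench_DEV_EN_V11", "AI2D_TEST"])  # also writes summ.csv
print(table)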
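Once summ.csv exists, it can be post-processed without re-reading the per-model result files. The hypothetical helper best_per_dataset below reports the top model for each benchmark. It assumes the first column of summ.csv holds model names and every remaining header cell is a dataset name; check the file's actual layout (for example, whether an index column was written) before relying on it.

```python
import csv
from typing import Dict, Tuple


def best_per_dataset(path: str = "summ.csv") -> Dict[str, Tuple[str, float]]:
    """Map each dataset column to (best_model, score).

    Assumption: first column = model names, other header cells = dataset names.
    Empty or non-numeric cells are skipped; on ties the first model seen wins.
    """
    best: Dict[str, Tuple[str, float]] = {}
    with open(path, newline="") as f:
        reader = csv.reader(f)
        datasets = next(reader)[1:]
        for row in reader:
            if not row:
                continue  # tolerate blank lines
            model, cells = row[0], row[1:]
            for dataset, cell in zip(datasets, cells):
                try:
                    score = float(cell)
                except ValueError:
                    continue  # e.g. a missing result left as an empty cell
                if dataset not in best or score > best[dataset][1]:
                    best[dataset] = (model, score)
    return best
```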
