
Implementation: OpenCompass VLMEvalKit ImageMCQDataset Evaluate

From Leeroopedia
Field Value
Source VLMEvalKit
Domain Vision, Evaluation, NLP

Overview

A concrete tool, provided by VLMEvalKit, for evaluating VLM predictions on multiple-choice benchmarks using heuristic and LLM-based answer extraction.

Description

ImageMCQDataset.evaluate() in vlmeval/dataset/image_mcq.py dispatches to either evaluate_heuristic() (the default) or evaluate_verifier() (when use_verifier=True). The heuristic path uses extract_answer_from_item() from vlmeval/dataset/utils/multiple_choice.py, which applies regex-based answer extraction with an LLM fallback via build_judge(). Results are computed as accuracy per split/category and saved to an _acc.csv file.
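The regex-first strategy can be sketched in a few lines. This is a minimal stdlib illustration, not VLMEvalKit's actual extract_answer_from_item(), which handles many more answer formats; heuristic_extract and its patterns are hypothetical:

```python
import re

def heuristic_extract(prediction: str, choices: dict) -> "str | None":
    """Return the option letter found in a free-form prediction, or None.

    Simplified sketch of a regex-first extraction: look for a standalone
    option letter first; if that fails, match the literal choice text.
    """
    # A standalone capital letter such as "B", "(B)", or "B."
    m = re.search(r"\b([A-D])\b", prediction)
    if m and m.group(1) in choices:
        return m.group(1)
    # Fall back to matching the full text of a choice.
    for letter, text in choices.items():
        if text.lower() in prediction.lower():
            return letter
    return None  # caller would now fall back to the LLM judge
```

When both passes return nothing, the real pipeline hands the item to the judge model built by build_judge().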

Usage

Called after inference completes. Requires the prediction file produced by the inference step. A judge LLM API key is optionally required for the LLM-based extraction fallback.
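One way to handle the optional judge key is to assemble judge_kwargs conditionally. A hedged sketch; build_judge_kwargs is a hypothetical helper, and the environment-variable name is an assumption for illustration:

```python
import os

def build_judge_kwargs(model: str = "chatgpt-0125", nproc: int = 4) -> dict:
    """Assemble keyword arguments for evaluate().

    Drops the judge model when no API key is configured, so evaluation
    relies on the heuristic path alone. (Env-var name is an assumption.)
    """
    kwargs = {"nproc": nproc}
    if os.environ.get("OPENAI_API_KEY"):
        kwargs["model"] = model
    return kwargs
```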

Code Reference

  • Source: vlmeval/dataset/image_mcq.py, Lines: L236-240 (evaluate entry), L42-465 (full class)
  • Also: vlmeval/dataset/utils/multiple_choice.py, Lines: L350-499 (answer extraction)
  • Signature:
def evaluate(self, eval_file: str, **judge_kwargs) -> Union[pd.DataFrame, dict]:
    """
    Args:
        eval_file: Path to predictions file (xlsx/csv/tsv).
        **judge_kwargs: Keyword arguments including:
            - model (str): Judge model name (e.g., "chatgpt-0125")
            - nproc (int): Parallel judge calls
            - use_verifier (bool): Use verifier mode instead of heuristic
    Returns:
        DataFrame with accuracy by split/category, or dict with scores.
    """
  • Import: (method on ImageMCQDataset class) from vlmeval.dataset import ImageMCQDataset

I/O Contract

Direction    Name          Type       Description
Input        eval_file     str        Path to prediction file with columns: index, prediction, answer, A, B, C, D
Input        judge_kwargs  dict       LLM judge config (model name, nproc, use_verifier)
Output       results       DataFrame  Accuracy per split/category
Side effect  _acc.csv      file       Accuracy results saved to disk
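The per-split accuracy in the output can be reproduced with a simple groupby. A sketch mirroring the shape of the _acc.csv output, assuming predictions have already been extracted to option letters (column names here are illustrative):

```python
import pandas as pd

def accuracy_by_split(df: pd.DataFrame) -> pd.DataFrame:
    """Compute accuracy per split by comparing extracted predictions
    against ground-truth answers."""
    df = df.assign(hit=(df["prediction"] == df["answer"]).astype(int))
    return df.groupby("split", as_index=False)["hit"].mean().rename(
        columns={"hit": "accuracy"})

# Toy prediction table with the columns named in the I/O contract.
preds = pd.DataFrame({
    "index": [0, 1, 2, 3],
    "split": ["dev", "dev", "test", "test"],
    "prediction": ["A", "B", "C", "D"],
    "answer": ["A", "C", "C", "D"],
})
acc = accuracy_by_split(preds)
```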

Usage Examples

from vlmeval.dataset import build_dataset

dataset = build_dataset("MMBench_DEV_EN_V11")
# After inference produces the prediction file:
results = dataset.evaluate(
    eval_file="./results/InternVL2-8B_MMBench_DEV_EN_V11.xlsx",
    model="chatgpt-0125",
    nproc=4
)
print(results)  # DataFrame with accuracy by split/category

Related Pages

  • Principle
  • Implementation
  • Heuristic
  • Environment