

Implementation:Open compass VLMEvalKit MVBench Utils

From Leeroopedia
Source: VLMEvalKit
Domain: Vision, Evaluation, Video Understanding, Multiple Choice

Overview

Provides evaluation utilities for the MVBench video understanding benchmark, including per-dimension (task-type) scoring and answer checking with an optional LLM-as-judge fallback.

Description

This module implements get_dimension_rating, which computes per-task-type accuracy breakdowns from a scored data file, and check_ans, which compares a predicted multiple-choice answer against the ground-truth option. check_ans matches options flexibly: it takes the first word of both the prediction and the ground truth, removes periods, and compares them case-insensitively. check_ans_with_model extends this with LLM-based answer verification for cases where simple string matching is insufficient. Results are aggregated by task type, with accuracy reported as a formatted percentage.
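The first-word matching described above can be sketched as follows. This is a minimal sketch assuming whitespace tokenization and exact equality after normalization; `check_ans_sketch` is a hypothetical name, not the library function, and the real check_ans may differ in details.

```python
def check_ans_sketch(pred: str, gt: str) -> bool:
    """Hypothetical re-creation of first-word option matching (not the library code)."""
    def option_token(text: str) -> str:
        # First whitespace-delimited word, lower-cased, with periods removed ("A." -> "a")
        words = text.strip().lower().split()
        return words[0].replace(".", "") if words else ""

    pred_opt, gt_opt = option_token(pred), option_token(gt)
    # An empty prediction never counts as correct
    return bool(pred_opt) and pred_opt == gt_opt


print(check_ans_sketch("A. cat", "A cat"))  # True
print(check_ans_sketch("b", "A cat"))       # False
```

Comparing only the leading token lets answers such as "A.", "a", and "A cat" all resolve to the same option without parsing the rest of the sentence.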

Usage

Called internally by the MVBench dataset class during video understanding evaluation.

Code Reference

  • Source: vlmeval/dataset/utils/mvbench.py, Lines: L1-509
  • Import: from vlmeval.dataset.utils.mvbench import get_dimension_rating, check_ans

Key Functions:

def get_dimension_rating(data_path): ...
def check_ans(pred, gt): ...
def check_ans_with_model(pred, gt, model, item, dataset_name='MVBench'): ...
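check_ans_with_model adds a model call for cases where rule-based matching is not enough. The sketch below illustrates that two-stage pattern under stated assumptions: `judge` is a hypothetical callable standing in for the LLM, mapping a free-form prediction to an option letter. The real function instead takes `model`, `item`, and `dataset_name`, and its prompting and parsing are not reproduced here.

```python
from typing import Callable


def option_token(text: str) -> str:
    # First word, lower-cased, periods removed: "A." -> "a"
    words = text.strip().lower().split()
    return words[0].replace(".", "") if words else ""


def check_ans_with_judge(pred: str, gt: str, judge: Callable[[str], str]) -> bool:
    """Hypothetical two-stage check: cheap string match first, judge only as a fallback."""
    gt_opt = option_token(gt)
    # Stage 1: rule-based first-word match, no model call needed
    if option_token(pred) == gt_opt:
        return True
    # Stage 2: ask the (hypothetical) judge which option letter the prediction means
    return option_token(judge(pred)) == gt_opt


# Toy stand-in for an LLM judge, for illustration only
def toy_judge(pred: str) -> str:
    return "A" if "cat" in pred.lower() else "Z"


print(check_ans_with_judge("The animal is a cat", "A cat", toy_judge))  # True (via judge)
print(check_ans_with_judge("B dog", "A cat", toy_judge))                # False
```

Calling the judge only when string matching fails keeps model calls limited to predictions that cannot be parsed by simple rules.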

I/O Contract

Inputs: a scored data file path (get_dimension_rating); predicted and ground-truth answer strings (check_ans)
Outputs: a dictionary mapping each task type to a [correct, total, percentage] list (get_dimension_rating); a boolean indicating whether an individual answer is correct (check_ans)
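The [correct, total, percentage] output shape can be illustrated with a small aggregation sketch. It assumes the scored rows are already loaded as dicts with `task_type` and `score` keys (get_dimension_rating itself reads them from the file at data_path); `rate_by_task_type` is a hypothetical name, and the two-decimal percentage format is an assumption.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def rate_by_task_type(rows: Iterable[Mapping]) -> dict:
    """Hypothetical sketch: map each task type to [correct, total, 'xx.xx%']."""
    counts = defaultdict(lambda: [0, 0])  # task_type -> [correct, total]
    for row in rows:
        tally = counts[row["task_type"]]
        tally[1] += 1
        if row["score"]:  # a truthy score counts as a correct answer
            tally[0] += 1
    # Append a formatted accuracy string to each [correct, total] pair
    return {
        task: [correct, total, f"{100 * correct / total:.2f}%"]
        for task, (correct, total) in counts.items()
    }


# Toy rows with illustrative task names
rows = [
    {"task_type": "Action Sequence", "score": 1},
    {"task_type": "Action Sequence", "score": 0},
    {"task_type": "Object Shuffle", "score": 1},
]
print(rate_by_task_type(rows))
# {'Action Sequence': [1, 2, '50.00%'], 'Object Shuffle': [1, 1, '100.00%']}
```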

Usage Examples

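# Internal usage example
from vlmeval.dataset.utils.mvbench import get_dimension_rating, check_ans

# Per-task-type accuracy: {task_type: [correct, total, percentage], ...}
results = get_dimension_rating("scores.xlsx")

# First words "A." and "A" both normalize to "a", so this should be True
is_correct = check_ans("A. cat", "A cat")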
