Implementation:Open compass VLMEvalKit MVBench Utils

Field	Value
source	VLMEvalKit
domain	Vision, Evaluation, Video Understanding, Multiple Choice

Overview

Provides evaluation utilities for the MVBench video understanding benchmark, including per-dimension scoring and answer checking with optional LLM-as-judge verification.

Description

This module implements get_dimension_rating, which computes per-task-type accuracy breakdowns from a scored data file, and check_ans, which compares a predicted multiple-choice answer against the ground-truth option. check_ans matches flexibly: it extracts the first word of both the prediction and the ground truth, strips periods, and compares case-insensitively. check_ans_with_model extends this with LLM-based answer verification for cases where simple string matching is insufficient. get_dimension_rating aggregates results by task type and formats each accuracy as a percentage.
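The first-word matching described above can be illustrated with a minimal sketch. The helper name match_option is hypothetical, and the strict equality test after normalization is an assumption made for illustration; the actual check_ans in mvbench.py may use a looser comparison.

```python
def match_option(pred: str, gt: str) -> bool:
    """Hypothetical helper: compare the leading option token of pred and gt.

    Assumption: options are matched by exact equality after lowercasing
    and removing periods. This is not the library's own implementation.
    """
    pred_words = pred.lower().strip().split()
    gt_words = gt.lower().strip().split()
    if not pred_words or not gt_words:
        # Nothing to compare when either string is empty.
        return False
    # The first word carries the option label, e.g. "a." or "a".
    pred_option = pred_words[0].replace(".", "")
    gt_option = gt_words[0].replace(".", "")
    if not pred_option or not gt_option:
        return False
    return pred_option == gt_option


print(match_option("A. cat", "A cat"))
print(match_option("B. dog", "A. cat"))
```

Only the option label is compared here; the remaining words of the answer are ignored, which is why "A. cat" and "A cat" are treated as the same choice.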

Usage

Called internally by the MVBench dataset class during video understanding evaluation.

Code Reference

Source: vlmeval/dataset/utils/mvbench.py, Lines: L1-509
Import: from vlmeval.dataset.utils.mvbench import get_dimension_rating, check_ans

Key Functions:

def get_dimension_rating(data_path): ...
def check_ans(pred, gt): ...
def check_ans_with_model(pred, gt, model, item, dataset_name='MVBench'): ...

I/O Contract

Direction	Description
Inputs	Scored data file path for dimension rating; predicted and ground-truth answer strings for answer checking
Outputs	Dictionary mapping task types to [correct, total, percentage] lists; boolean correctness for individual answers
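To make the output shape concrete, the sketch below aggregates per-item scores into [correct, total, percentage] lists keyed by task type. The function name rate_by_task, the (task_type, score) input format, and the two-decimal percentage string are assumptions for illustration; get_dimension_rating itself reads its input from a scored data file.

```python
from collections import defaultdict


def rate_by_task(records):
    """Hypothetical aggregation of (task_type, score) pairs.

    Assumption: score is 1 for a correct answer and 0 otherwise.
    """
    tally = defaultdict(lambda: [0, 0])  # task_type -> [correct, total]
    for task_type, score in records:
        tally[task_type][0] += int(score)
        tally[task_type][1] += 1
    # Append the accuracy as a formatted percentage string (assumed format).
    return {
        task: [correct, total, f"{correct / total * 100:.2f}%"]
        for task, (correct, total) in tally.items()
    }


ratings = rate_by_task([("action", 1), ("action", 0), ("scene", 1)])
print(ratings)
```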

Usage Examples

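# Internal usage example
from vlmeval.dataset.utils.mvbench import get_dimension_rating, check_ans

# Per-task-type accuracy breakdown from a scored results file
results = get_dimension_rating("scores.xlsx")
# Compare the leading option of the prediction with that of the ground truth
is_correct = check_ans("A. cat", "A cat")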

Related Pages

Principle:Open_compass_VLMEvalKit_Benchmark_Dataset_Construction
