Implementation:Open compass VLMEvalKit CGBench Utils

Field	Value
source	VLMEvalKit
domain	Vision, Evaluation, Video Understanding, Clue-Grounded

Overview

Provides evaluation utilities for the CGBench (Clue-Grounded Video Understanding) benchmark, including open-ended answer evaluation with LLM-as-judge.

Description

This module implements evaluation functions for CGBench's video understanding tasks, which span multiple domains (Life Record, Music/TV, Driving, etc.) and duration categories. Open-ended answers are scored with a two-step LLM-based approach: the judge first compares the model prediction against the ground-truth answer using text alone, and then, for cases that remain ambiguous, can optionally draw on visual information from the clue intervals. Key components include extract_answer_from_item, which extracts multiple-choice answers, and the system prompts for the LLM judge (sys_prompt_open_eval_step_1, sys_prompt_open_eval_step_2).
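The two-step flow described above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: the function two_step_open_eval, the verdict codes, and the toy judge are all hypothetical, and the real judge is an LLM driven by the system prompts named above.

```python
from typing import Callable, Optional

# Hypothetical verdict codes; the real judge prompts may use a different convention.
INCORRECT, CORRECT, UNSURE = 0, 1, 2

Judge = Callable[[str, str, str], int]


def two_step_open_eval(
    question: str,
    prediction: str,
    ground_truth: str,
    text_judge: Judge,
    visual_judge: Optional[Judge] = None,
) -> int:
    """Return 1 (correct) or 0 (incorrect) for one open-ended answer."""
    # Step 1: text-only comparison of the prediction against the ground truth.
    verdict = text_judge(question, prediction, ground_truth)
    if verdict in (CORRECT, INCORRECT):
        return verdict
    # Step 2 (optional): ambiguous cases go to a judge that can also see
    # frames from the clue intervals. Without one, count the answer as wrong.
    if visual_judge is None:
        return INCORRECT
    second = visual_judge(question, prediction, ground_truth)
    return CORRECT if second == CORRECT else INCORRECT


def toy_text_judge(question: str, prediction: str, ground_truth: str) -> int:
    """Stand-in for an LLM judge: exact match, empty answer, or unsure."""
    pred = prediction.strip().lower()
    gold = ground_truth.strip().lower()
    if pred == gold:
        return CORRECT
    if not pred:
        return INCORRECT
    return UNSURE
```

Passing the judges in as plain callables keeps the control flow testable without a live LLM; in practice each callable would wrap a model call and parse its reply.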

Usage

Called internally by the CGBench dataset class during evaluation.

Code Reference

Source: vlmeval/dataset/utils/cgbench.py, Lines: L1-620
Import: from vlmeval.dataset.utils.cgbench import get_dimension_rating

Key Functions:

def get_dimension_rating(data_path): ...
def check_ans(pred, gt): ...
def evaluate_open_ended(question, response, ground_truth, model): ...

I/O Contract

Direction	Description
Inputs	Model predictions, ground-truth answers, question text, and optionally video frame paths for visual grounding
Outputs	Scores (0 or 1) per question; aggregated accuracy by domain, duration, and task type as dictionaries
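As a rough illustration of the output side of this contract, the sketch below aggregates per-question 0/1 scores into accuracy dictionaries. The function aggregate_accuracy and the record field names (domain, duration, task_type, score) are assumptions made for the example; the actual column names and the structure returned by get_dimension_rating may differ.

```python
from collections import defaultdict


def aggregate_accuracy(records, group_keys=("domain", "duration", "task_type")):
    """Compute accuracy per group value for each grouping key.

    `records` is an iterable of dicts with a 0/1 "score" field plus one
    field per grouping key (hypothetical schema).
    """
    # totals[key][group_value] holds [number_correct, number_of_questions]
    totals = {key: defaultdict(lambda: [0, 0]) for key in group_keys}
    for rec in records:
        for key in group_keys:
            bucket = totals[key][rec[key]]
            bucket[0] += rec["score"]
            bucket[1] += 1
    return {
        key: {group: correct / count for group, (correct, count) in groups.items()}
        for key, groups in totals.items()
    }


# Toy records; the duration and task-type labels here are made up.
sample_records = [
    {"domain": "Life Record", "duration": "short", "task_type": "open", "score": 1},
    {"domain": "Life Record", "duration": "long", "task_type": "open", "score": 0},
    {"domain": "Driving", "duration": "short", "task_type": "mcq", "score": 1},
]
sample_result = aggregate_accuracy(sample_records)
```

Each top-level key of the result maps group values to the fraction of questions scored 1 within that group.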

Usage Examples

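# Internal usage example
from vlmeval.dataset.utils.cgbench import get_dimension_rating
# data_path is a placeholder for a scored results file written during evaluation
data_path = "path/to/cgbench_score_file"
results = get_dimension_rating(data_path)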

Related Pages

Principle:Open_compass_VLMEvalKit_Benchmark_Dataset_Construction
