Implementation:Open compass VLMEvalKit CGBench Utils

From Leeroopedia
Field  | Value
source | VLMEvalKit
domain | Vision, Evaluation, Video Understanding, Clue-Grounded

Overview

Provides evaluation utilities for the CGBench (Clue-Grounded Video Understanding) benchmark, including open-ended answer evaluation with LLM-as-judge.

Description

This module implements evaluation functions for CGBench's video understanding tasks across multiple domains (Life Record, Music/TV, Driving, etc.) and duration categories. It uses a two-step LLM-based open evaluation approach: first comparing model predictions against ground-truth answers textually, then optionally using visual information from clue intervals for ambiguous cases. Key components include extract_answer_from_item for multiple-choice extraction and system prompts for the LLM judge (sys_prompt_open_eval_step_1, sys_prompt_open_eval_step_2).
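The two-step flow described above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: the function name two_step_open_eval, the text_judge and visual_judge callables, and the yes/no/unsure verdict format are all assumptions made for the example. The real judge prompts live in sys_prompt_open_eval_step_1 and sys_prompt_open_eval_step_2, and their output format is not documented here.

```python
from typing import Callable, Optional


def two_step_open_eval(
    question: str,
    prediction: str,
    ground_truth: str,
    text_judge: Callable[[str], str],
    visual_judge: Optional[Callable[[str], str]] = None,
) -> int:
    """Return 1 if the prediction is judged correct, else 0.

    Hypothetical sketch: both judges are plain callables that take a prompt
    and return 'yes', 'no', or 'unsure'. The real module's interface may differ.
    """
    prompt = (
        f"Question: {question}\n"
        f"Ground truth: {ground_truth}\n"
        f"Prediction: {prediction}\n"
        "Answer with 'yes', 'no', or 'unsure'."
    )

    # Step 1: text-only comparison of prediction and ground truth.
    verdict = text_judge(prompt).strip().lower()
    if verdict == "yes":
        return 1
    if verdict == "no":
        return 0

    # Step 2: only reached for ambiguous cases; a real visual judge would
    # also receive frames from the clue intervals.
    if visual_judge is None:
        return 0
    return 1 if visual_judge(prompt).strip().lower() == "yes" else 0


# Stub judges standing in for LLM calls.
clear_match = two_step_open_eval("What colour is the car?", "red", "red", lambda p: "yes")
ambiguous = two_step_open_eval(
    "What colour is the car?", "dark red", "maroon",
    text_judge=lambda p: "unsure",
    visual_judge=lambda p: "yes",
)
```

Passing the judges in as callables keeps the sketch independent of any particular model API; in practice each callable would wrap a call to the judge model.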

Usage

Called internally by the CGBench dataset class during evaluation.

Code Reference

  • Source: vlmeval/dataset/utils/cgbench.py, Lines: L1-620
  • Import: from vlmeval.dataset.utils.cgbench import get_dimension_rating

Key Functions:

def get_dimension_rating(data_path): ...
def check_ans(pred, gt): ...
def evaluate_open_ended(question, response, ground_truth, model): ...

I/O Contract

Direction | Description
Inputs    | Model predictions, ground-truth answers, question text, and optionally video frame paths for visual grounding
Outputs   | Scores (0 or 1) per question; aggregated accuracy by domain, duration, and task type as dictionaries
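To make the output side of this contract concrete, the sketch below aggregates per-question 0/1 scores into accuracy dictionaries. It is an illustration only: the helper name aggregate_accuracy, the record keys (domain, duration, task_type, score), and the sample labels are assumptions, not the column names or return structure used by get_dimension_rating.

```python
from collections import defaultdict


def aggregate_accuracy(records, keys=("domain", "duration", "task_type")):
    """Group 0/1 scores by each key and return the accuracy per group.

    Hypothetical helper: the record field names are illustrative only.
    """
    # totals[key][group] holds [number correct, number of questions].
    totals = {key: defaultdict(lambda: [0, 0]) for key in keys}
    for record in records:
        for key in keys:
            bucket = totals[key][record[key]]
            bucket[0] += record["score"]
            bucket[1] += 1
    return {
        key: {group: correct / count for group, (correct, count) in groups.items()}
        for key, groups in totals.items()
    }


# Made-up records for illustration; not benchmark data.
records = [
    {"domain": "Driving", "duration": "short", "task_type": "open", "score": 1},
    {"domain": "Driving", "duration": "long", "task_type": "open", "score": 0},
    {"domain": "Life Record", "duration": "short", "task_type": "open", "score": 1},
]
accuracy = aggregate_accuracy(records)
```

Here accuracy["domain"]["Driving"] is 0.5, because one of the two Driving questions was scored 1.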

Usage Examples

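# Internal usage example
from vlmeval.dataset.utils.cgbench import get_dimension_rating

# Placeholder path (an assumption); point this at your own evaluation results file.
data_path = "path/to/results_file"
results = get_dimension_rating(data_path)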
