Implementation:Open compass VLMEvalKit VGRPBench Score

From Leeroopedia
Revision as of 13:32, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Open_compass_VLMEvalKit_VGRPBench_Score.md)
Field | Value
source | VLMEvalKit
domain | Vision, Evaluation, Grid Reasoning, Puzzle Solving

Overview

Implements the scoring module for VGRPBench, which evaluates model answers to visual grid reasoning puzzles on both perception accuracy and solution correctness.

Description

This module provides extract_perception_and_answer, which parses a model's output into a perceived initial state and a solution. It accepts both section-header formats, "Initial State"/"Answer" and "Perception"/"Solution", and uses json_repair for robust JSON parsing of model outputs. Scoring covers two dimensions: perception accuracy (how well the model reads the puzzle grid) and solution correctness (whether the solution is valid for the perceived or actual initial state). Puzzle verification is delegated to game-specific factories obtained via get_game_factory from the puzzles submodule, and the grid size is configurable through the GRID_SIZE global variable.
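The parsing step described above can be sketched roughly as follows. This is a minimal illustration and not the module's actual code: the function name sketch_extract, the header regex, and the bracket-slicing strategy are all assumptions, and the exact layout of real model outputs may differ. It uses json_repair when installed and falls back to the standard json module otherwise.

```python
import json
import re

try:
    # json_repair is the library the page names; fall back to strict json if absent.
    from json_repair import loads as parse_json
except ImportError:
    parse_json = json.loads

# Hypothetical header pattern covering both header formats named in the description.
_HEADERS = re.compile(r"(Initial State|Perception|Answer|Solution)", re.IGNORECASE)


def sketch_extract(model_output):
    """Return (initial_state, solution) parsed from a sectioned model output."""
    # With a capturing group, split yields [preamble, header1, body1, header2, body2, ...].
    parts = _HEADERS.split(model_output)
    sections = {}
    for header, body in zip(parts[1::2], parts[2::2]):
        is_state = header.lower() in ("initial state", "perception")
        key = "state" if is_state else "solution"
        # Assumption: each section holds one JSON array; slice from first '[' to last ']'.
        start, end = body.find("["), body.rfind("]")
        if start != -1 and end > start:
            sections.setdefault(key, parse_json(body[start:end + 1]))
    return sections.get("state"), sections.get("solution")
```

A caveat of this sketch: a header word appearing inside free text (for example "the answer is") would also split the output, so a production parser would likely need stricter anchoring.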

Usage

Called internally by the VGRPBench dataset class during puzzle evaluation.

Code Reference

  • Source: vlmeval/dataset/utils/vgrpbench/score.py, Lines: L1-438
  • Import: from vlmeval.dataset.utils.vgrpbench.score import extract_perception_and_answer

Key Functions:

GRID_SIZE = None  # Global variable for puzzle grid size

def extract_perception_and_answer(model_output): ...
def evaluate_perception(perceived_state, actual_state): ...
def verify_solution(solution, initial_state, puzzle_type): ...
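
To illustrate how verification might be delegated per puzzle type, here is a hedged sketch of a dispatch table. Everything in it is hypothetical: the VERIFIERS registry merely stands in for get_game_factory, the "latin_square" rule is an invented example game, and treating 0 as an empty cell is an assumption rather than the benchmark's encoding.

```python
GRID_SIZE = 4  # assumed module-level size, mirroring the GRID_SIZE global


def latin_square_ok(solution, initial_state, size):
    """Hypothetical rule: each row and column holds 1..size once, clues preserved."""
    expected = set(range(1, size + 1))
    if len(solution) != size or any(len(row) != size for row in solution):
        return False
    rows_ok = all(set(row) == expected for row in solution)
    cols_ok = all(set(col) == expected for col in zip(*solution))
    # Assumption: 0 marks an empty cell; any other clue must be unchanged.
    clues_ok = all(
        clue == 0 or clue == cell
        for clue_row, sol_row in zip(initial_state, solution)
        for clue, cell in zip(clue_row, sol_row)
    )
    return rows_ok and cols_ok and clues_ok


# Hypothetical registry standing in for get_game_factory.
VERIFIERS = {"latin_square": latin_square_ok}


def sketch_verify(solution, initial_state, puzzle_type):
    """Look up the rule for puzzle_type and apply it at the configured grid size."""
    verifier = VERIFIERS.get(puzzle_type)
    if verifier is None:
        raise ValueError(f"unknown puzzle type: {puzzle_type}")
    return verifier(solution, initial_state, GRID_SIZE)
```

Keeping the rules behind a lookup means the scoring code does not need to know any game's constraints, which matches the delegation the Description attributes to the puzzles submodule.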

I/O Contract

Direction | Description
Inputs | Raw model output string containing "Initial State" and "Answer" sections (or "Perception" and "Solution"); actual puzzle state; puzzle type identifier
Outputs | Tuple of (initial_state, solution) as 2D arrays; perception accuracy score; boolean solution correctness
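
One plausible reading of the "perception accuracy score" in this contract is the fraction of grid cells the model read correctly. The sketch below assumes that definition; the function name is hypothetical, and the actual module may instead require an exact match or handle shape mismatches differently.

```python
def sketch_perception_accuracy(perceived_state, actual_state):
    """Fraction of cells in actual_state that the perceived grid reproduces."""
    total = sum(len(row) for row in actual_state)
    if total == 0:
        return 0.0
    # Assumption: a missing grid or a row-count mismatch scores zero.
    if perceived_state is None or len(perceived_state) != len(actual_state):
        return 0.0
    matches = 0
    for p_row, a_row in zip(perceived_state, actual_state):
        if len(p_row) != len(a_row):
            return 0.0  # assumption: any ragged row invalidates the perception
        matches += sum(p == a for p, a in zip(p_row, a_row))
    return matches / total
```

For example, a perceived grid [[1, 0], [0, 2]] compared against an actual grid [[1, 0], [3, 2]] matches three of four cells and scores 0.75 under this definition.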

Usage Examples

# Internal usage example: model_output is the raw response string from the model
from vlmeval.dataset.utils.vgrpbench.score import extract_perception_and_answer
initial_state, solution = extract_perception_and_answer(model_output)
