Implementation:Open compass VLMEvalKit VGRPBench Score

Field	Value
source	VLMEvalKit
domain	Vision, Evaluation, Grid Reasoning, Puzzle Solving

Overview

Implements the scoring module for VGRPBench, evaluating visual grid reasoning puzzle solutions including perception accuracy and solution verification.

Description

This module provides extract_perception_and_answer, which parses a model's output into a perceived initial state and a solution. It accepts both the "Initial State/Answer" and the "Perception/Solution" section-header formats, and uses json_repair so that slightly malformed JSON in the output can still be parsed. Scoring covers two dimensions: perception accuracy (how well the model reads the puzzle grid) and solution correctness (whether the solution is valid for the perceived or the actual initial state). Puzzle verification is delegated to game-specific factories obtained through get_game_factory from the puzzles submodule, and the grid size is configured through the module-level GRID_SIZE variable.
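
The two-header parsing described above can be illustrated with a minimal sketch. This is not the module's code: the names HEADER_PAIRS, _first_json_array and parse_sections_sketch are hypothetical, the exact layout of the model output is assumed, and the standard-library json module stands in for json_repair, so malformed JSON is rejected here rather than repaired.

```python
import json

# Assumption: the two header pairs named in the description, tried in order.
HEADER_PAIRS = [("Initial State", "Answer"), ("Perception", "Solution")]


def _first_json_array(text):
    """Parse the span from the first '[' to the last ']' as JSON, or return None."""
    start, end = text.find("["), text.rfind("]")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        # The real module relies on json_repair here; this sketch just gives up.
        return None


def parse_sections_sketch(model_output):
    """Return (perceived_state, solution) as nested lists, or (None, None)."""
    for first, second in HEADER_PAIRS:
        i = model_output.find(first)
        if i == -1:
            continue
        # The second header must appear after the first one.
        j = model_output.find(second, i + len(first))
        if j == -1:
            continue
        perceived = _first_json_array(model_output[i + len(first):j])
        solution = _first_json_array(model_output[j + len(second):])
        return perceived, solution
    return None, None


sample = "Perception\n[[1, 0], [0, 2]]\nSolution\n[[1, 2], [2, 1]]"
perceived, solution = parse_sections_sketch(sample)
```

Trying the header pairs in a fixed order keeps the function to one code path per format; an output that contains neither pair yields (None, None) instead of raising.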

Usage

Called internally by the VGRPBench dataset class during puzzle evaluation.

Code Reference

Source: vlmeval/dataset/utils/vgrpbench/score.py, Lines: L1-438
Import: from vlmeval.dataset.utils.vgrpbench.score import extract_perception_and_answer

Key Functions:

GRID_SIZE = None  # Global variable for puzzle grid size

def extract_perception_and_answer(model_output): ...
def evaluate_perception(perceived_state, actual_state): ...
def verify_solution(solution, initial_state, puzzle_type): ...

I/O Contract

Direction	Description
Inputs	Raw model output string containing "Initial State" and "Answer" (or "Perception" and "Solution") sections; actual puzzle state; puzzle type identifier
Outputs	Tuple of (initial_state, solution) as 2D arrays; perception accuracy score; boolean solution correctness
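
To make the two outputs concrete, the following sketch shows one way a perception score and a boolean solution check could be computed. It rests on assumptions: the page does not say how perception accuracy is defined, so cell-wise agreement is used here; EMPTY = 0 as the blank-cell marker, the RULES registry, and the "latin_square" rule are hypothetical stand-ins for the game-specific factories and are not part of VLMEvalKit.

```python
EMPTY = 0   # Assumption: marker for a blank cell in the initial state.
RULES = {}  # Hypothetical registry: puzzle type -> rule checker.


def register(name):
    """Decorator that stores a rule checker under a puzzle type name."""
    def wrap(fn):
        RULES[name] = fn
        return fn
    return wrap


def _same_shape(a, b):
    return len(a) == len(b) and all(len(x) == len(y) for x, y in zip(a, b))


def perception_accuracy_sketch(perceived, actual):
    """Fraction of cells in `actual` that `perceived` reproduces exactly."""
    total = sum(len(row) for row in actual)
    if total == 0 or perceived is None or not _same_shape(perceived, actual):
        return 0.0
    hits = sum(p == a
               for prow, arow in zip(perceived, actual)
               for p, a in zip(prow, arow))
    return hits / total


@register("latin_square")
def rows_and_columns_unique(grid):
    """Toy rule: no value repeats within any row or column."""
    lines = [list(r) for r in grid] + [list(c) for c in zip(*grid)]
    return all(len(set(line)) == len(line) for line in lines)


def verify_solution_sketch(solution, initial_state, puzzle_type):
    """True if the solution keeps every given cell and satisfies the rule."""
    rule = RULES.get(puzzle_type)
    if rule is None:
        raise ValueError(f"unknown puzzle type: {puzzle_type}")
    if not _same_shape(solution, initial_state):
        return False
    keeps_givens = all(i == EMPTY or i == s
                       for irow, srow in zip(initial_state, solution)
                       for i, s in zip(irow, srow))
    return keeps_givens and rule(solution)


score = perception_accuracy_sketch([[1, 0], [0, 1]], [[1, 0], [0, 2]])
ok = verify_solution_sketch([[1, 2], [2, 1]], [[1, 0], [0, 1]], "latin_square")
```

Because verify_solution_sketch takes the initial state as an argument, the same solution can be checked against either the perceived grid or the actual one, which mirrors the distinction drawn in the description.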

Usage Examples

# Internal usage example
from vlmeval.dataset.utils.vgrpbench.score import extract_perception_and_answer

# model_output is the raw response string produced by the evaluated model
initial_state, solution = extract_perception_and_answer(model_output)

Related Pages

Principle:Open_compass_VLMEvalKit_Benchmark_Dataset_Construction
