Implementation:Open compass VLMEvalKit VGRPBench Score
| Field | Value |
|---|---|
| source | VLMEvalKit |
| domain | Vision, Evaluation, Grid Reasoning, Puzzle Solving |
Overview
Implements the scoring module for VGRPBench, evaluating visual grid reasoning puzzle solutions including perception accuracy and solution verification.
Description
This module provides extract_perception_and_answer, which parses a model output into the perceived initial state and the proposed solution. It accepts either of two section-header formats, "Initial State"/"Answer" or "Perception"/"Solution". It uses json_repair so that slightly malformed JSON in model outputs can still be parsed. Scoring covers two dimensions: perception accuracy (how well the model reads the puzzle grid) and solution correctness (whether the solution is valid for the perceived or the actual initial state). Puzzle verification is delegated to game-specific factories obtained through get_game_factory from the puzzles submodule, and the grid size is configurable through the GRID_SIZE global variable.
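The two-format parsing step can be illustrated with a minimal sketch. It assumes each header is followed by a JSON-style nested list and that headers appear with the exact, case-sensitive spelling shown. The names parse_sections_sketch, _to_grid and HEADER_PAIRS are hypothetical and do not reproduce the module's actual implementation. The sketch prefers json_repair's loads when that package is installed and otherwise falls back to the standard-library json.loads.

```python
import json
from typing import Any, Tuple

try:
    # json_repair repairs malformed JSON (e.g. trailing commas, unclosed brackets)
    from json_repair import loads as _loads
except ImportError:
    # Fallback for this sketch only: strict standard-library parsing
    _loads = json.loads

# Header pairs tried in order; exact, case-sensitive spelling is an assumption
HEADER_PAIRS = [("Initial State", "Answer"), ("Perception", "Solution")]


def _to_grid(chunk: str) -> Any:
    """Parse the span from the first '[' to the last ']' as a nested list."""
    start, end = chunk.find("["), chunk.rfind("]")
    if start == -1 or end < start:
        return None
    try:
        return _loads(chunk[start:end + 1])
    except ValueError:
        return None  # unparseable section (json.JSONDecodeError is a ValueError)


def parse_sections_sketch(model_output: str) -> Tuple[Any, Any]:
    """Hypothetical parser returning (initial_state, solution) or (None, None)."""
    for first, second in HEADER_PAIRS:
        i = model_output.find(first)
        if i == -1:
            continue  # this header format is absent; try the next pair
        j = model_output.find(second, i + len(first))
        if j == -1:
            continue
        state_text = model_output[i + len(first):j]
        answer_text = model_output[j + len(second):]
        return _to_grid(state_text), _to_grid(answer_text)
    return None, None
```

Trying the header pairs in a fixed order keeps the two formats independent of each other. Slicing from the first "[" to the last "]" also tolerates surrounding prose or code fences around the grid.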
Usage
Called internally by the VGRPBench dataset class during puzzle evaluation.
Code Reference
- Source: vlmeval/dataset/utils/vgrpbench/score.py, Lines: L1-438
- Import: from vlmeval.dataset.utils.vgrpbench.score import extract_perception_and_answer
Key Functions:
```python
GRID_SIZE = None  # Global variable for puzzle grid size
def extract_perception_and_answer(model_output): ...
def evaluate_perception(perceived_state, actual_state): ...
def verify_solution(solution, initial_state, puzzle_type): ...
```
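Delegating verification to game-specific factories can be pictured as a small registry lookup. The sketch below is hypothetical. FACTORIES, LatinSquareFactory, get_game_factory_sketch, verify_solution_sketch and the "latin_square" puzzle type are illustrative stand-ins, not the real get_game_factory API or the puzzle types in the puzzles submodule. The sketch also assumes that 0 marks an empty cell and that a valid solution must keep every non-zero clue from the initial state.

```python
from typing import Callable, Dict, List, Optional

Grid = List[List[int]]


class LatinSquareFactory:
    """Hypothetical game: each row and column holds 1..n exactly once."""

    def check(self, grid: Grid) -> bool:
        n = len(grid)
        target = set(range(1, n + 1))
        if n == 0 or any(len(row) != n or set(row) != target for row in grid):
            return False
        # Rows are valid and the grid is square, so column indexing is safe
        return all({grid[r][c] for r in range(n)} == target for c in range(n))


# Hypothetical registry standing in for get_game_factory in the puzzles submodule
FACTORIES: Dict[str, Callable[[], LatinSquareFactory]] = {
    "latin_square": LatinSquareFactory,
}


def get_game_factory_sketch(puzzle_type: str) -> LatinSquareFactory:
    try:
        return FACTORIES[puzzle_type]()
    except KeyError:
        raise ValueError(f"unknown puzzle type: {puzzle_type}") from None


def verify_solution_sketch(
    solution: Optional[Grid], initial_state: Grid, puzzle_type: str
) -> bool:
    """Assumes 0 marks an empty cell; every non-zero clue must be preserved."""
    if solution is None or len(solution) != len(initial_state):
        return False
    for sol_row, init_row in zip(solution, initial_state):
        if len(sol_row) != len(init_row):
            return False
        if any(given != 0 and given != value for given, value in zip(init_row, sol_row)):
            return False
    # Rule checking is delegated to the game-specific factory
    return get_game_factory_sketch(puzzle_type).check(solution)
```

One way to read "valid for the perceived or actual initial state" is to call such a check twice: once with the model's perceived grid and once with the ground-truth grid. Under that reading, a perception error is separated from a reasoning error.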
I/O Contract
| Direction | Description |
|---|---|
| Inputs | Raw model output string containing either "Initial State"/"Answer" or "Perception"/"Solution" sections; actual puzzle state; puzzle type identifier |
| Outputs | Tuple of (initial_state, solution) as 2D arrays; perception accuracy score; boolean solution correctness |
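One plausible definition of the perception accuracy score is cell-wise accuracy. The sketch below assumes this definition; evaluate_perception may compute it differently. It returns the fraction of cells where the perceived grid matches the actual grid and scores a missing or wrongly sized grid as 0.0. It also assumes square grids whose side length comes from a module-level GRID_SIZE, which stands in for the module's global. perception_accuracy_sketch is a hypothetical name.

```python
from typing import List, Optional

Grid = List[List[int]]

# Stand-in for the module's GRID_SIZE global; None means "infer from the
# actual grid" (an assumption of this sketch)
GRID_SIZE: Optional[int] = None


def perception_accuracy_sketch(perceived: Optional[Grid], actual: Grid) -> float:
    """Hypothetical metric: fraction of cells read correctly, in [0.0, 1.0]."""
    size = GRID_SIZE if GRID_SIZE is not None else len(actual)
    if perceived is None or len(perceived) != size:
        return 0.0
    if any(len(row) != size for row in perceived):
        return 0.0
    # Count matching cells position by position (bools sum as 0/1)
    correct = sum(
        p == a
        for p_row, a_row in zip(perceived, actual)
        for p, a in zip(p_row, a_row)
    )
    return correct / (size * size)
```

A graded cell-level score gives partial credit for near-misses, whereas solution correctness stays a strict boolean, matching the two output types listed in the contract.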
Usage Examples
```python
# Internal usage example
from vlmeval.dataset.utils.vgrpbench.score import extract_perception_and_answer

# model_output is the raw text response returned by the VLM for one puzzle
initial_state, solution = extract_perception_and_answer(model_output)
```