Implementation:Open compass VLMEvalKit MMHelix Sokoban Eval

Field	Value
source	VLMEvalKit
domain	Vision, Evaluation, Puzzle Solving, Sokoban

Overview

Evaluates Sokoban puzzle solutions in the MMHelix benchmark by simulating box-pushing movements and verifying all boxes reach goal positions.

Description

The `SokobanEvaluator` class extends `BaseEvaluator` to simulate and validate Sokoban puzzle solutions. It defines game elements (WALL, PLAYER, BOX, GOAL, BOX_ON_GOAL, PLAYER_ON_GOAL, FLOOR) and a DIRECTIONS mapping. The `extract_answer` method parses movement directions from model output, supporting `<answer>` tags. The evaluator simulates the full game sequence, checking wall collisions, box-into-wall pushes, and whether all boxes end up on goal positions.
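The simulation described above can be illustrated with a minimal, self-contained sketch. This is not the VLMEvalKit code: the cell symbols (standard Sokoban notation), the direction names, the `solves_puzzle` function, and the choice to reject a sequence on the first illegal move are all assumptions made for illustration. It also assumes the grid is fully enclosed by walls.

```python
from typing import Dict, List, Optional, Set, Tuple

# Assumed symbols (standard Sokoban notation); the real evaluator's constants may differ.
WALL, PLAYER, BOX, GOAL = "#", "@", "$", "."
BOX_ON_GOAL, PLAYER_ON_GOAL, FLOOR = "*", "+", " "

# Assumed direction encoding: (row delta, column delta).
DIRECTIONS: Dict[str, Tuple[int, int]] = {
    "up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1),
}


def solves_puzzle(grid: List[str], moves: List[str]) -> bool:
    """Replay `moves` on `grid`; True only if every box ends on a goal."""
    walls: Set[Tuple[int, int]] = set()
    boxes: Set[Tuple[int, int]] = set()
    goals: Set[Tuple[int, int]] = set()
    player: Optional[Tuple[int, int]] = None

    # Split the grid into static parts (walls, goals) and movable parts (boxes, player).
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            if cell == WALL:
                walls.add((r, c))
            if cell in (BOX, BOX_ON_GOAL):
                boxes.add((r, c))
            if cell in (GOAL, BOX_ON_GOAL, PLAYER_ON_GOAL):
                goals.add((r, c))
            if cell in (PLAYER, PLAYER_ON_GOAL):
                player = (r, c)
    if player is None:
        return False

    for move in moves:
        if move not in DIRECTIONS:
            return False  # unknown direction token
        dr, dc = DIRECTIONS[move]
        target = (player[0] + dr, player[1] + dc)
        if target in walls:
            return False  # player walks into a wall
        if target in boxes:
            beyond = (target[0] + dr, target[1] + dc)
            if beyond in walls or beyond in boxes:
                return False  # box is pushed into a wall or another box
            boxes.remove(target)
            boxes.add(beyond)
        player = target

    # Solved when there is at least one box and all boxes sit on goal cells.
    return bool(boxes) and boxes <= goals
```

For example, on the one-row puzzle `["#####", "#@$.#", "#####"]` a single `"right"` move pushes the box onto the goal, while a second `"right"` would push the box into the wall and invalidate the sequence under the assumptions above.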

Usage

Called internally by the corresponding dataset class during evaluation.

Code Reference

Source: vlmeval/dataset/utils/mmhelix/evaluators/sokoban_eval.py, Lines: L1-237
Import: from vlmeval.dataset.utils.mmhelix.evaluators.sokoban_eval import SokobanEvaluator

Key Functions:

class SokobanEvaluator(BaseEvaluator):
    def extract_answer(self, model_output) -> List[str]: ...
    def evaluate(self, predicted_answer, ground_truth, initial_state) -> bool: ...
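As a rough illustration of the kind of parsing an `extract_answer`-style method performs, the sketch below pulls direction words out of model output and prefers the contents of an `<answer>` tag when one is present. The helper name `extract_moves`, the accepted direction vocabulary, and the fallback to the full text are hypothetical; the actual method may accept other formats.

```python
import re
from typing import List

# Hypothetical helper, not the VLMEvalKit implementation.
_ANSWER_TAG = re.compile(r"<answer>(.*?)</answer>", re.IGNORECASE | re.DOTALL)
_DIRECTION = re.compile(r"\b(up|down|left|right)\b", re.IGNORECASE)


def extract_moves(model_output: str) -> List[str]:
    """Return lowercase direction words, preferring the last <answer> block."""
    tagged = _ANSWER_TAG.findall(model_output)
    # Fall back to scanning the whole output when no <answer> tag is found.
    text = tagged[-1] if tagged else model_output
    return [word.lower() for word in _DIRECTION.findall(text)]
```

Taking the last `<answer>` block is a design choice in this sketch: models sometimes restate or revise their answer, and the final tagged block is usually the one intended as the result.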

I/O Contract

Direction	Description
Inputs	Model output with movement directions; initial Sokoban grid state with player, boxes, and goals
Outputs	Boolean indicating whether the movement sequence solves the Sokoban puzzle

Usage Examples

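from vlmeval.dataset.utils.mmhelix.evaluators.sokoban_eval import SokobanEvaluator

# model_output, ground_truth, and initial_state are placeholders supplied by the caller:
# the raw model response, the reference answer, and the starting grid for the puzzle.
evaluator = SokobanEvaluator()
moves = evaluator.extract_answer(model_output)  # parse the movement directions
is_correct = evaluator.evaluate(moves, ground_truth, initial_state)  # True if the moves solve the puzzle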

Related Pages

Principle:Open_compass_VLMEvalKit_Benchmark_Dataset_Construction
