Implementation:Open compass VLMEvalKit MMHelix Sokoban Eval

From Leeroopedia
| Field | Value |
|---|---|
| source | VLMEvalKit |
| domain | Vision, Evaluation, Puzzle Solving, Sokoban |

Overview

Evaluates Sokoban puzzle solutions in the MMHelix benchmark by simulating box-pushing movements and verifying all boxes reach goal positions.

Description

The `SokobanEvaluator` class extends `BaseEvaluator` to simulate and validate Sokoban puzzle solutions. It defines game elements (WALL, PLAYER, BOX, GOAL, BOX_ON_GOAL, PLAYER_ON_GOAL, FLOOR) and a DIRECTIONS mapping. The `extract_answer` method parses movement directions from model output, supporting `<answer>` tags. The evaluator simulates the full game sequence, checking wall collisions, box-into-wall pushes, and whether all boxes end up on goal positions.
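
The simulation described above can be sketched roughly as follows. This is a minimal illustration, not the code from `sokoban_eval.py`: the grid symbols follow the common Sokoban text notation, the direction names are assumed, and `simulate` is a hypothetical helper. It also assumes that any illegal move (walking into a wall, pushing a box into a wall or another box) fails the whole sequence; the actual evaluator may treat such moves differently.

```python
from typing import Dict, List, Tuple

# Assumed symbols (common Sokoban text notation); the real evaluator may differ.
WALL, PLAYER, BOX, GOAL = "#", "@", "$", "."
BOX_ON_GOAL, PLAYER_ON_GOAL, FLOOR = "*", "+", " "

# Assumed direction names mapped to (row, column) offsets.
DIRECTIONS: Dict[str, Tuple[int, int]] = {
    "up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1),
}


def simulate(grid_rows: List[str], moves: List[str]) -> bool:
    """Return True if `moves` is legal and leaves every box on a goal."""
    walls, boxes, goals = set(), set(), set()
    player = None
    # Parse the grid into sets of coordinates.
    for r, row in enumerate(grid_rows):
        for c, ch in enumerate(row):
            if ch == WALL:
                walls.add((r, c))
            if ch in (BOX, BOX_ON_GOAL):
                boxes.add((r, c))
            if ch in (GOAL, BOX_ON_GOAL, PLAYER_ON_GOAL):
                goals.add((r, c))
            if ch in (PLAYER, PLAYER_ON_GOAL):
                player = (r, c)
    if player is None:
        return False  # no player on the board

    for move in moves:
        key = move.lower()
        if key not in DIRECTIONS:
            return False  # unknown direction
        dr, dc = DIRECTIONS[key]
        target = (player[0] + dr, player[1] + dc)
        if target in walls:
            return False  # player walks into a wall
        if target in boxes:
            beyond = (target[0] + dr, target[1] + dc)
            if beyond in walls or beyond in boxes:
                return False  # box cannot be pushed
            boxes.remove(target)
            boxes.add(beyond)
        player = target

    # Solved only if there is at least one box and all boxes sit on goals.
    return bool(boxes) and boxes <= goals
```

Storing walls, boxes, and goals as coordinate sets keeps each move check to a few membership tests and makes the final "all boxes on goals" condition a subset comparison.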

Usage

Called internally by the corresponding dataset class during evaluation.

Code Reference

  • Source: vlmeval/dataset/utils/mmhelix/evaluators/sokoban_eval.py, Lines: L1-237
  • Import: from vlmeval.dataset.utils.mmhelix.evaluators.sokoban_eval import SokobanEvaluator

Key Functions:

class SokobanEvaluator(BaseEvaluator):
    def extract_answer(self, model_output) -> List[str]: ...
    def evaluate(self, predicted_answer, ground_truth, initial_state) -> bool: ...
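
For intuition, the parsing step performed by `extract_answer` might look something like the sketch below. The function name `extract_moves`, the accepted direction words, the choice of the last `<answer>` block, and the fallback to the full text when no tags are present are all assumptions made for illustration; the real method may accept other formats.

```python
import re
from typing import List

# Matches the content of an <answer>...</answer> block, across newlines.
_ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL | re.IGNORECASE)
# Assumed direction vocabulary; the real evaluator may accept other tokens.
_MOVE_RE = re.compile(r"\b(up|down|left|right)\b", re.IGNORECASE)


def extract_moves(model_output: str) -> List[str]:
    """Return the lowercase direction words found in the model output."""
    blocks = _ANSWER_RE.findall(model_output)
    # Assumption: prefer the last <answer> block, else scan the whole text.
    text = blocks[-1] if blocks else model_output
    return [m.lower() for m in _MOVE_RE.findall(text)]
```

Restricting the search to the tagged block, when one exists, avoids picking up direction words that appear in the model's reasoning before the final answer.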

I/O Contract

| Direction | Description |
|---|---|
| Inputs | Model output with movement directions; initial Sokoban grid state with player, boxes, and goals |
| Outputs | Boolean indicating whether the movement sequence solves the Sokoban puzzle |

Usage Examples

from vlmeval.dataset.utils.mmhelix.evaluators.sokoban_eval import SokobanEvaluator

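# model_output, ground_truth, and initial_state are placeholders for values
# that the dataset class supplies during evaluation.
evaluator = SokobanEvaluator()
moves = evaluator.extract_answer(model_output)  # parsed movement directions
is_correct = evaluator.evaluate(moves, ground_truth, initial_state)  # True if the puzzle is solved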
