Implementation:Open compass VLMEvalKit MMHelix Sokoban Eval
| Field | Value |
|---|---|
| source | VLMEvalKit |
| domain | Vision, Evaluation, Puzzle Solving, Sokoban |
Overview
Evaluates Sokoban puzzle solutions in the MMHelix benchmark by simulating the box-pushing moves and verifying that every box ends on a goal position.
Description
The `SokobanEvaluator` class extends `BaseEvaluator` to simulate and validate Sokoban puzzle solutions. It defines constants for the game elements (`WALL`, `PLAYER`, `BOX`, `GOAL`, `BOX_ON_GOAL`, `PLAYER_ON_GOAL`, `FLOOR`) and a `DIRECTIONS` mapping. The `extract_answer` method parses the sequence of movement directions from the model output, including answers wrapped in `<answer>` tags. The evaluator then replays the full move sequence on the initial grid. It checks for moves that walk the player into a wall or push a box into a wall, and finally checks whether every box ends up on a goal position.
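The simulation described above can be sketched as follows. This is not the code in `sokoban_eval.py`. The function name `simulate_sokoban`, the lowercase direction keys, the set-of-coordinates state, and the choice to fail the whole sequence on the first illegal move are all assumptions. The sketch also treats pushing a box into another box as illegal, which is a standard Sokoban rule that the description above does not mention. It assumes the level is enclosed by walls, so no bounds check is needed.

```python
from typing import Dict, List, Set, Tuple

Pos = Tuple[int, int]  # (row, col)

# Hypothetical direction keys and (row, col) offsets; the real DIRECTIONS
# mapping in sokoban_eval.py may use different names.
DIRECTIONS: Dict[str, Pos] = {
    "up": (-1, 0),
    "down": (1, 0),
    "left": (0, -1),
    "right": (0, 1),
}


def simulate_sokoban(walls: Set[Pos], player: Pos, boxes: Set[Pos],
                     goals: Set[Pos], moves: List[str]) -> bool:
    """Replay moves; True only if every move is legal and all boxes end on goals.

    Assumption: the first illegal move fails the whole sequence.
    """
    boxes = set(boxes)  # copy so the caller's set is not mutated
    for move in moves:
        if move not in DIRECTIONS:
            return False  # unrecognised direction token
        dr, dc = DIRECTIONS[move]
        target = (player[0] + dr, player[1] + dc)
        if target in walls:
            return False  # player walks into a wall
        if target in boxes:
            beyond = (target[0] + dr, target[1] + dc)
            if beyond in walls or beyond in boxes:
                return False  # box pushed into a wall (or, assumed, another box)
            boxes.remove(target)
            boxes.add(beyond)
        player = target
    # Solved when every box occupies a goal cell.
    return boxes <= goals
```

The check `boxes <= goals` treats the puzzle as solved when every box occupies a goal. When a level has the same number of boxes and goals, this is the usual win condition.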
Usage
Called internally by the corresponding dataset class during evaluation.
Code Reference
- Source: `vlmeval/dataset/utils/mmhelix/evaluators/sokoban_eval.py`, lines 1-237
- Import: `from vlmeval.dataset.utils.mmhelix.evaluators.sokoban_eval import SokobanEvaluator`
Key Functions:
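```python
class SokobanEvaluator(BaseEvaluator):
    # Parse the list of movement directions from the raw model output.
    def extract_answer(self, model_output) -> List[str]: ...

    # Replay predicted_answer from initial_state; True if the puzzle is solved.
    def evaluate(self, predicted_answer, ground_truth, initial_state) -> bool: ...
```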
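A minimal sketch of how answer extraction might work follows. The helper name `extract_moves`, the four-word vocabulary, and the rule that the last `<answer>...</answer>` block wins are assumptions for illustration. The real `extract_answer` may accept other notations.

```python
import re
from typing import List

# Hypothetical vocabulary; the real extract_answer may also accept other
# tokens (for example single letters such as U/D/L/R).
VALID_MOVES = ("up", "down", "left", "right")


def extract_moves(model_output: str) -> List[str]:
    """Pull direction words out of a response, preferring <answer>...</answer>."""
    # Assumption: if several tagged blocks exist, only the last one counts.
    tagged = re.findall(r"<answer>(.*?)</answer>", model_output,
                        flags=re.DOTALL | re.IGNORECASE)
    text = tagged[-1] if tagged else model_output
    # Keep recognised direction words in order, ignoring case and punctuation.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t in VALID_MOVES]
```

When there are no tags, this fallback scans the whole response, so direction words in the model's reasoning would also be picked up. Requiring tags would be stricter.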
I/O Contract
| I/O | Description |
|---|---|
| Inputs | Raw model output containing a sequence of movement directions (optionally inside `<answer>` tags); the ground-truth answer; the initial Sokoban grid state with the player, boxes, and goals |
| Outputs | Boolean indicating whether the movement sequence solves the Sokoban puzzle |
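The grid-state input has to be turned into positions before it can be simulated. The sketch below assumes a text grid that uses the common XSB Sokoban characters, which line up one-to-one with the seven element names in the Description. Whether MMHelix uses this encoding is an assumption, and `parse_grid` is a hypothetical helper.

```python
from typing import List, Optional, Set, Tuple

Pos = Tuple[int, int]  # (row, col)

# Assumed character encoding, borrowed from the XSB Sokoban text format;
# MMHelix may encode its grids differently.
WALL, PLAYER, BOX, GOAL = "#", "@", "$", "."
BOX_ON_GOAL, PLAYER_ON_GOAL, FLOOR = "*", "+", " "


def parse_grid(rows: List[str]) -> Tuple[Set[Pos], Pos, Set[Pos], Set[Pos]]:
    """Split a text grid into (walls, player, boxes, goals) as (row, col) data."""
    walls: Set[Pos] = set()
    boxes: Set[Pos] = set()
    goals: Set[Pos] = set()
    player: Optional[Pos] = None
    for r, line in enumerate(rows):
        for c, ch in enumerate(line):
            if ch == WALL:
                walls.add((r, c))
            if ch in (BOX, BOX_ON_GOAL):
                boxes.add((r, c))
            if ch in (GOAL, BOX_ON_GOAL, PLAYER_ON_GOAL):
                goals.add((r, c))
            if ch in (PLAYER, PLAYER_ON_GOAL):
                player = (r, c)
            # FLOOR (and any other character) needs no bookkeeping.
    if player is None:
        raise ValueError("grid has no player")
    return walls, player, boxes, goals
```

Composite cells such as `*` (box on goal) and `+` (player on goal) are added to two sets. This is why the checks use separate `if` statements rather than an `elif` chain.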
Usage Examples
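```python
from vlmeval.dataset.utils.mmhelix.evaluators.sokoban_eval import SokobanEvaluator

evaluator = SokobanEvaluator()
# model_output, ground_truth and initial_state are placeholders supplied by the
# calling dataset class (the model's response and the benchmark sample).
moves = evaluator.extract_answer(model_output)
is_correct = evaluator.evaluate(moves, ground_truth, initial_state)
```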