Implementation:Open compass VLMEvalKit MMHelix Sudoku Eval
| Field | Value |
|---|---|
| source | VLMEvalKit |
| domain | Vision, Evaluation, Puzzle Solving, Sudoku |
Overview
Evaluates classic 9x9 Sudoku puzzle solutions in the MMHelix benchmark using rule-based validation of rows, columns, and 3x3 blocks.
Description
The `SudokuEvaluator` class extends `BaseEvaluator` and validates 9x9 Sudoku solutions purely by rule checking; the ground truth is ignored. The `_parse_grid_like` helper function accepts several input formats: Python list literals, whitespace-delimited grids with '.' as blanks, and nested lists. Validation checks that every given number from the initial state is preserved and that each row, column, and 3x3 block contains the digits 1-9 exactly once.
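The rule checks described above can be sketched roughly as follows. This is a minimal illustration, not the VLMEvalKit source: the function name `is_valid_solution` is hypothetical, and it assumes both grids are already parsed into 9x9 lists of integers with `0` marking a blank cell in the initial state.

```python
from typing import List

Grid = List[List[int]]


def is_valid_solution(grid: Grid, initial: Grid) -> bool:
    """Hypothetical rule-based check; not the actual evaluator code."""
    # The candidate must be a full 9x9 grid.
    if len(grid) != 9 or any(len(row) != 9 for row in grid):
        return False

    # Every given number must be preserved (assumption: 0 means blank).
    for r in range(9):
        for c in range(9):
            if initial[r][c] != 0 and grid[r][c] != initial[r][c]:
                return False

    digits = set(range(1, 10))

    # Each row and each column must contain 1-9 exactly once.
    for i in range(9):
        if set(grid[i]) != digits:
            return False
        if {grid[r][i] for r in range(9)} != digits:
            return False

    # Each 3x3 block must contain 1-9 exactly once.
    for br in range(0, 9, 3):
        for bc in range(0, 9, 3):
            block = {grid[r][c]
                     for r in range(br, br + 3)
                     for c in range(bc, bc + 3)}
            if block != digits:
                return False

    return True
```

Comparing each unit against the set of digits 1-9 covers both the range check and the uniqueness check in one step, since a unit of nine cells can only equal that set if every digit appears exactly once.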
Usage
Called internally by the corresponding dataset class during evaluation.
Code Reference
- Source: `vlmeval/dataset/utils/mmhelix/evaluators/sudoku_evaluator.py`, Lines: L1-113
- Import: `from vlmeval.dataset.utils.mmhelix.evaluators.sudoku_evaluator import SudokuEvaluator`
Key Functions:
class SudokuEvaluator(BaseEvaluator):
    def evaluate(self, predicted_answer, ground_truth, initial_state) -> bool: ...

def _parse_grid_like(obj) -> Optional[List[List[int]]]: ...
I/O Contract
| Direction | Description |
|---|---|
| Inputs | Model output with a 9x9 grid (various formats); initial state grid with given numbers |
| Outputs | Boolean indicating whether the grid satisfies all Sudoku rules |
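As a rough illustration of how the "various formats" on the input side might be normalized, the sketch below accepts a nested list, a Python list literal in a string, or a whitespace-delimited text grid with '.' for blanks. The name `parse_grid` is hypothetical, mapping '.' to `0` is an assumption, and the real `_parse_grid_like` may handle more cases or behave differently. The sketch does not enforce the 9x9 shape.

```python
import ast
from typing import List, Optional


def parse_grid(obj) -> Optional[List[List[int]]]:
    """Hypothetical parser sketch; returns None when the input is unusable."""
    if isinstance(obj, list):
        rows = obj  # already a nested list
    elif isinstance(obj, str):
        text = obj.strip()
        if text.startswith("["):
            # Python list literal, e.g. "[[1, 2], [3, 4]]"
            try:
                rows = ast.literal_eval(text)
            except (ValueError, SyntaxError):
                return None
        else:
            # One row per line, cells separated by whitespace
            rows = [line.split() for line in text.splitlines() if line.strip()]
    else:
        return None

    if not isinstance(rows, list):
        return None

    grid: List[List[int]] = []
    for row in rows:
        if not isinstance(row, (list, tuple)):
            return None
        cells: List[int] = []
        for cell in row:
            if cell == ".":
                cells.append(0)  # assumption: blanks become 0
            elif isinstance(cell, int) and not isinstance(cell, bool):
                cells.append(cell)
            elif isinstance(cell, str) and cell.isdigit():
                cells.append(int(cell))
            else:
                return None
        grid.append(cells)
    return grid
```

Returning `None` instead of raising lets a caller treat any unparseable model output as an incorrect answer.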
Usage Examples
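from vlmeval.dataset.utils.mmhelix.evaluators.sudoku_evaluator import SudokuEvaluator
evaluator = SudokuEvaluator()
# predicted_grid: model output containing a 9x9 grid; initial_state: the puzzle's given numbers.
# ground_truth is ignored by this evaluator, so None is passed.
is_correct = evaluator.evaluate(predicted_grid, None, initial_state)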