Implementation:Open compass VLMEvalKit MMHelix Sudoku Eval

Field	Value
source	VLMEvalKit
domain	Vision, Evaluation, Puzzle Solving, Sudoku

Overview

Evaluates classic 9x9 Sudoku puzzle solutions in the MMHelix benchmark using rule-based validation of rows, columns, and 3x3 blocks.

Description

The `SudokuEvaluator` class extends `BaseEvaluator` and validates 9x9 Sudoku solutions purely by rule checking; the `ground_truth` argument is not used. The `_parse_grid_like` helper function accepts several input formats: nested lists, Python list literals, and whitespace-delimited grids that use '.' for blank cells. Validation passes only if every given number from the initial state is preserved and each row, column, and 3x3 block contains the digits 1-9 exactly once.
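The checks described above can be sketched as follows. This is a minimal illustration and not the code in `sudoku_evaluator.py`: the function name `is_valid_sudoku_solution` is hypothetical, and it assumes both grids are already parsed into 9x9 lists of integers with `0` marking a blank cell in the initial state.

```python
from typing import List

Grid = List[List[int]]

DIGITS = set(range(1, 10))


def is_valid_sudoku_solution(grid: Grid, initial: Grid) -> bool:
    """Hypothetical stand-in for the rule-based check (not the library code).

    Assumption: blanks in `initial` are encoded as 0.
    """
    # Both grids must be exactly 9x9.
    for g in (grid, initial):
        if len(g) != 9 or any(len(row) != 9 for row in g):
            return False

    # Every given number from the initial state must be preserved.
    for r in range(9):
        for c in range(9):
            if initial[r][c] != 0 and grid[r][c] != initial[r][c]:
                return False

    # Each row must contain 1-9 exactly once. A row has nine cells, so
    # set equality with DIGITS rules out both duplicates and bad values.
    if any(set(row) != DIGITS for row in grid):
        return False

    # Each column must contain 1-9 exactly once.
    for c in range(9):
        if {grid[r][c] for r in range(9)} != DIGITS:
            return False

    # Each 3x3 block must contain 1-9 exactly once.
    for br in range(0, 9, 3):
        for bc in range(0, 9, 3):
            block = {grid[r][c]
                     for r in range(br, br + 3)
                     for c in range(bc, bc + 3)}
            if block != DIGITS:
                return False
    return True
```

Because the check never consults a reference solution, any grid that respects the givens and the three Sudoku constraints is accepted, which matters for puzzles with more than one valid solution.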

Usage

Called internally by the corresponding dataset class during evaluation.

Code Reference

Source: vlmeval/dataset/utils/mmhelix/evaluators/sudoku_evaluator.py, Lines: L1-113
Import: from vlmeval.dataset.utils.mmhelix.evaluators.sudoku_evaluator import SudokuEvaluator

Key Functions:

class SudokuEvaluator(BaseEvaluator):
    def evaluate(self, predicted_answer, ground_truth, initial_state) -> bool: ...

def _parse_grid_like(obj) -> Optional[List[List[int]]]: ...

I/O Contract

Direction	Description
Inputs	Model output with a 9x9 grid (various formats); initial state grid with given numbers
Outputs	Boolean indicating whether the grid satisfies all Sudoku rules
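To illustrate what "various formats" can mean for the input grid, here is a rough sketch of a tolerant parser. It is not the library's `_parse_grid_like`: the name `parse_grid_like`, the choice of `0` for blank cells, and the exact handling of malformed input are all assumptions made for this example.

```python
import ast
from typing import List, Optional


def parse_grid_like(obj) -> Optional[List[List[int]]]:
    """Hypothetical parser sketch; the real helper may behave differently."""
    if isinstance(obj, str):
        text = obj.strip()
        if text.startswith("["):
            # Python list literal, e.g. "[[5, 3, 0, ...], ...]".
            try:
                obj = ast.literal_eval(text)
            except (ValueError, SyntaxError):
                return None
        else:
            # Whitespace-delimited grid: one row per line.
            obj = [line.split() for line in text.splitlines() if line.strip()]

    if not isinstance(obj, (list, tuple)):
        return None

    grid: List[List[int]] = []
    for row in obj:
        if not isinstance(row, (list, tuple)):
            return None
        cells: List[int] = []
        for cell in row:
            token = str(cell).strip()
            if token == ".":
                cells.append(0)  # assumption: blanks become 0
            elif token.isascii() and token.isdigit():
                cells.append(int(token))
            else:
                return None
        grid.append(cells)

    # Only a full 9x9 grid is accepted.
    if len(grid) != 9 or any(len(r) != 9 for r in grid):
        return None
    return grid
```

Returning `None` on any parse failure lets a caller treat unparseable model output as an incorrect answer rather than raising an exception.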

Usage Examples

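from vlmeval.dataset.utils.mmhelix.evaluators.sudoku_evaluator import SudokuEvaluator

# predicted_grid: the model's 9x9 answer in one of the supported formats
# initial_state: the puzzle's starting grid containing the given numbers
evaluator = SudokuEvaluator()
# ground_truth is passed as None because validation is purely rule-based
is_correct = evaluator.evaluate(predicted_grid, None, initial_state)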

Related Pages

Principle:Open_compass_VLMEvalKit_Benchmark_Dataset_Construction
