Implementation:Open compass VLMEvalKit MMHelix Calcudoku Eval

Field	Value
source	VLMEvalKit
domain	Vision, Evaluation, Puzzle Solving, Calcudoku

Overview

Evaluates Calcudoku (calculation Sudoku) puzzle solutions in the MMHelix benchmark by verifying row/column uniqueness and region arithmetic constraints.

Description

The `CalcudokuEvaluator` class extends `BaseEvaluator` to validate Calcudoku solutions. For an n×n grid, it verifies that each row and each column contains every integer from 1 to n exactly once, and that the numbers in each region combine under the region's operator (+, -, *, /) to yield its target value. The `extract_answer` method parses a 2D array solution from the model output, and `prepare_prompt` builds the problem description, including each region's cells, operation, and target.
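The two checks described above can be sketched as follows. This is a minimal illustration, not the code from `calcudoku_eval.py`: the function names, the region dictionary keys (`cells`, `op`, `target`), and the order-independent handling of `-` and `/` (largest value first) are assumptions made for the example.

```python
import math
from typing import Dict, List, Sequence


def rows_and_columns_valid(grid: Sequence[Sequence[int]], n: int) -> bool:
    """Check that every row and column holds each of 1..n exactly once."""
    if len(grid) != n or any(len(row) != n for row in grid):
        return False
    expected = set(range(1, n + 1))
    rows_ok = all(set(row) == expected for row in grid)
    cols_ok = all({grid[r][c] for r in range(n)} == expected for c in range(n))
    return rows_ok and cols_ok


def region_valid(values: List[int], op: str, target: int) -> bool:
    """Check one region; '-' and '/' are treated as order-independent (assumption)."""
    if op == "+":
        return sum(values) == target
    if op == "*":
        return math.prod(values) == target
    # For '-' and '/', start from the largest value and apply the rest to it.
    largest = max(values)
    rest = list(values)
    rest.remove(largest)
    if op == "-":
        return largest - sum(rest) == target
    if op == "/":
        # Compare by multiplication to avoid floating-point division.
        return largest == target * math.prod(rest)
    return False  # unknown operator


def check_solution(grid: List[List[int]], n: int, regions: List[Dict]) -> bool:
    """Combine the uniqueness check with the per-region arithmetic check."""
    if not rows_and_columns_valid(grid, n):
        return False
    for region in regions:  # hypothetical keys: 'cells', 'op', 'target'
        values = [grid[r][c] for r, c in region["cells"]]
        if not region_valid(values, region["op"], region["target"]):
            return False
    return True


example_grid = [[1, 2, 3], [2, 3, 1], [3, 1, 2]]
example_regions = [
    {"cells": [(0, 0), (0, 1)], "op": "+", "target": 3},
    {"cells": [(0, 2), (1, 2)], "op": "/", "target": 3},
    {"cells": [(1, 0), (2, 0)], "op": "-", "target": 1},
    {"cells": [(1, 1), (2, 1), (2, 2)], "op": "*", "target": 6},
]
print(check_solution(example_grid, 3, example_regions))
```

Checking the row and column constraint first means the region checks only ever see values in 1..n, which keeps the arithmetic helpers simple.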

Usage

Called internally by the corresponding dataset class during evaluation.

Code Reference

Source: vlmeval/dataset/utils/mmhelix/evaluators/calcudoku_eval.py, Lines: L1-218
Import: from vlmeval.dataset.utils.mmhelix.evaluators.calcudoku_eval import CalcudokuEvaluator

Key Functions:

class CalcudokuEvaluator(BaseEvaluator):
    def prepare_prompt(self, question, params): ...
    def extract_answer(self, model_output) -> List[List[int]]: ...
    def evaluate(self, predicted_answer, ground_truth, params) -> bool: ...

I/O Contract

Direction	Description
Inputs	Model output string containing a 2D array solution; puzzle params with size and region definitions
Outputs	Boolean indicating whether the solution satisfies all Calcudoku constraints
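On the input side of this contract, the model output is free text that contains a 2D array. The sketch below shows one way such an array could be pulled out of a response; it is an assumption-laden illustration, not the actual `extract_answer` implementation. The helper name `parse_grid` and the choice to take the last bracketed grid in the text are both hypothetical.

```python
import ast
import re
from typing import List, Optional

# Matches spans such as "[[1, 2], [2, 1]]" made only of digits, commas,
# whitespace, and square brackets.
_GRID_PATTERN = re.compile(r"\[\s*\[[\d\s,\[\]]*\]\s*\]")


def parse_grid(text: str) -> Optional[List[List[int]]]:
    """Return the last well-formed 2D integer list found in text, or None."""
    for candidate in reversed(_GRID_PATTERN.findall(text)):
        try:
            value = ast.literal_eval(candidate)
        except (ValueError, SyntaxError):
            continue  # malformed brackets; try an earlier candidate
        is_grid = (
            isinstance(value, list)
            and len(value) > 0
            and all(
                isinstance(row, list)
                and len(row) > 0
                and all(isinstance(x, int) for x in row)
                for row in value
            )
        )
        if is_grid:
            return value
    return None


print(parse_grid("Reasoning... so the final grid is [[1, 2], [2, 1]]."))
```

Preferring the last match reflects the common pattern of models stating a final answer after intermediate attempts; a real evaluator may apply a different rule.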

Usage Examples

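from vlmeval.dataset.utils.mmhelix.evaluators.calcudoku_eval import CalcudokuEvaluator

# model_output: the model's response string containing a 2D array solution
# ground_truth, params: supplied by the dataset; params holds the puzzle size and region definitions
evaluator = CalcudokuEvaluator()
answer = evaluator.extract_answer(model_output)  # parsed 2D list of ints
is_correct = evaluator.evaluate(answer, ground_truth, params)  # True if all constraints hold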

Related Pages

Principle:Open_compass_VLMEvalKit_Benchmark_Dataset_Construction
