Implementation:Open compass VLMEvalKit MMHelix Calcudoku Eval
| Field | Value |
|---|---|
| source | VLMEvalKit |
| domain | Vision, Evaluation, Puzzle Solving, Calcudoku |
Overview
Evaluates Calcudoku (calculation Sudoku) puzzle solutions in the MMHelix benchmark by verifying row/column uniqueness and region arithmetic constraints.
Description
The `CalcudokuEvaluator` class extends `BaseEvaluator` to validate Calcudoku solutions. For an n×n grid, it verifies that each row and each column contains every number from 1 to n exactly once, and that the numbers within each region, combined with the region's operator (+, -, *, /), produce the region's target value. The `extract_answer` method parses a 2D array solution from the model output, and `prepare_prompt` constructs the problem description, including each region's cells, operation, and target.
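The two checks described above can be sketched as follows. This is a minimal illustration, not the code in `calcudoku_eval.py`: the function names (`latin_square_ok`, `region_ok`, `check_solution`), the region dictionary keys (`cells`, `op`, `target`), the zero-based `(row, col)` cell format, and the convention that subtraction and division start from the largest value in the region are all assumptions made for this example.

```python
from math import prod
from typing import Dict, List, Sequence


def latin_square_ok(grid: Sequence[Sequence[int]]) -> bool:
    """Return True if every row and column holds 1..n exactly once."""
    n = len(grid)
    if n == 0 or any(len(row) != n for row in grid):
        return False
    expected = set(range(1, n + 1))
    rows_ok = all(set(row) == expected for row in grid)
    cols_ok = all({grid[r][c] for r in range(n)} == expected for c in range(n))
    return rows_ok and cols_ok


def region_ok(values: List[int], op: str, target: int) -> bool:
    """Check one region's arithmetic constraint (assumed conventions)."""
    if not values:
        return False
    if op == "+":
        return sum(values) == target
    if op == "*":
        return prod(values) == target
    if op in ("-", "/"):
        # Assumption: the largest value is the starting operand,
        # which makes the check independent of cell order.
        rest = list(values)
        big = max(rest)
        rest.remove(big)
        if op == "-":
            return big - sum(rest) == target
        # Integer-only division check: big / prod(rest) == target
        return big == target * prod(rest)
    return False  # unknown operator


def check_solution(grid: Sequence[Sequence[int]], regions: List[Dict]) -> bool:
    """Combine the uniqueness check with every region check."""
    if not latin_square_ok(grid):
        return False
    for region in regions:
        # Hypothetical region format: cells are zero-based (row, col) pairs.
        values = [grid[r][c] for r, c in region["cells"]]
        if not region_ok(values, region["op"], region["target"]):
            return False
    return True
```

Running the uniqueness check first guarantees that every value is a positive integer within range before any region arithmetic is attempted, so the division check never has to handle zeros or floats.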
Usage
This evaluator is called internally by the corresponding dataset class during evaluation.
Code Reference
- Source: `vlmeval/dataset/utils/mmhelix/evaluators/calcudoku_eval.py`, Lines: L1-218
- Import: `from vlmeval.dataset.utils.mmhelix.evaluators.calcudoku_eval import CalcudokuEvaluator`
Key Functions:
    class CalcudokuEvaluator(BaseEvaluator):
        def prepare_prompt(self, question, params): ...
        def extract_answer(self, model_output) -> List[List[int]]: ...
        def evaluate(self, predicted_answer, ground_truth, params) -> bool: ...
I/O Contract
| Direction | Description |
|---|---|
| Inputs | Model output string containing a 2D array solution; puzzle params with size and region definitions |
| Outputs | Boolean indicating whether the solution satisfies all Calcudoku constraints |
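The input side of this contract, turning a free-form model response into a 2D integer array, might look like the sketch below. It is an assumption-laden illustration rather than the actual `extract_answer` implementation: the helper name `extract_grid`, the regular expression, and the choice to prefer the last array in the text are all hypothetical.

```python
import ast
import re
from typing import List, Optional

# Matches spans that look like a nested integer list, e.g. "[[1, 2], [2, 1]]".
_GRID_PATTERN = re.compile(r"\[\s*\[[\d\s,\[\]]*\]\s*\]")


def extract_grid(model_output: str) -> Optional[List[List[int]]]:
    """Return the last well-formed 2D integer list in the text, or None."""
    candidates = _GRID_PATTERN.findall(model_output)
    # Assumption: a model tends to restate its final answer last,
    # so candidates are tried from the end of the text backwards.
    for text in reversed(candidates):
        try:
            value = ast.literal_eval(text)  # safe parsing, no eval()
        except (ValueError, SyntaxError):
            continue
        is_grid = (
            isinstance(value, list)
            and len(value) > 0
            and all(
                isinstance(row, list)
                and len(row) > 0
                and all(isinstance(v, int) for v in row)
                for row in value
            )
        )
        if is_grid:
            return value
    return None
```

This sketch only checks that the result is a non-empty list of non-empty integer lists. Whether the grid is square and satisfies the puzzle is left to the evaluation step.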
Usage Examples
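from vlmeval.dataset.utils.mmhelix.evaluators.calcudoku_eval import CalcudokuEvaluator

evaluator = CalcudokuEvaluator()
# model_output is the raw model response string containing a 2D array solution
answer = evaluator.extract_answer(model_output)
# params holds the puzzle size and region definitions
is_correct = evaluator.evaluate(answer, ground_truth, params)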