Implementation:Open compass VLMEvalKit MMHelix Calcudoku Eval

From Leeroopedia
| Field | Value |
|---|---|
| source | VLMEvalKit |
| domain | Vision, Evaluation, Puzzle Solving, Calcudoku |

Overview

Evaluates Calcudoku (calculation Sudoku) puzzle solutions in the MMHelix benchmark by verifying row/column uniqueness and region arithmetic constraints.

Description

The `CalcudokuEvaluator` class extends `BaseEvaluator` to validate Calcudoku solutions. It verifies that each row and column contains numbers 1 to n exactly once, and that numbers within each region combine using the specified operator (+, -, *, /) to achieve the target value. The `extract_answer` method parses 2D array solutions from model output, and `prepare_prompt` constructs problem descriptions including region definitions with cells, operations, and targets.
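The checks described above can be illustrated with a minimal standalone sketch. This is not the VLMEvalKit implementation: the function names `region_ok` and `is_valid_calcudoku`, the zero-based `(row, col)` cell convention, and the region dictionary keys `cells`, `op`, and `target` are all assumptions made for illustration. The sketch also assumes that `-` and `/` are order-independent (largest value first), which is a common Calcudoku convention but may differ from what the evaluator does.

```python
from math import prod
from typing import Dict, List, Sequence


def region_ok(values: Sequence[int], op: str, target: int) -> bool:
    """Check one region. Assumes '-' and '/' start from the largest value."""
    if not values or min(values) < 1:
        return False
    if op == "+":
        return sum(values) == target
    if op == "*":
        return prod(values) == target
    if op == "-":
        # Largest value minus the sum of the remaining values.
        return 2 * max(values) - sum(values) == target
    if op == "/":
        # Largest value divided by the product of the remaining values.
        rest = prod(values) // max(values)
        return max(values) == target * rest
    return False  # unknown operator


def is_valid_calcudoku(grid: List[List[int]], regions: List[Dict]) -> bool:
    """Check row/column uniqueness, then every region's arithmetic."""
    n = len(grid)
    if n == 0 or any(len(row) != n for row in grid):
        return False
    expected = set(range(1, n + 1))
    # Each row and each column must contain 1..n exactly once.
    if any(set(row) != expected for row in grid):
        return False
    if any(set(col) != expected for col in zip(*grid)):
        return False
    return all(
        region_ok([grid[r][c] for r, c in reg["cells"]], reg["op"], reg["target"])
        for reg in regions
    )


# Hypothetical 3x3 puzzle; the region format is an assumption, not the real params schema.
grid = [[1, 2, 3], [2, 3, 1], [3, 1, 2]]
regions = [
    {"cells": [(0, 0), (0, 1)], "op": "+", "target": 3},
    {"cells": [(0, 2), (1, 2)], "op": "/", "target": 3},
    {"cells": [(1, 0), (2, 0)], "op": "-", "target": 1},
    {"cells": [(1, 1), (2, 1), (2, 2)], "op": "*", "target": 6},
]
print(is_valid_calcudoku(grid, regions))
```

Checking the row and column constraint before the regions guarantees that every cell holds a value in 1..n, so the division branch never divides by zero.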

Usage

Called internally by the corresponding dataset class during evaluation.

Code Reference

  • Source: vlmeval/dataset/utils/mmhelix/evaluators/calcudoku_eval.py, Lines: L1-218
  • Import: from vlmeval.dataset.utils.mmhelix.evaluators.calcudoku_eval import CalcudokuEvaluator

Key Functions:

class CalcudokuEvaluator(BaseEvaluator):
    def prepare_prompt(self, question, params): ...
    def extract_answer(self, model_output) -> List[List[int]]: ...
    def evaluate(self, predicted_answer, ground_truth, params) -> bool: ...

I/O Contract

| Direction | Description |
|---|---|
| Inputs | Model output string containing a 2D array solution; puzzle params with size and region definitions |
| Outputs | Boolean indicating whether the solution satisfies all Calcudoku constraints |
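To make the input side of this contract concrete, here is a rough sketch of pulling a 2D integer array out of free-form model text. It is an illustration only, not the parsing logic of `extract_answer`: the helper name `extract_grid`, the choice to prefer the last array in the text, and returning `None` when nothing parses are all assumptions.

```python
import ast
import re
from typing import List, Optional

# Matches bracketed lists of digit-only lists, e.g. [[1, 2], [2, 1]], across line breaks.
GRID_PATTERN = r"\[\s*\[[\d\s,]*\](?:\s*,\s*\[[\d\s,]*\])*\s*,?\s*\]"


def extract_grid(model_output: str) -> Optional[List[List[int]]]:
    """Return the last well-formed list of integer lists in the text, or None."""
    for candidate in reversed(re.findall(GRID_PATTERN, model_output)):
        try:
            # literal_eval parses Python literals only; it never executes code.
            grid = ast.literal_eval(candidate)
        except (ValueError, SyntaxError):
            continue
        rows_ok = all(
            isinstance(row, list) and row and all(isinstance(v, int) for v in row)
            for row in grid
        )
        if grid and rows_ok:
            return grid
    return None


text = "First try [[1, 1], [2, 2]].\nFinal answer:\n[[1, 2],\n [2, 1]]"
print(extract_grid(text))
```

Scanning candidates from the end reflects the assumption that a model states its final answer last; a parser that trusts the first array instead would only need the `reversed` call removed.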

Usage Examples

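from vlmeval.dataset.utils.mmhelix.evaluators.calcudoku_eval import CalcudokuEvaluator

evaluator = CalcudokuEvaluator()
# model_output: the model's raw response text, expected to contain a 2D array solution
answer = evaluator.extract_answer(model_output)
# params: puzzle parameters (size and region definitions) for the same puzzle
is_correct = evaluator.evaluate(answer, ground_truth, params)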
