

Implementation:Open compass VLMEvalKit MMHelix Numbrix Eval

From Leeroopedia
Source: VLMEvalKit
Domain: Vision, Evaluation, Puzzle Solving, Numbrix

Overview

Evaluates Numbrix puzzle solutions in the MMHelix benchmark by verifying number uniqueness, initial state preservation, and consecutive number adjacency.

Description

The `NumbrixEvaluator` class extends `BaseEvaluator` to validate solutions to Numbrix puzzles, in which every pair of consecutive numbers must occupy horizontally or vertically adjacent cells. It checks three conditions: each number appears only once in the grid, every number given in the initial state is preserved, and consecutive integers are adjacent. The `_normalize_grid` and `_parse_grid` methods handle various grid input formats, and the evaluator supports an optional verbose mode for debugging.
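The three checks can be illustrated in plain Python. This is a minimal sketch, not the VLMEvalKit code: it assumes both grids are already parsed into lists of lists, that blank cells in the initial state are `None`, and that a solved grid must contain exactly 1 through rows × cols (a standard Numbrix rule that this page does not say the evaluator enforces). The function name `check_numbrix` is hypothetical.

```python
from typing import List, Optional

Grid = List[List[int]]
InitialGrid = List[List[Optional[int]]]


def check_numbrix(grid: Grid, initial: InitialGrid) -> bool:
    """Hypothetical sketch of the three Numbrix checks (not VLMEvalKit's code)."""
    rows = len(grid)
    if rows == 0 or len(initial) != rows:
        return False
    cols = len(grid[0])
    if any(len(row) != cols for row in grid) or any(len(row) != cols for row in initial):
        return False

    # 1. Uniqueness: record the cell of each number; any repeat fails.
    position = {}
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value in position:
                return False
            position[value] = (r, c)

    # 2. Initial state preserved: every given number must stay in its cell.
    for r, row in enumerate(initial):
        for c, given in enumerate(row):
            if given is not None and grid[r][c] != given:
                return False

    # ASSUMPTION: a solved grid holds exactly 1..rows*cols (standard Numbrix rule;
    # the page does not say the real evaluator checks this).
    n = rows * cols
    if set(position) != set(range(1, n + 1)):
        return False

    # 3. Adjacency: k and k + 1 must be orthogonal neighbours (Manhattan distance 1).
    for k in range(1, n):
        (r1, c1), (r2, c2) = position[k], position[k + 1]
        if abs(r1 - r2) + abs(c1 - c2) != 1:
            return False
    return True
```

For example, with 1 given in the top-left and 9 in the bottom-right of a 3×3 grid, the serpentine fill `[[1, 2, 3], [6, 5, 4], [7, 8, 9]]` passes, while the row-major fill `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]` fails because 3 and 4 are not neighbours. Indexing each number's position once keeps the adjacency check linear in the number of cells.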

Usage

Called internally by the corresponding dataset class during evaluation.

Code Reference

  • Source: vlmeval/dataset/utils/mmhelix/evaluators/numbrix_eval.py, Lines: L1-180
  • Import: from vlmeval.dataset.utils.mmhelix.evaluators.numbrix_eval import NumbrixEvaluator

Key Functions:

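class NumbrixEvaluator(BaseEvaluator):
    # Returns True only if the predicted grid satisfies every Numbrix constraint.
    def evaluate(self, predicted_answer, ground_truth, initial_state) -> bool: ...
    # Verifies that no number appears more than once in the grid.
    def _check_number_uniqueness(self, grid): ...
    # Converts a grid string into a grid that the other checks can inspect.
    def _parse_grid(self, grid_str): ...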
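`_parse_grid` turns a grid string into a grid, but this page does not document which formats it accepts. The sketch below assumes two common ones, a list-of-lists literal such as `[[1, 2], [4, 3]]` or one row per line with numbers separated by spaces or commas, and returns `None` for anything else. It is not the VLMEvalKit implementation, and the name `parse_grid_string` is hypothetical.

```python
import json
import re
from typing import List, Optional


def parse_grid_string(grid_str: str) -> Optional[List[List[int]]]:
    """Hypothetical parser: returns None unless the text is a rectangular integer grid."""
    text = grid_str.strip()
    if not text:
        return None

    if text.startswith("["):
        # Format A (assumed): a list-of-lists literal such as "[[1, 2], [4, 3]]".
        try:
            data = json.loads(text)
        except json.JSONDecodeError:
            return None
        if not isinstance(data, list) or not all(isinstance(row, list) for row in data):
            return None
        # Accept plain ints only; bool is a subclass of int, so compare types exactly.
        if not all(type(v) is int for row in data for v in row):
            return None
        grid = data
    else:
        # Format B (assumed): one row per line, numbers separated by spaces and/or commas.
        grid = []
        for line in text.splitlines():
            tokens = [t for t in re.split(r"[,\s]+", line) if t]
            if not tokens:
                continue  # ignore blank lines
            try:
                grid.append([int(t) for t in tokens])
            except ValueError:
                return None  # non-numeric token such as "_" or "x"

    # Reject empty or ragged grids.
    if not grid or not grid[0] or any(len(row) != len(grid[0]) for row in grid):
        return None
    return grid
```

Returning `None` instead of raising lets a caller score unparseable model output as incorrect. Initial-state grids that mark blanks with a placeholder such as `_` would need an extra rule; how VLMEvalKit represents blanks is not stated here.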

I/O Contract

Inputs: predicted grid string, optional ground truth, and the initial-state grid containing the given numbers
Outputs: boolean indicating whether the solution satisfies all Numbrix constraints

Usage Examples

from vlmeval.dataset.utils.mmhelix.evaluators.numbrix_eval import NumbrixEvaluator

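evaluator = NumbrixEvaluator()
# predicted: the model's answer as a grid string
# ground_truth: the reference solution, if available
# initial_state: the puzzle grid containing the given numbers
is_correct = evaluator.evaluate(predicted, ground_truth, initial_state)  # True only if every Numbrix constraint holds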
