Implementation:Open compass VLMEvalKit MathCanvas Utils

From Leeroopedia
| Field | Value |
|---|---|
| source | VLMEvalKit |
| domain | Vision, Evaluation, Mathematics, Multi-step Reasoning |

Overview

Provides evaluation utilities for the MathCanvas benchmark, supporting multi-step mathematical reasoning with interleaved text and image content.

Description

This module implements `extract_reasoning_steps`, which parses model responses containing interleaved text and base64-encoded images into structured reasoning steps. Scoring uses a weighted sub-question scheme (`SUB_QUESTION_WEIGHTS`) in which later steps receive higher weights, reflecting their increasing difficulty. Answer assessment is GPT-based and relies on an OpenAI-based prompt template loaded from an external file. Pydantic models are used for structured validation.
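The weighted scoring idea can be illustrated with a minimal sketch. The weight values below, the name `EXAMPLE_WEIGHTS`, the helper `weighted_score`, and the uniform fallback are all assumptions made for illustration; the actual `SUB_QUESTION_WEIGHTS` values and the way the module applies them should be checked in `vlmeval/dataset/utils/mathcanvas.py`.

```python
from typing import Dict, List, Sequence

# Hypothetical weights, for illustration only. The real SUB_QUESTION_WEIGHTS
# values are defined in the source module and are not reproduced here.
# Assumption: keys are the number of sub-questions; later entries weigh more.
EXAMPLE_WEIGHTS: Dict[int, List[float]] = {
    2: [0.4, 0.6],
    3: [0.2, 0.3, 0.5],
    4: [0.1, 0.2, 0.3, 0.4],
}


def weighted_score(correct: Sequence[bool],
                   weights: Dict[int, List[float]] = EXAMPLE_WEIGHTS) -> float:
    """Sum the weights of the sub-questions that were judged correct."""
    n = len(correct)
    if n == 0:
        return 0.0
    # Assumption: fall back to uniform weights when no entry exists for n.
    step_weights = weights.get(n, [1.0 / n] * n)
    return sum(w for ok, w in zip(correct, step_weights) if ok)


if __name__ == "__main__":
    # Only the last two of three sub-questions are correct.
    print(weighted_score([False, True, True]))
```

With weights that grow toward the end of the list, getting a late sub-question right contributes more to the total than getting an early one right, which is the behavior the description attributes to the benchmark.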

Usage

Called internally by the corresponding dataset class during evaluation.

Code Reference

  • Source: vlmeval/dataset/utils/mathcanvas.py, Lines: L1-277
  • Import: from vlmeval.dataset.utils.mathcanvas import extract_reasoning_steps

Key Functions:

def extract_reasoning_steps(input_text: str): ...
SUB_QUESTION_WEIGHTS = {2: [...], 3: [...], 4: [...]}

I/O Contract

| Direction | Description |
|---|---|
| Inputs | Model response text with interleaved markdown images (base64) and text reasoning steps |
| Outputs | List of structured step dicts with 'type' (text/image/error), 'content', and 'iteration' fields |
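To make the contract concrete, here is a rough sketch of how a response with inline base64 markdown images could be split into step dicts of that shape. It is not the module's implementation: `split_interleaved` and `IMAGE_PATTERN` are hypothetical names, the 'error' step type is omitted, and treating 'iteration' as a running step index is an assumption.

```python
import re
from typing import Dict, List, Union

# Matches markdown images whose target is a base64 data URI, e.g.
# ![alt](data:image/png;base64,iVBORw0...)
IMAGE_PATTERN = re.compile(
    r"!\[[^\]]*\]\((data:image/[\w.+-]+;base64,[A-Za-z0-9+/=]+)\)"
)

Step = Dict[str, Union[str, int]]


def split_interleaved(response: str) -> List[Step]:
    """Split a response into alternating text and image steps (sketch only)."""
    steps: List[Step] = []
    cursor = 0
    for match in IMAGE_PATTERN.finditer(response):
        # Text between the previous image (or start) and this image.
        text = response[cursor:match.start()].strip()
        if text:
            steps.append({"type": "text", "content": text, "iteration": len(steps)})
        # Assumption: 'content' for an image step holds the data URI.
        steps.append({"type": "image", "content": match.group(1), "iteration": len(steps)})
        cursor = match.end()
    # Any trailing text after the last image.
    tail = response[cursor:].strip()
    if tail:
        steps.append({"type": "text", "content": tail, "iteration": len(steps)})
    return steps


if __name__ == "__main__":
    sample = "Draw the triangle.\n![fig](data:image/png;base64,AAAA)\nSo x = 2."
    for step in split_interleaved(sample):
        print(step["iteration"], step["type"])
```

The real function may number iterations differently or emit 'error' steps for malformed images, so the sketch should be read only as an illustration of the output shape.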

Usage Examples

from vlmeval.dataset.utils.mathcanvas import extract_reasoning_steps

# model_response_text: raw model output (text interleaved with base64 markdown images)
steps = extract_reasoning_steps(model_response_text)
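Once the steps are available, downstream code typically needs to separate the text from the images. The sketch below works on a hand-written list shaped like the documented output ('type', 'content', 'iteration'), so it runs without VLMEvalKit installed; `summarize_steps` is a hypothetical helper, not part of the module.

```python
from collections import Counter
from typing import Dict, List, Tuple


def summarize_steps(steps: List[Dict]) -> Tuple[Counter, str]:
    """Count steps per type and join the text steps into one string."""
    counts = Counter(step["type"] for step in steps)
    text = "\n".join(step["content"] for step in steps if step["type"] == "text")
    return counts, text


if __name__ == "__main__":
    # Hand-written example data mimicking the documented output shape.
    example_steps = [
        {"type": "text", "content": "Let the side be x.", "iteration": 0},
        {"type": "image", "content": "data:image/png;base64,AAAA", "iteration": 1},
        {"type": "text", "content": "Therefore x = 2.", "iteration": 2},
    ]
    counts, text = summarize_steps(example_steps)
    print(counts["text"], counts["image"], counts["error"])
    print(text)
```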
