Implementation:Open compass VLMEvalKit MathCanvas Utils

Field	Value
source	VLMEvalKit
domain	Vision, Evaluation, Mathematics, Multi-step Reasoning

Overview

Provides evaluation utilities for the MathCanvas benchmark, supporting multi-step mathematical reasoning with interleaved text and image content.

Description

This module implements `extract_reasoning_steps`, which parses model responses containing interleaved text and base64-encoded images into structured reasoning steps. Multi-part problems are scored with `SUB_QUESTION_WEIGHTS`, a table of per-sub-question weights in which later sub-questions receive higher weights, reflecting their increasing difficulty. Answer assessment is delegated to a GPT-based judge whose prompt template is loaded from an external file. Pydantic models are used for structured validation.
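The weighted scoring idea can be illustrated with a minimal sketch. The weight values, the `EXAMPLE_WEIGHTS` name, and the `weighted_score` helper below are hypothetical and exist only for illustration; the actual values in `SUB_QUESTION_WEIGHTS` are not reproduced here, and the module may combine them differently.

```python
from typing import Dict, List, Sequence

# Hypothetical weights, for illustration only. They are NOT the values stored
# in SUB_QUESTION_WEIGHTS. Assumption: keys are the number of sub-questions,
# and each list gives one weight per sub-question, increasing toward the end.
EXAMPLE_WEIGHTS: Dict[int, List[float]] = {
    2: [0.4, 0.6],
    3: [0.2, 0.3, 0.5],
    4: [0.1, 0.2, 0.3, 0.4],
}


def weighted_score(correct: Sequence[bool],
                   weights: Dict[int, List[float]] = EXAMPLE_WEIGHTS) -> float:
    """Sum the weights of the sub-questions that were judged correct."""
    count = len(correct)
    if count not in weights:
        raise ValueError(f"no weights defined for {count} sub-questions")
    # Pair each correctness flag with its weight and keep only the correct ones.
    return sum(w for ok, w in zip(correct, weights[count]) if ok)
```

Under these made-up weights, getting only the last of three sub-questions right earns more credit than getting only the first, which is the behaviour the description attributes to the real table.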

Usage

Called internally by the corresponding dataset class during evaluation.

Code Reference

Source: vlmeval/dataset/utils/mathcanvas.py, Lines: L1-277
Import: from vlmeval.dataset.utils.mathcanvas import extract_reasoning_steps

Key Functions:

def extract_reasoning_steps(input_text: str): ...
SUB_QUESTION_WEIGHTS = {2: [...], 3: [...], 4: [...]}

I/O Contract

Direction	Description
Inputs	Model response text with interleaved markdown images (base64) and text reasoning steps
Outputs	List of structured step dicts with 'type' (text/image/error), 'content', and 'iteration' fields
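To make the contract concrete, here is a rough sketch of how interleaved text and markdown images could be split into step dicts of this shape. It is not the code in `mathcanvas.py`: the `split_interleaved` name, the regular expression, and the reading of `iteration` as a running step index are all assumptions, and the sketch never emits the `error` type.

```python
import re
from typing import Dict, List

# Assumed image syntax: ![alt](data:image/<format>;base64,<payload>)
IMAGE_PATTERN = re.compile(
    r"!\[[^\]]*\]\((data:image/[a-zA-Z0-9.+-]+;base64,[A-Za-z0-9+/=\s]+)\)"
)


def split_interleaved(text: str) -> List[Dict[str, object]]:
    """Split a response into ordered text and image steps (hypothetical helper)."""
    steps: List[Dict[str, object]] = []
    cursor = 0
    for match in IMAGE_PATTERN.finditer(text):
        # Any non-empty text before the image becomes its own step.
        chunk = text[cursor:match.start()].strip()
        if chunk:
            steps.append({"type": "text", "content": chunk, "iteration": len(steps)})
        # The image step keeps the full data URI as its content.
        steps.append({"type": "image", "content": match.group(1), "iteration": len(steps)})
        cursor = match.end()
    # Trailing text after the last image, if any.
    tail = text[cursor:].strip()
    if tail:
        steps.append({"type": "text", "content": tail, "iteration": len(steps)})
    return steps
```

The real function may decode or validate the image payload and report failures as `error` steps; the sketch only shows the splitting.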

Usage Examples

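from vlmeval.dataset.utils.mathcanvas import extract_reasoning_steps

# model_response_text is a placeholder for the raw model output:
# reasoning text interleaved with base64-encoded markdown images.
steps = extract_reasoning_steps(model_response_text)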
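Downstream code will typically want to check the returned steps before scoring, for example to see whether any step came back with type `error`. The following sketch runs on a hand-built list that follows the documented output shape, so it does not need VLMEvalKit installed; `summarize_steps` is a hypothetical helper, not part of the module.

```python
from collections import Counter
from typing import Dict, List


def summarize_steps(steps: List[Dict[str, object]]) -> Dict[str, int]:
    """Count steps per documented type: text, image, and error."""
    counts = Counter(step["type"] for step in steps)
    # Counter returns 0 for missing keys, so every type is always reported.
    return {kind: counts[kind] for kind in ("text", "image", "error")}


# Hand-built example following the documented fields; the content values are made up.
example_steps = [
    {"type": "text", "content": "Let x be the side length.", "iteration": 0},
    {"type": "image", "content": "data:image/png;base64,iVBORw0KGgo=", "iteration": 1},
    {"type": "error", "content": "could not decode image", "iteration": 2},
]
summary = summarize_steps(example_steps)
```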

Related Pages

Principle:Open_compass_VLMEvalKit_Benchmark_Dataset_Construction
