Implementation:Open compass VLMEvalKit MathCanvas Utils
| Field | Value |
|---|---|
| source | VLMEvalKit |
| domain | Vision, Evaluation, Mathematics, Multi-step Reasoning |
Overview
Provides evaluation utilities for the MathCanvas benchmark, supporting multi-step mathematical reasoning with interleaved text and image content.
Description
This module implements `extract_reasoning_steps`, which parses model responses containing interleaved text and base64-encoded images into structured reasoning steps. Multi-part questions are scored with a weighted sub-question scheme (`SUB_QUESTION_WEIGHTS`) in which later sub-questions receive higher weights, reflecting their increasing difficulty. For GPT-based answer assessment, the module loads an OpenAI prompt template from an external file. Pydantic models provide structured validation.
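The weighting idea can be illustrated with a minimal sketch. It assumes that the keys of the weight table are the number of sub-questions in a problem and that each list sums to 1; the numeric values, the name `EXAMPLE_WEIGHTS`, and the helper `weighted_score` are hypothetical and are not taken from the module.

```python
from typing import Dict, List, Sequence

# Hypothetical weights for illustration only. The real values are defined in
# SUB_QUESTION_WEIGHTS in vlmeval/dataset/utils/mathcanvas.py and may differ.
EXAMPLE_WEIGHTS: Dict[int, List[float]] = {
    2: [0.4, 0.6],
    3: [0.2, 0.3, 0.5],
    4: [0.1, 0.2, 0.3, 0.4],
}


def weighted_score(correct: Sequence[bool]) -> float:
    """Sum the weights of the sub-questions that were answered correctly."""
    n = len(correct)
    if n == 1:
        # A single-part question needs no weighting.
        return 1.0 if correct[0] else 0.0
    if n not in EXAMPLE_WEIGHTS:
        raise ValueError(f"no weights defined for {n} sub-questions")
    weights = EXAMPLE_WEIGHTS[n]
    # Later sub-questions carry larger weights, so missing them costs more.
    return sum(w for w, ok in zip(weights, correct) if ok)
```

Under these assumed weights, getting only the last sub-question right scores higher than getting only the first one right, which is the behaviour the description attributes to the weighting scheme.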
Usage
Called internally by the corresponding dataset class during evaluation.
Code Reference
- Source: `vlmeval/dataset/utils/mathcanvas.py`, Lines: L1-277
- Import: `from vlmeval.dataset.utils.mathcanvas import extract_reasoning_steps`
Key Functions:
def extract_reasoning_steps(input_text: str): ...
SUB_QUESTION_WEIGHTS = {2: [...], 3: [...], 4: [...]}
I/O Contract
| Direction | Description |
|---|---|
| Inputs | Model response text with interleaved markdown images (base64) and text reasoning steps |
| Outputs | List of structured step dicts with 'type' (text/image/error), 'content', and 'iteration' fields |
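To make the contract concrete, here is a rough stand-in parser that produces dicts of the shape described in the table. It is not the module's implementation: the function name `sketch_extract_steps`, the regular expression, the error message, and the reading of `iteration` as a running step index are all assumptions made for illustration.

```python
import base64
import binascii
import re
from typing import Dict, List, Union

# Markdown image whose target is a base64 data URI, e.g.
# ![alt](data:image/png;base64,AAAA...)
IMAGE_PATTERN = re.compile(r"!\[[^\]]*\]\(data:image/[\w.+-]+;base64,([^)]*)\)")

Step = Dict[str, Union[str, int]]


def sketch_extract_steps(text: str) -> List[Step]:
    """Split a response into text, image, and error steps (illustrative only)."""
    steps: List[Step] = []

    def add(kind: str, content: str) -> None:
        # Assumption: 'iteration' is the position of the step in the response.
        steps.append({"type": kind, "content": content, "iteration": len(steps)})

    cursor = 0
    for match in IMAGE_PATTERN.finditer(text):
        before = text[cursor:match.start()].strip()
        if before:
            add("text", before)
        payload = match.group(1).strip()
        try:
            if not payload:
                raise ValueError("empty payload")
            # validate=True rejects characters outside the base64 alphabet.
            base64.b64decode(payload, validate=True)
            add("image", payload)
        except (binascii.Error, ValueError):
            add("error", "invalid base64 image payload")
        cursor = match.end()

    tail = text[cursor:].strip()
    if tail:
        add("text", tail)
    return steps
```

Recording a malformed image as an `error` step instead of raising keeps the remaining steps available to the scorer, which is one plausible reason for the `error` type listed in the output contract.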
Usage Examples
from vlmeval.dataset.utils.mathcanvas import extract_reasoning_steps
# model_response_text: the raw response string produced by the model under evaluation
steps = extract_reasoning_steps(model_response_text)
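Once the steps are available, downstream code typically needs to separate them by type. The sketch below works on a hand-written list that follows the documented output shape (`type`, `content`, `iteration`); the helper `summarize_steps` and the sample data are hypothetical and are not part of VLMEvalKit.

```python
from collections import Counter
from typing import Any, Dict, List


def summarize_steps(steps: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Count steps per type and join the text steps in their original order."""
    counts = Counter(step["type"] for step in steps)
    text_parts = [step["content"] for step in steps if step["type"] == "text"]
    return {"counts": dict(counts), "text": "\n".join(text_parts)}


# Hand-written sample in the documented output shape (not real model output).
sample_steps = [
    {"type": "text", "content": "Draw the altitude from A.", "iteration": 0},
    {"type": "image", "content": "aGVsbG8=", "iteration": 1},
    {"type": "text", "content": "Therefore the area is 6.", "iteration": 2},
]
summary = summarize_steps(sample_steps)
```

Checking the count of `error` steps before scoring is a cheap way to flag responses whose images could not be parsed.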