Implementation:Open compass VLMEvalKit WeMath Utils

From Leeroopedia
Field | Value
source | VLMEvalKit
domain | Vision, Evaluation, Mathematics, Multi-step Reasoning

Overview

Provides evaluation utilities for the WeMath benchmark, implementing four-dimensional metrics for multi-step mathematical reasoning assessment.

Description

This module implements evaluate_evaluate_steps, which evaluates individual knowledge-concept steps, and evaluate_process_steps_data, which merges multi-step evaluation results. The load_and_process_data function loads a prediction file and computes per-step correctness scores, stored as joker. It accepts both pre-scored data (with a 'hit' column) and raw predictions; for raw predictions it extracts the answer from each model response by taking the first letter after "Answer". The four-dimensional evaluation framework then assesses knowledge-concept mastery across the reasoning steps by merging the step-wise results into one consolidated evaluation.
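The answer-extraction and scoring behaviour described above can be illustrated with a minimal pure-Python sketch. It is not the module's code: extract_choice and score_row are hypothetical helper names, the option range A-H is an assumption, and returning None when no "Answer" marker is present is a choice made for this illustration only.

```python
import re
from typing import Optional

VALID_CHOICES = "ABCDEFGH"  # assumed range of option letters


def extract_choice(response: str) -> Optional[str]:
    """Return the first option letter after the last 'Answer' marker, else None."""
    marker = "Answer"
    idx = response.rfind(marker)
    if idx == -1:
        return None  # assumption: no marker means no usable answer
    tail = response[idx + len(marker):]
    # Drop separators that commonly follow the marker (colon, dot, angle brackets, spaces).
    tail = re.sub(r"[<>:.\s]", "", tail)
    if tail and tail[0] in VALID_CHOICES:
        return tail[0]
    return None


def score_row(row: dict) -> bool:
    """Compute the correctness flag ('joker') for one prediction row."""
    if "hit" in row:
        # Pre-scored data: reuse the existing score.
        return bool(row["hit"])
    # Raw prediction: compare the extracted letter with the reference answer.
    return extract_choice(row["prediction"]) == row["answer"]
```

A pre-scored row bypasses parsing entirely, so the extraction rule only matters for raw model responses.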

Usage

Called internally by the WeMath dataset class during multi-step mathematical reasoning evaluation.

Code Reference

  • Source: vlmeval/dataset/utils/wemath.py, Lines: L1-898
  • Import: from vlmeval.dataset.utils.wemath import load_and_process_data, evaluate_evaluate_steps

Key Functions:

def evaluate_evaluate_steps(json, steps): ...
def load_and_process_data(filepath): ...
def evaluate_process_steps_data(df, steps): ...

I/O Contract

Direction | Description
Inputs | Scored data file path or DataFrame with prediction/answer columns; number of reasoning steps to evaluate
Outputs | DataFrame with per-step joker (correctness) scores; merged multi-step evaluation DataFrame with knowledge concept mappings
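To make the shape of the merged output more concrete, the following is a rough sketch of combining per-step rows into one record per problem. It uses plain dictionaries rather than a DataFrame, and everything in it is an assumption made for illustration: the function name merge_steps, the field names id, key, joker, and knowledge_concept, and the key format "<steps>steps_<i>". The actual module may organise its data differently.

```python
from collections import defaultdict


def merge_steps(rows, steps):
    """Merge per-step rows that share an id into one record per problem."""
    # Map hypothetical step keys such as "2steps_1" to their step index.
    wanted = {f"{steps}steps_{i}": i for i in range(1, steps + 1)}
    merged = defaultdict(dict)
    for row in rows:
        i = wanted.get(row["key"])
        if i is None:
            continue  # row belongs to a different step count
        merged[row["id"]][f"joker_{i}"] = row["joker"]
        merged[row["id"]][f"knowledge_concept_{i}"] = row["knowledge_concept"]
    # Keep only problems for which every step is present.
    return {pid: rec for pid, rec in merged.items() if len(rec) == 2 * steps}


rows = [
    {"id": 1, "key": "2steps_1", "joker": True, "knowledge_concept": "angles"},
    {"id": 1, "key": "2steps_2", "joker": False, "knowledge_concept": "area"},
    {"id": 2, "key": "2steps_1", "joker": True, "knowledge_concept": "angles"},
    {"id": 3, "key": "3steps_1", "joker": True, "knowledge_concept": "volume"},
]
merged = merge_steps(rows, steps=2)
```

Each merged record carries one joker flag and one knowledge concept per step, which is the form a per-concept mastery metric would consume.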

Usage Examples

# Internal usage example
from vlmeval.dataset.utils.wemath import load_and_process_data, evaluate_process_steps_data
df = load_and_process_data("predictions.xlsx")
merged = evaluate_process_steps_data(df, steps=3)
