Implementation: EvolvingLMMs-Lab lmms-eval MathVision Utils

Location: /tmp/kapso_repo_sslb_59s/lmms_eval/tasks/mathvision/utils.py

Purpose

Task-specific utilities for the MathVision benchmark. They support both standard evaluation and LLM-as-judge scoring of mathematical visual question answering.

def mathvision_doc_to_visual(doc)

Extracts the decoded image from the document and converts it to RGB format.
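A minimal sketch of this step follows. It is not the module's code. It assumes the image is stored under a `decoded_image` key, that the stored object exposes a PIL-style `.convert(mode)` method, and that the hook returns a one-element list of images.

```python
from typing import Any, Dict, List


def mathvision_doc_to_visual(doc: Dict[str, Any]) -> List[Any]:
    # Assumption: the decoded image lives under "decoded_image" and has a
    # PIL-style .convert(mode) method (for example a PIL.Image.Image).
    image = doc["decoded_image"]
    # Convert to RGB so grayscale, palette, or RGBA images reach the model
    # with three channels; wrap in a list as a single-image input.
    return [image.convert("RGB")]
```

With Pillow installed, `mathvision_doc_to_visual({"decoded_image": Image.new("L", (4, 4))})[0].mode` would be `"RGB"`.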

def mathvision_doc_to_text(doc, lmms_eval_specific_kwargs=None)

Formats the question and its options into a prompt:

Labels the multiple-choice options with letters (A, B, C, ...)
Adds the optional mc_prompt from lmms_eval_specific_kwargs
Base prompt: 'Please solve the problem step by step and put your answer in one "\\boxed{}".'
Appends the choices when the question has them
Returns the formatted query prompt
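The steps above can be sketched as follows. This is an illustrative reimplementation, not the module's source. The `question` and `options` keys, the "A. text" option layout, the "Choices:" label, the ordering of the parts, and applying `mc_prompt` only to multiple-choice questions are all assumptions.

```python
from typing import Any, Dict, Optional

# Base prompt quoted from the description; the Python escape "\\b" yields \boxed{}.
BASE_PROMPT = 'Please solve the problem step by step and put your answer in one "\\boxed{}".'


def mathvision_doc_to_text(doc: Dict[str, Any],
                           lmms_eval_specific_kwargs: Optional[Dict[str, Any]] = None) -> str:
    kwargs = lmms_eval_specific_kwargs or {}
    question = doc["question"]           # assumed key
    options = doc.get("options") or []   # assumed key; empty for free-form questions
    # Label each option with a capital letter: A, B, C, ...
    choices = "\n".join(f"{chr(ord('A') + i)}. {opt}" for i, opt in enumerate(options))
    parts = [question]
    if choices:
        # Assumed layout: choices on their own lines after the question.
        parts.append("Choices:\n" + choices)
        # Assumption: mc_prompt is only meaningful for multiple-choice questions.
        mc_prompt = kwargs.get("mc_prompt", "")
        if mc_prompt:
            parts.append(mc_prompt)
    parts.append(BASE_PROMPT)
    return "\n".join(parts)
```

Keeping the base prompt as a module-level constant makes it easy to check that every query ends with the same `\boxed{}` instruction, which the extraction step relies on.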

def mathvision_gpt_eval_process_results(doc, results)

LLM-as-judge evaluation:
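The judging details are not given here. The following is only a generic sketch of the LLM-as-judge technique. The judge is injected as a plain callable so the example runs offline. The template text, the `question`/`answer` keys, the Yes/No protocol, and the `llm_judge_eval` result key are all hypothetical.

```python
from typing import Any, Callable, Dict, List

# Hypothetical grading template; the prompt the module actually sends is not documented.
JUDGE_TEMPLATE = (
    "Question: {question}\n"
    "Reference answer: {answer}\n"
    "Model response: {response}\n"
    "Is the model's final answer equivalent to the reference? Reply Yes or No."
)


def judge_with_llm(doc: Dict[str, Any], results: List[str],
                   judge: Callable[[str], str]) -> Dict[str, Any]:
    """Grade the first model output with an injected judge callable (hypothetical API)."""
    response = results[0]
    prompt = JUDGE_TEMPLATE.format(question=doc["question"], answer=doc["answer"],
                                   response=response)
    reply = judge(prompt).strip().lower()
    # Only a reply starting with "yes" counts as correct; anything else fails closed.
    correct = reply.startswith("yes")
    # Hypothetical metric key; the real name may differ.
    return {"llm_judge_eval": {"response": response, "correct": correct}}
```

In practice `judge` would wrap a call to a hosted model. Failing closed on ambiguous replies keeps a confused judge from inflating scores.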

def mathvision_process_results(doc, results)

Standard evaluation with extensive answer normalization:

Answer Extraction:
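The extraction rules themselves are not listed. Because the prompt asks for the answer inside `\boxed{}`, one plausible approach is to take the contents of the last `\boxed{...}` in the response. The sketch below does this and matches braces so that nested groups such as `\frac{1}{2}` survive. The helper name `extract_last_boxed` is hypothetical.

```python
from typing import Optional


def extract_last_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None (hypothetical helper)."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    content_start = start + len(marker)
    depth = 1
    # Walk forward, tracking brace depth so nested groups are kept intact.
    for i in range(content_start, len(text)):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[content_start:i]
    return None  # unbalanced braces: no complete boxed answer
```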

Answer Normalization:
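The module's normalization is described as extensive but is not spelled out. The sketch below shows a few common normalizations under that assumption: stripping `$` delimiters, unwrapping `\text{...}`, dropping `\left`/`\right` and whitespace, trimming a trailing period, and mapping option forms like "(B)" or "B." to a bare letter. The helper name and every rule here are illustrative, not the module's actual rule set.

```python
import re


def normalize_answer(ans: str) -> str:
    """Hypothetical normalizer; the module's real rules are more extensive."""
    s = ans.strip().strip("$")                     # drop surrounding inline-math dollars
    s = re.sub(r"\\text\{([^{}]*)\}", r"\1", s)    # \text{cm} -> cm
    s = s.replace("\\left", "").replace("\\right", "")
    s = re.sub(r"\s+", "", s)                      # ignore all whitespace
    s = s.rstrip(".")                              # "B." -> "B", "3." -> "3"
    # Treat a lone capital letter, optionally parenthesized, as an option label.
    m = re.fullmatch(r"\(?([A-Z])\)?", s)
    if m:
        return m.group(1)
    return s.lower()
```

Comparing `normalize_answer(pred) == normalize_answer(gold)` then tolerates formatting noise that exact string matching would reject.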

Returns:

Dict with "mathvision_standard_eval" containing:
- response: list of predictions
- scores: list of correctness bools
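The documented return shape can be illustrated with a small packing helper. The helper name `pack_standard_eval` and the injected `is_correct` comparison are hypothetical. Only the `mathvision_standard_eval` key and its `response`/`scores` fields come from the description above.

```python
from typing import Any, Callable, Dict, List


def pack_standard_eval(predictions: List[str], gold: str,
                       is_correct: Callable[[str, str], bool]) -> Dict[str, Any]:
    """Pack predictions and per-prediction correctness into the documented shape."""
    scores = [bool(is_correct(pred, gold)) for pred in predictions]
    return {"mathvision_standard_eval": {"response": predictions, "scores": scores}}
```

Passing the comparison in as a callable keeps the packing independent of whichever extraction and normalization rules are used.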

def mathvision_aggregate_results_eval(results)

Aggregates standard evaluation:
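The aggregation rule is not stated here. A minimal sketch, assuming each entry is the per-document `mathvision_standard_eval` dict, that a document counts as correct when its first score is True, and that accuracy is reported as a percentage rounded to two decimals:

```python
from typing import Any, Dict, List


def mathvision_aggregate_results_eval(results: List[Dict[str, Any]]) -> float:
    """Percentage of documents whose first score is True (assumed aggregation rule)."""
    if not results:
        return 0.0
    correct = sum(1 for r in results if r["scores"] and r["scores"][0])
    return round(100.0 * correct / len(results), 2)
```

For example, three documents with first scores True, False, True aggregate to 66.67.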

