Implementation:EvolvingLMMs Lab Lmms eval Uni MMMU Visual Puzzle Utils

Knowledge Sources	EvolvingLMMs_Lab_Lmms_eval
Domains	Vision, Evaluation, Puzzles, Geometry
Last Updated	2026-02-14 00:00 GMT

Overview

Utility functions for evaluating vision-language models on the Uni-MMMU benchmark, which tests visual puzzle solving across jigsaw completion, maze navigation, sliding puzzles, and geometry problems.

Description

This module provides specialized evaluation functions for four visual puzzle types in Uni-MMMU: (1) jigsaw puzzles, which require choosing the missing patch based on seam continuity and semantics; (2) maze solving, which requires finding a path from start to goal; (3) sliding puzzles, which require a sequence of tile moves; and (4) geometry problems, which involve constructing auxiliary lines and producing step-by-step solutions. Each puzzle type has its own prompt template, answer extraction logic, and evaluation metrics: exact match for jigsaw and geometry, and exact match plus frame accuracy for the sequential maze and sliding puzzles.

Usage

Use this module when evaluating multimodal models on visual puzzle and spatial reasoning tasks. Each puzzle type has dedicated doc_to_text, doc_to_visual, and process_results functions. Jigsaw puzzles expect a JSON object with a choice field (0 or 1), maze and sliding puzzles expect move sequences as JSON arrays, and geometry answers are extracted from natural-language output and normalized before comparison.
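The usage examples on this page wrap model answers in tags such as `<ANSWER_JSON>` and `<FINAL_ANSWER_JSON>`. Below is a minimal sketch of pulling the payload out of such a tag. `extract_tagged_payload` is a hypothetical helper and is not part of `lmms_eval`. The real module may extract answers differently, for example via `_find_json_object`.

```python
import re
from typing import Optional


def extract_tagged_payload(text: str, tag: str) -> Optional[str]:
    """Return the stripped text between <tag> and </tag>, or None if absent.

    Hypothetical helper for illustration. If the tag appears more than once,
    the last occurrence wins, on the assumption that a model's final answer
    comes after any draft answers in its reasoning.
    """
    pattern = re.compile(rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>", re.DOTALL)
    matches = pattern.findall(text)
    if not matches:
        return None
    return matches[-1].strip()


response = 'Let me think... <ANSWER_JSON>["up", "down"]</ANSWER_JSON> because...'
print(extract_tagged_payload(response, "ANSWER_JSON"))  # ["up", "down"]
```

The returned string is still raw JSON. A caller would pass it to `json.loads` or a list parser before scoring.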

Code Reference

Source Location

Repository: EvolvingLMMs_Lab_Lmms_eval
File: lmms_eval/tasks/uni_mmmu/utils.py

Signature

# Jigsaw puzzle functions
def jigsaw_doc_to_visual(doc: Dict) -> List[Image.Image]
def jigsaw_doc_to_text(doc: Dict, lmms_eval_specific_kwargs: Optional[Dict] = None) -> str
def jigsaw_process_results(doc: Dict, results: List[str]) -> Dict[str, float]

# Maze solving functions
def maze_doc_to_visual(doc: Dict) -> List[Image.Image]
def maze_doc_to_text(doc: Dict, lmms_eval_specific_kwargs: Optional[Dict] = None) -> str
def maze_process_results(doc: Dict, results: List[str]) -> Dict[str, float]

# Sliding puzzle functions
def sliding_doc_to_visual(doc: Dict) -> List[Image.Image]
def sliding_doc_to_text(doc: Dict, lmms_eval_specific_kwargs: Optional[Dict] = None) -> str
def sliding_process_results(doc: Dict, results: List[str]) -> Dict[str, float]

# Geometry problem functions
def geometry_doc_to_visual(doc: Dict) -> List[Image.Image]
def geometry_doc_to_text(doc: Dict, lmms_eval_specific_kwargs: Optional[Dict] = None) -> str
def geometry_process_results(doc: Dict, results: List[str]) -> Dict[str, float]

# Helper functions
def _find_json_object(text: str) -> Optional[str]
def _parse_json_list(raw: str) -> List[Any]
def _normalize_geometry_answer(text: str) -> str
def _extract_final_answer(text: str) -> str
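`_find_json_object` is listed here only by signature. One plausible approach, shown as a hypothetical re-implementation (`find_json_object`) rather than the module's actual code, works as follows:

- Scan forward from each `{`, tracking brace depth.
- Skip braces that appear inside JSON string literals.
- Return the first balanced candidate that `json.loads` accepts.

```python
import json
from typing import Optional


def find_json_object(text: str) -> Optional[str]:
    """Return the first balanced {...} substring of text that parses as JSON."""
    start = text.find("{")
    while start != -1:
        depth = 0
        in_string = False
        escaped = False
        for i in range(start, len(text)):
            ch = text[i]
            if in_string:
                # Inside a string literal: only track escapes and the closing quote
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    candidate = text[start : i + 1]
                    try:
                        json.loads(candidate)
                        return candidate
                    except json.JSONDecodeError:
                        break  # balanced but not valid JSON; try the next "{"
        start = text.find("{", start + 1)
    return None


reply = 'Reasoning {not json} ... {"choice": 1, "rationale": "seam {matches}"}'
print(find_json_object(reply))  # {"choice": 1, "rationale": "seam {matches}"}
```

Tracking string literals keeps braces inside a model's rationale text from ending the object early.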

Import

from lmms_eval.tasks.uni_mmmu.utils import (
    jigsaw_doc_to_text,
    jigsaw_process_results,
    maze_doc_to_text,
    maze_process_results,
    sliding_doc_to_text,
    sliding_process_results,
    geometry_doc_to_text,
    geometry_process_results
)

I/O Contract

Jigsaw Puzzle I/O

Input	Type	Description
doc["ref_image"]	Image	2x2 reference grid with the bottom-right patch hidden
doc["cand0_image"]	Image	Candidate patch 0
doc["cand1_image"]	Image	Candidate patch 1
doc["label"]	int	Ground truth choice (0 or 1)

Output	Type	Description
exact_match	float	1.0 if predicted choice matches label, else 0.0
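Jigsaw scoring reduces to comparing one extracted integer against `doc["label"]`. Below is a minimal sketch of that comparison. It assumes the answer is the first JSON object in the response and carries a `choice` key, as in the usage example on this page. `score_jigsaw` is hypothetical, and so is its rule of scoring unparsable output as 0.0; neither is the module's documented behavior.

```python
import json
import re
from typing import Dict


def score_jigsaw(response: str, label: int) -> Dict[str, float]:
    """Hypothetical jigsaw scorer: exact_match is 1.0 iff parsed choice == label."""
    # Non-greedy match of a flat {...} object; adequate for {"choice": ..., "rationale": ...}
    match = re.search(r"\{.*?\}", response, re.DOTALL)
    if match is None:
        return {"exact_match": 0.0}
    try:
        choice = int(json.loads(match.group(0))["choice"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        # Unparsable JSON or a missing/invalid choice is scored as wrong (an assumption)
        return {"exact_match": 0.0}
    return {"exact_match": 1.0 if choice == int(label) else 0.0}


reply = '<FINAL_ANSWER_JSON>\n{"choice": 1, "rationale": "..."}\n</FINAL_ANSWER_JSON>'
print(score_jigsaw(reply, label=1))  # {'exact_match': 1.0}
print(score_jigsaw(reply, label=0))  # {'exact_match': 0.0}
```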

Maze/Sliding Puzzle I/O

Input	Type	Description
doc["initial_image"]	Image	Puzzle start state visualization
doc["steps"] (maze)	str/List	Ground truth move sequence (JSON array)
doc["steps_words"] (sliding)	str/List	Ground truth move words (JSON array)

Output	Type	Description
exact_match	float	1.0 if full sequence matches, else 0.0
frame_accuracy	float	Proportion of moves correct (0.0 to 1.0)
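The two metrics can be illustrated on plain move lists. The sketch below is an assumption about the scoring, not the module's code. It makes these choices:

- Accept a ground truth given either as a JSON string or as a list, matching the str/List type above.
- Lowercase moves before comparing.
- Compare moves position by position.
- Divide matches by the ground-truth length, so missing predicted moves count as wrong while extra trailing moves affect only exact_match.

With the sliding example from the usage section, three of four positions match, giving 0.75.

```python
import json
from typing import Any, Dict, List, Union


def as_move_list(raw: Union[str, List[Any]]) -> List[str]:
    """Accept a JSON array string or a list; return lowercase, stripped moves."""
    items = json.loads(raw) if isinstance(raw, str) else raw
    return [str(move).strip().lower() for move in items]


def score_moves(pred: List[str], gold: Union[str, List[Any]]) -> Dict[str, float]:
    """Hypothetical scorer: exact sequence match plus per-position accuracy."""
    gold_moves = as_move_list(gold)
    pred_moves = as_move_list(pred)
    exact = 1.0 if pred_moves == gold_moves else 0.0
    if not gold_moves:
        return {"exact_match": exact, "frame_accuracy": exact}
    # zip stops at the shorter list, so missing predicted moves count as wrong
    hits = sum(p == g for p, g in zip(pred_moves, gold_moves))
    return {"exact_match": exact, "frame_accuracy": hits / len(gold_moves)}


gold = '["down", "right", "up", "left"]'
print(score_moves(["down", "right", "down", "left"], gold))
# {'exact_match': 0.0, 'frame_accuracy': 0.75}
```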

Geometry Problem I/O

Input	Type	Description
doc["image"]	Image	Geometry diagram
doc["question"] / doc["problem"]	str	Problem statement
doc["answer"] / doc["solution_en"]	str	Ground truth answer

Output	Type	Description
exact_match	float	1.0 if normalized answers match, else 0.0
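`_extract_final_answer` and `_normalize_geometry_answer` are documented only by signature and by the examples on this page. There, "The answer is 45 degrees" normalizes to "45", and "60°" matches "60 degrees". A minimal sketch consistent with those examples does two things:

- Take the text after the last "answer is".
- Strip degree markers, whitespace, and trailing periods.

The function names, regexes, and rules are assumptions. The real helpers may handle fractions, units, or LaTeX such as `^{\circ}` differently.

```python
import re


def extract_final_answer(text: str) -> str:
    """Hypothetical: keep the text after the last 'answer is', else the whole text."""
    parts = re.split(r"answer is", text, flags=re.IGNORECASE)
    return parts[-1].strip()


def normalize_geometry_answer(text: str) -> str:
    """Hypothetical normalizer: drop degree units, whitespace, and trailing periods."""
    answer = extract_final_answer(text).lower()
    # Remove ^\circ / \circ, the degree sign, and the words "degree"/"degrees"
    answer = re.sub(r"\^?\s*\\circ|°|\bdegrees?\b", "", answer)
    answer = re.sub(r"\s+", "", answer)
    return answer.rstrip(".")


def geometry_exact_match(response: str, gold: str) -> float:
    """Hypothetical: 1.0 if both sides normalize to the same string."""
    return 1.0 if normalize_geometry_answer(response) == normalize_geometry_answer(gold) else 0.0


print(normalize_geometry_answer("The answer is 45 degrees"))  # 45
print(geometry_exact_match("Using auxiliary lines... The answer is 60°", "60 degrees"))  # 1.0
```

Normalizing both the prediction and the ground truth with the same function keeps the comparison symmetric.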

Usage Examples

from PIL import Image
# Public functions come from the Import section above; the underscore-prefixed
# helpers are private and must be imported explicitly
from lmms_eval.tasks.uni_mmmu.utils import _normalize_geometry_answer, _parse_json_list

# Placeholder images so the snippet runs; real docs carry the dataset images
ref_img = cand0_img = cand1_img = Image.new("RGB", (64, 64))
maze_img = sliding_img = geom_diagram = Image.new("RGB", (64, 64))

# Jigsaw puzzle evaluation
jigsaw_doc = {
    "ref_image": ref_img,
    "cand0_image": cand0_img,
    "cand1_image": cand1_img,
    "label": 1
}
prompt = jigsaw_doc_to_text(jigsaw_doc)
# Example model response: a JSON answer wrapped in <FINAL_ANSWER_JSON> tags
model_response = '<FINAL_ANSWER_JSON>\n{"choice": 1, "rationale": "..."}\n</FINAL_ANSWER_JSON>'
result = jigsaw_process_results(jigsaw_doc, [model_response])
print(result["exact_match"])  # 1.0 (correct)

# Maze solving evaluation
maze_doc = {
    "initial_image": maze_img,
    "steps": '["right", "down", "right", "up"]'
}
prompt = maze_doc_to_text(maze_doc)
# Example model response: reasoning followed by a tagged JSON array of moves
model_response = 'Let me solve... <ANSWER_JSON>["right", "down", "right", "up"]</ANSWER_JSON>'
result = maze_process_results(maze_doc, [model_response])
print(result["exact_match"])  # 1.0
print(result["frame_accuracy"])  # 1.0

# Sliding puzzle with partial correctness
sliding_doc = {
    "initial_image": sliding_img,
    "steps_words": '["down", "right", "up", "left"]'
}
model_output = '<ANSWER_JSON>["down", "right", "down", "left"]</ANSWER_JSON>'
result = sliding_process_results(sliding_doc, [model_output])
print(result["exact_match"])  # 0.0 (not fully correct)
print(result["frame_accuracy"])  # 0.75 (3 out of 4 moves correct)

# Geometry problem evaluation
geom_doc = {
    "image": geom_diagram,
    "question": "Find the angle ABC if angle BAC is 30 degrees",
    "answer": "60 degrees"
}
prompt = geometry_doc_to_text(geom_doc)
# Example model response: free-form reasoning ending in a final answer
model_response = "Using auxiliary lines... The answer is 60°"
result = geometry_process_results(geom_doc, [model_response])
print(result["exact_match"])  # 1.0 (normalized: "60" == "60")

# Helper: extract a JSON list from a longer response
response = 'Let me think... The answer is <ANSWER_JSON>["up", "down"]</ANSWER_JSON> because...'
moves = _parse_json_list(response.split("<ANSWER_JSON>")[1].split("</ANSWER_JSON>")[0])
print(moves)  # ['up', 'down']

# Helper: normalize geometry answers
normalized = _normalize_geometry_answer("The answer is 45 degrees")
print(normalized)  # 45
