Implementation:EvolvingLMMs Lab Lmms eval EmbSpatial Utils

From Leeroopedia

Location: /tmp/kapso_repo_sslb_59s/lmms_eval/tasks/embspatial/utils.py

Principle: Task_Utility_Functions

Purpose

Task-specific utilities for the EmbSpatial benchmark, which evaluates spatial reasoning in multimodal models using multiple-choice questions.

Key Functions

_extract_answer_letter

def _extract_answer_letter(text: str) -> str

Extracts the answer-choice letter (A-Z) from a model response using regex patterns. Handles formats such as "A)", "(B)", and "C.". Returns an empty string if no letter is found.
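The extraction described above can be sketched roughly as follows. This is a minimal illustration, not the code in utils.py: the function name `extract_answer_letter_sketch`, the specific patterns, and their order are assumptions chosen to cover the formats listed.

```python
import re

# Hypothetical patterns, tried in order; the real module may use different ones.
_PATTERNS = [
    r"^\s*\(?([A-Z])\)",   # "A)" or "(B)" at the start of the response
    r"^\s*([A-Z])[.:]",    # "C." or "C:" at the start of the response
    r"^\s*([A-Z])\s*$",    # a bare letter such as "D"
    r"\(([A-Z])\)",        # a parenthesized letter anywhere, e.g. "... is (B)"
]


def extract_answer_letter_sketch(text: str) -> str:
    """Return the first answer letter matched by any pattern, or "" if none."""
    for pattern in _PATTERNS:
        match = re.search(pattern, text)
        if match:
            return match.group(1)
    return ""
```

Ordering the anchored patterns first means a response that starts with a formatted letter is preferred over a letter mentioned later in free text.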

embspatial_doc_to_text

def embspatial_doc_to_text(doc: dict[str, Any], lmms_eval_specific_kwargs: Optional[dict[str, Any]] = None) -> str

Formats the question with its answer options (A, B, C, D) and applies the optional pre_prompt from lmms_eval_specific_kwargs, if provided.
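A prompt builder of this kind might look like the sketch below. The document field names `question` and `answer_options`, the "A. option" line layout, and the choice to prepend `pre_prompt` are all assumptions for illustration; the page does not state them.

```python
from string import ascii_uppercase
from typing import Any, Optional


def doc_to_text_sketch(
    doc: dict[str, Any],
    lmms_eval_specific_kwargs: Optional[dict[str, Any]] = None,
) -> str:
    """Build a multiple-choice prompt from a document (field names are assumed)."""
    kwargs = lmms_eval_specific_kwargs or {}
    pre_prompt = kwargs.get("pre_prompt", "")  # optional; empty if not supplied
    # Label each option with consecutive letters: A, B, C, D.
    option_lines = [
        f"{letter}. {option}"
        for letter, option in zip(ascii_uppercase, doc["answer_options"])
    ]
    return pre_prompt + doc["question"] + "\n" + "\n".join(option_lines)
```

For example, a document with the question "Where is the cup?" and options "left", "right", "above", "below" yields the question followed by four lines labeled A through D.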

embspatial_doc_to_visual

def embspatial_doc_to_visual(doc: dict) -> list

Extracts the document's image and converts it to RGB format for model input.

embspatial_process_results

def embspatial_process_results(doc, results)

Processes the model output by extracting the predicted letter and comparing it to the ground truth (GT). Returns a structured submission dict containing the question ID, GT, prediction, sub-task, and a correctness flag.
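The per-document scoring step could be sketched as below. The keys `question_id` and `answer`, the output key names, and the assumption that the ground truth is stored as a letter are hypothetical; only the "relation" field is named on this page. The inline regex is a simplified stand-in for the module's letter extraction.

```python
import re
from typing import Any


def process_results_sketch(doc: dict[str, Any], results: list[str]) -> dict[str, Any]:
    """Score one document: extract the predicted letter and compare it to the GT."""
    response = results[0]  # assumes one generated response per document
    # Simplified stand-in for the letter extraction: first standalone A-D.
    match = re.search(r"\b([A-D])\b", response)
    pred = match.group(1) if match else ""
    gt = doc["answer"]  # assumed to hold the ground-truth letter
    return {
        "question_id": doc["question_id"],
        "gt": gt,
        "pred": pred,
        "sub_task": doc["relation"],  # sub-task taken from the "relation" field
        "is_correct": pred == gt,
    }
```

An unparseable response produces an empty prediction, which compares unequal to any ground-truth letter and is therefore counted as incorrect.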

embspatial_aggregate_results

def embspatial_aggregate_results(results: List[Dict])

Aggregates results by computing overall accuracy and per-sub-task accuracy, logs a detailed breakdown by relation type (the sub-task), and returns the overall accuracy score.
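One way to implement this aggregation is sketched below. It assumes each result is a dict with `sub_task` and `is_correct` keys (hypothetical names) and that accuracy is reported as a fraction between 0 and 1; the actual module may use different keys or a percentage scale.

```python
import logging
from collections import defaultdict
from typing import Any

logger = logging.getLogger(__name__)


def aggregate_results_sketch(results: list[dict[str, Any]]) -> float:
    """Return overall accuracy and log a per-sub-task breakdown."""
    if not results:
        return 0.0  # avoid division by zero on an empty run

    # sub_task -> [number correct, number seen]
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for record in results:
        bucket = counts[record["sub_task"]]
        bucket[0] += int(record["is_correct"])
        bucket[1] += 1

    for sub_task, (correct, total) in sorted(counts.items()):
        logger.info("%s: %d/%d = %.4f", sub_task, correct, total, correct / total)

    overall = sum(correct for correct, _ in counts.values()) / len(results)
    logger.info("overall accuracy: %.4f", overall)
    return overall
```

Counting per sub-task first and summing afterwards keeps the overall score consistent with the logged breakdown.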

Implementation Details

  • Uses YAML config from _default_template_yaml
  • Sub-tasks tracked via "relation" field in documents
  • Multiple-choice format with 4 options (A-D)
  • Handles spatial relation categories separately for detailed metrics
