Implementation: EvolvingLMMs-Lab lmms-eval EmbSpatial Utils
Location: lmms_eval/tasks/embspatial/utils.py
Principle: Task_Utility_Functions
Purpose
Task-specific utilities for the EmbSpatial benchmark, which evaluates the spatial reasoning of multimodal models with multiple-choice questions.
Key Functions
_extract_answer_letter
def _extract_answer_letter(text: str) -> str
Extracts the answer-choice letter (A-Z) from a model response using regex patterns. Handles formats such as "A)", "(B)", and "C.". Returns an empty string if no letter is found.
embspatial_doc_to_text
def embspatial_doc_to_text(doc: dict[str, Any], lmms_eval_specific_kwargs: Optional[dict[str, Any]] = None) -> str
Formats the question followed by its answer options, labeled A, B, C, and D. Applies the optional pre_prompt from lmms_eval_specific_kwargs.
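The formatting step can be sketched as follows. The document fields "question" and "answer_options" (a list of four option strings) are hypothetical names, and the sketch assumes pre_prompt is simply prepended to the question; the real field names and layout may differ.

```python
from typing import Any, Optional


def embspatial_doc_to_text(
    doc: dict[str, Any],
    lmms_eval_specific_kwargs: Optional[dict[str, Any]] = None,
) -> str:
    # Hypothetical field names: "question" and "answer_options".
    kwargs = lmms_eval_specific_kwargs or {}
    pre_prompt = kwargs.get("pre_prompt", "")
    # Label options A, B, C, D in the order they appear in the document.
    option_lines = [
        f"{chr(ord('A') + i)}. {option}"
        for i, option in enumerate(doc["answer_options"])
    ]
    body = doc["question"] + "\n" + "\n".join(option_lines)
    return f"{pre_prompt}{body}"
```

With no kwargs, a four-option document yields the question on the first line and one labeled option per following line.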
embspatial_doc_to_visual
def embspatial_doc_to_visual(doc: dict) -> list
Extracts the image from the document and converts it to RGB for model input.
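A sketch of this conversion, assuming the image is stored under a hypothetical "image" key as a PIL.Image.Image (Hugging Face datasets typically decode image columns this way). The function relies only on the object's convert("RGB") method.

```python
from typing import Any


def embspatial_doc_to_visual(doc: dict[str, Any]) -> list:
    # Assumption: doc["image"] is a PIL.Image.Image; the key name is hypothetical.
    image = doc["image"]
    # convert("RGB") drops alpha or expands grayscale so the model always gets 3 channels.
    return [image.convert("RGB")]
```

Returning a list matches the declared return type, since lmms-eval visual hooks may supply several images per document.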
embspatial_process_results
def embspatial_process_results(doc, results)
Extracts the predicted letter from the model output and compares it with the ground-truth answer. Returns a structured submission dict containing the question ID, ground truth, prediction, sub-task, and a correctness flag.
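The scoring step can be sketched as below. Several names are assumptions: the document fields "question_id", "answer" (taken to be the ground-truth letter), and "relation"; the submission keys; and the metric key "embspatial_acc", which would have to match the metric name in the task YAML. To keep the block self-contained, it uses a simplified stand-in for _extract_answer_letter.

```python
import re
from typing import Any


def _extract_answer_letter(text: str) -> str:
    # Simplified stand-in: first standalone capital letter A-Z.
    match = re.search(r"\b([A-Z])\b", text or "")
    return match.group(1) if match else ""


def embspatial_process_results(doc: dict[str, Any], results: list[str]) -> dict[str, dict]:
    # results[0] is the raw model response for this document.
    pred = _extract_answer_letter(results[0])
    gt = str(doc["answer"]).strip().upper()  # assumed: ground truth stored as a letter
    submission = {
        "question_id": doc["question_id"],   # hypothetical field name
        "gt": gt,
        "pred": pred,
        "sub_task": doc["relation"],         # relation type doubles as the sub-task
        "correct": pred == gt,
    }
    # "embspatial_acc" is a hypothetical metric key.
    return {"embspatial_acc": submission}
```

Keeping the whole submission dict, rather than a bare 0/1 score, lets the aggregation step group results by sub-task.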
embspatial_aggregate_results
def embspatial_aggregate_results(results: List[Dict])
Computes overall accuracy and per-sub-task accuracy, logs the accuracy breakdown by relation type (sub-task), and returns the overall accuracy score.
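A sketch of the aggregation, assuming each item is a submission dict with "sub_task" and "correct" keys and that the score is returned as a fraction in [0, 1]. It uses the standard logging module, which may differ from the logger the module actually uses.

```python
import logging
from collections import defaultdict
from typing import Dict, List

logger = logging.getLogger(__name__)


def embspatial_aggregate_results(results: List[Dict]) -> float:
    if not results:
        return 0.0
    # sub_task -> [num_correct, num_total]
    per_task = defaultdict(lambda: [0, 0])
    for r in results:
        stats = per_task[r["sub_task"]]
        stats[0] += int(r["correct"])
        stats[1] += 1
    # Log the per-relation breakdown in a stable (sorted) order.
    for task, (correct, total) in sorted(per_task.items()):
        logger.info("%s: %.2f%% (%d/%d)", task, 100 * correct / total, correct, total)
    overall = sum(int(r["correct"]) for r in results) / len(results)
    logger.info("Overall: %.2f%% (%d samples)", 100 * overall, len(results))
    return overall
```

Overall accuracy is computed over all samples (micro average), so larger relation categories weigh more than in a mean of per-category accuracies.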
Implementation Details
- Uses YAML config from _default_template_yaml
- Sub-tasks tracked via the "relation" field in documents
- Multiple-choice format with 4 options (A-D)
- Handles spatial relation categories separately for detailed metrics