Implementation:EvolvingLMMs Lab Lmms eval SciBench Utils

From Leeroopedia

Source File: `lmms_eval/tasks/scibench/utils.py`

Principle: [[../principles/EvolvingLMMs_Lab_Lmms_eval_Task_Utility_Functions|Task_Utility_Functions]]

Overview

The SciBench Utils module provides utility functions for processing and evaluating scientific benchmark questions that involve mathematical answers with units and scientific notation. It handles parsing LaTeX-formatted answers, normalizing scientific notation, and comparing numerical results with tolerance.

Key Functions

Document Processing

scibench_doc_to_text(doc: Dict, lmms_eval_specific_kwargs: Dict) -> str
Formats a SciBench document into a question prompt
  • Extracts problem text and unit information
  • Handles unit normalization by removing scientific notation prefixes
  • Constructs prompt with pre/post prompts from kwargs
  • Appends unit information to question if available
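The prompt-building steps above can be sketched as follows. This is a minimal illustration, not the module's actual code: the document fields `problem_text` and `unit`, the kwargs keys `pre_prompt` and `post_prompt`, and the wording of the unit sentence are all assumptions, and the unit normalization step is omitted.

```python
from typing import Dict


def build_prompt(doc: Dict, kwargs: Dict) -> str:
    """Sketch of prompt construction; all field and key names are hypothetical."""
    pre = kwargs.get("pre_prompt", "")
    post = kwargs.get("post_prompt", "")
    question = doc["problem_text"].strip()
    unit = doc.get("unit", "").strip()
    # Append the unit only when the document provides one.
    if unit:
        question = f"{question} The unit of the answer is {unit}."
    return f"{pre}{question}{post}"
```

Calling `build_prompt({"problem_text": "Find g.", "unit": "m/s^2"}, {"pre_prompt": "Q: "})` would yield a prompt that starts with the pre-prompt and ends with the unit sentence.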

Answer Parsing

parse_math_answer(raw_string)
Main entry point for extracting answers from model output
  • Chains together last_boxed_only_string and remove_boxed
  • Returns parsed mathematical answer
last_boxed_only_string(string)
Finds the last LaTeX boxed answer in a string
  • Searches for "oxed" or "\fbox" markers (note: uses "oxed" instead of "boxed")
  • Handles nested braces to extract complete boxed content
  • Returns the boxed substring or None if not found
remove_boxed(s)
Extracts content from LaTeX boxed notation
  • Validates "oxed{...}" format
  • Handles equations by taking value after "=" sign
  • Returns the answer value or None on failure
extract_boxed_answers(text)
Alternative extraction using regex
  • Finds all boxed{...} patterns
  • Filters for numeric values (integers or decimals with optional sign)
  • Returns first numeric match or None
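The extraction strategies described above can be illustrated with a short sketch. It is written under stated assumptions: the helper names are hypothetical, and the sketch matches the literal marker `\boxed{` rather than the "oxed" substring the module searches for. It shows brace matching for nested content, the "take the value after =" rule, and the regex-based numeric filter.

```python
import re
from typing import Optional


def last_boxed(text: str) -> Optional[str]:
    """Return the last complete \\boxed{...} substring, or None."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:  # matching close brace found
                return text[start:i + 1]
    return None  # braces never balanced


def strip_boxed(boxed: str) -> Optional[str]:
    """Return the content of \\boxed{...}; for equations, the part after '='."""
    prefix = "\\boxed{"
    if not (boxed.startswith(prefix) and boxed.endswith("}")):
        return None
    inner = boxed[len(prefix):-1]
    if "=" in inner:
        inner = inner.split("=")[-1].strip()
    return inner


def first_numeric_boxed(text: str) -> Optional[str]:
    """Regex alternative: first boxed value that is a signed integer or decimal."""
    for candidate in re.findall(r"boxed\{([^{}]*)\}", text):
        candidate = candidate.strip()
        if re.fullmatch(r"[+-]?\d+(\.\d+)?", candidate):
            return candidate
    return None
```

The brace counter is what lets the first helper handle nested LaTeX such as `\boxed{\frac{1}{2}}`, which a non-nested regex like the one in the third helper would skip.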

Scientific Notation Handling

parse_not(inputs)
Parses scientific notation into coefficient and exponent
  • Handles multiple separators: "\times" and "*"
  • Splits input into coefficient and exponent parts
  • Returns tuple (coefficient, exponent) or empty strings on error
cal_not(inputs)
Calculates actual value from scientific notation
  • Expects tuple of (coefficient, exponent_string)
  • Extracts exponent from "10^{...}" format
  • Computes coefficient × 10^exponent
  • Returns string representation of computed value
remove_not(x)
Removes scientific notation suffix from a string
  • Uses regex to find "10^{...}" patterns with optional "$" delimiters
  • Returns text after the scientific notation marker
  • Returns None if no notation found
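As a rough illustration of the split-then-evaluate flow described above, the sketch below separates a coefficient from its power of ten and computes the value. The function names are hypothetical, and unlike `cal_not`, which is described as returning a string, this sketch returns a float for simplicity.

```python
import re
from typing import Tuple


def split_notation(s: str) -> Tuple[str, str]:
    """Split 'a \\times 10^{b}' or 'a*10^b' into (coefficient, exponent part)."""
    for sep in ("\\times", "*"):
        if sep in s:
            coeff, exp_part = s.split(sep, 1)
            return coeff.strip(), exp_part.strip()
    return s.strip(), ""  # no scientific notation found


def evaluate_notation(coeff: str, exp_part: str) -> float:
    """Compute coefficient * 10**exponent from a '10^{...}' exponent part."""
    match = re.search(r"10\^\{?(-?\d+)\}?", exp_part)
    if match is None:
        return float(coeff)
    return float(coeff) * 10 ** int(match.group(1))
```

For example, `split_notation(r"3.2 \times 10^{4}")` gives `("3.2", "10^{4}")`, and passing that pair to `evaluate_notation` gives a value of approximately 32000.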

Answer Comparison

clean_number_string(s)
Normalizes number strings for comparison
  • Handles None input by returning empty string
  • Removes commas (thousands separators)
  • Converts minus sign variants (−) to standard hyphen (-)
equiv_with_unit(model_output, answer, unit)
Compares model output with ground truth answer
  • Uses clean_number_string for normalization
  • Tries two comparison strategies, each using isclose with rel_tol=0.05 (5% relative tolerance):
    • Compares the full model output with the answer
    • Compares the first token of the model output with the answer
  • Returns True if either strategy matches
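A minimal sketch of this normalize-then-compare logic is shown below. The helper names are hypothetical, the unit argument is left out, and the handling of unparseable input (returning False) is an assumption rather than confirmed behavior of the module.

```python
from math import isclose
from typing import Optional


def clean(s: Optional[str]) -> str:
    """Drop thousands separators and map the Unicode minus sign to '-'."""
    if s is None:
        return ""
    return s.replace(",", "").replace("\u2212", "-")


def roughly_equal(pred: Optional[str], truth: str, rel_tol: float = 0.05) -> bool:
    """True if the full prediction or its first token is within rel_tol of truth."""
    pred, truth = clean(pred), clean(truth)
    # Strategy 1: whole string; strategy 2: first whitespace-separated token.
    candidates = [pred] + pred.split()[:1]
    for candidate in candidates:
        try:
            if isclose(float(candidate), float(truth), rel_tol=rel_tol):
                return True
        except ValueError:
            continue  # not a parseable number, try the next strategy
    return False
```

With these definitions, `roughly_equal("1,020 m", "1000")` succeeds through the first-token strategy, because the full string "1020 m" is not a valid float but "1020" is within 5% of 1000.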

Results Processing

scibench_process_results(doc: Dict, result: List[str]) -> Dict[str, float]
Processes model results and computes accuracy
  • Extracts prediction from result list
  • Parses prediction using parse_math_answer
  • Retrieves ground truth answer and unit from document
  • Handles unit conversion if scientific notation differs between prediction and ground truth
    • Uses remove_not to detect notation differences
    • Applies cal_not to normalize both values
  • Compares using equiv_with_unit
  • Returns dictionary with "accuracy" key (1 or 0)
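The overall shape of this step might look like the simplified sketch below. It is an assumption-heavy illustration: the document field `answer` is a hypothetical name, the boxed answer is pulled with a simple non-nested regex, and the scientific notation normalization via `remove_not` and `cal_not` is omitted entirely.

```python
import re
from math import isclose
from typing import Dict, List


def process_results_sketch(doc: Dict, result: List[str]) -> Dict[str, float]:
    """Score one prediction against doc['answer'] (hypothetical field name)."""
    prediction = result[0]
    # Prefer the last boxed value; fall back to the raw prediction text.
    boxed = re.findall(r"\\boxed\{([^{}]*)\}", prediction)
    answer_text = boxed[-1] if boxed else prediction
    try:
        correct = isclose(float(answer_text), float(doc["answer"]), rel_tol=0.05)
    except ValueError:
        correct = False  # unparseable output counts as wrong in this sketch
    return {"accuracy": 1.0 if correct else 0.0}
```

Returning a dictionary keyed by metric name mirrors the described return value, where "accuracy" is 1 for a match and 0 otherwise.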

Design Characteristics

  • Robust Parsing: Multiple fallback strategies for extracting answers from various response formats
  • Flexible Notation: Handles multiple scientific notation formats and LaTeX conventions
  • Tolerant Comparison: Uses 5% relative tolerance for numerical comparisons
  • Error Handling: Graceful degradation with try-except blocks and None checks
  • Unit Awareness: Automatically normalizes scientific notation in units for fair comparison

Dependencies

  • re - Regular expression operations for pattern matching
  • math.isclose - Numerical comparison with tolerance
  • typing.Dict, List - Type annotations

Usage Context

This utility module is referenced in SciBench task YAML configurations to process mathematical and scientific questions that require numerical answers with units. It ensures fair comparison even when models express answers in different scientific notation formats.
