Implementation:EvolvingLMMs Lab Lmms eval SciBench Utils
Source File: `lmms_eval/tasks/scibench/utils.py`
Principle: [[../principles/EvolvingLMMs_Lab_Lmms_eval_Task_Utility_Functions|Task_Utility_Functions]]
Overview
The SciBench Utils module provides utility functions for processing and evaluating scientific benchmark questions that involve mathematical answers with units and scientific notation. It handles parsing LaTeX-formatted answers, normalizing scientific notation, and comparing numerical results with tolerance.
Key Functions
Document Processing
scibench_doc_to_text(doc: Dict, lmms_eval_specific_kwargs: Dict) -> str - Formats a SciBench document into a question prompt
- Extracts problem text and unit information
- Handles unit normalization by removing scientific notation prefixes
- Constructs prompt with pre/post prompts from kwargs
- Appends unit information to question if available
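The prompt assembly described above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: the function name `build_prompt_sketch`, the document keys `problem_text` and `unit`, the kwarg keys `pre_prompt` and `post_prompt`, and the wording of the unit sentence are all assumptions, and the unit normalization step is left out.

```python
from typing import Dict


def build_prompt_sketch(doc: Dict, kwargs: Dict) -> str:
    """Hypothetical stand-in for scibench_doc_to_text; all key names are assumed."""
    pre_prompt = kwargs.get("pre_prompt", "")
    post_prompt = kwargs.get("post_prompt", "")
    question = doc["problem_text"].strip()
    unit = (doc.get("unit") or "").strip()
    # Append the unit only when the document actually provides one.
    if unit:
        question = f"{question} The unit of the answer is {unit}."
    return f"{pre_prompt}{question}{post_prompt}"


example = build_prompt_sketch(
    {"problem_text": "Calculate the pressure.", "unit": "atm"},
    {"pre_prompt": "Q: ", "post_prompt": "\nA:"},
)
```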
Answer Parsing
parse_math_answer(raw_string) - Main entry point for extracting answers from model output
- Chains together last_boxed_only_string and remove_boxed
- Returns parsed mathematical answer
last_boxed_only_string(string) - Finds the last LaTeX boxed answer in a string
- Searches for "oxed" or "\fbox" markers (note: uses "oxed" instead of "boxed")
- Handles nested braces to extract complete boxed content
- Returns the boxed substring or None if not found
remove_boxed(s) - Extracts content from LaTeX boxed notation
- Validates "oxed{...}" format
- Handles equations by taking value after "=" sign
- Returns the answer value or None on failure
extract_boxed_answers(text) - Alternative extraction using regex
- Finds all boxed{...} patterns
- Filters for numeric values (integers or decimals with optional sign)
- Returns first numeric match or None
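The two extraction paths in this section can be illustrated with a short sketch. The function names below are hypothetical and the code is a simplified reading of the behavior listed above (brace matching on the last "oxed" marker, taking the value after "=", and a regex filter for plain numbers); the "\fbox" fallback is omitted and the real module may differ in edge cases.

```python
import re
from typing import Optional


def last_boxed_sketch(text: str) -> Optional[str]:
    """Return the last 'oxed{...}' substring with balanced braces, or None."""
    start = text.rfind("oxed")
    if start < 0:
        return None
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[start : i + 1]
    return None  # the opening brace was never closed


def unbox_sketch(boxed: Optional[str]) -> Optional[str]:
    """Strip the 'oxed{' wrapper and keep the value after any '=' sign."""
    prefix = "oxed{"
    if boxed is None or not boxed.startswith(prefix) or not boxed.endswith("}"):
        return None
    inner = boxed[len(prefix):-1]
    return inner.split("=")[-1].strip()


def first_numeric_boxed_sketch(text: str) -> Optional[str]:
    """Regex alternative: first boxed{...} whose content is a plain number."""
    for content in re.findall(r"boxed\{([^{}]*)\}", text):
        if re.fullmatch(r"[+-]?\d+(\.\d+)?", content.strip()):
            return content.strip()
    return None
```

Matching on "oxed" rather than "boxed" means the search still succeeds when the leading backslash and "b" have been collapsed into a backspace escape in the model output.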
Scientific Notation Handling
parse_not(inputs) - Parses scientific notation into coefficient and exponent
- Handles multiple separators: "\times" and "*"
- Splits input into coefficient and exponent parts
- Returns tuple (coefficient, exponent) or empty strings on error
cal_not(inputs) - Calculates actual value from scientific notation
- Expects tuple of (coefficient, exponent_string)
- Extracts exponent from "10^{...}" format
- Computes coefficient × 10^exponent
- Returns string representation of computed value
remove_not(x) - Strips the scientific notation marker (and anything before it) from a string
- Uses regex to find "10^{...}" patterns with optional "$" delimiters
- Returns text after the scientific notation marker
- Returns None if no notation found
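A compact sketch of the three steps above (split, evaluate, strip) is shown below. The function names and regular expressions are illustrative assumptions rather than the module's own; in particular, this sketch returns `(value, "")` when no separator is present and trims whitespace from the stripped unit, which the original may not do.

```python
import re
from typing import Optional, Tuple

# Matches a power of ten such as 10^{3}, 10^{-2}, or 10^4.
POWER_OF_TEN = re.compile(r"10\^\{?\s*(-?\d+)\s*\}?")


def split_notation_sketch(value: str) -> Tuple[str, str]:
    """Split a string at the first multiplication separator it contains."""
    for sep in (r"\times", "*"):
        if sep in value:
            coeff, power = value.split(sep, 1)
            return coeff.strip(), power.strip()
    return value.strip(), ""


def evaluate_notation_sketch(coeff: str, power: str) -> str:
    """Compute coeff * 10**exponent and return it as a string."""
    match = POWER_OF_TEN.search(power)
    if match is None:
        return coeff
    return str(float(coeff) * 10 ** int(match.group(1)))


def strip_notation_sketch(unit: str) -> Optional[str]:
    """Return the text after the last power-of-ten marker, or None if absent."""
    pattern = re.compile(r"\$?\s*10\^\{?\s*-?\d+\s*\}?\s*\$?")
    if pattern.search(unit) is None:
        return None
    return pattern.split(unit)[-1].strip()
```

For example, `split_notation_sketch(r"1.5 \times 10^{3}")` yields a coefficient and a power string that `evaluate_notation_sketch` turns into `"1500.0"`.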
Answer Comparison
clean_number_string(s) - Normalizes number strings for comparison
- Handles None input by returning empty string
- Removes commas (thousands separators)
- Converts minus sign variants (−) to standard hyphen (-)
equiv_with_unit(model_output, answer, unit) - Compares model output with ground truth answer
- Uses clean_number_string for normalization
- Performs two comparison strategies:
  - Compares full model output with tolerance (5% relative)
  - Compares first token of model output with tolerance
- Uses isclose with rel_tol=0.05
- Returns True if either strategy matches
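The comparison logic can be sketched as below. The helper names are hypothetical, the `unit` argument of the real function is omitted, and treating unparseable strings as a non-match is an assumption about how failures are handled.

```python
from math import isclose
from typing import Optional


def clean_number_sketch(s: Optional[str]) -> str:
    """Drop thousands separators and map the Unicode minus sign to a hyphen."""
    if s is None:
        return ""
    return str(s).replace(",", "").replace("\u2212", "-")


def approx_equal_sketch(model_output: Optional[str], answer: str,
                        rel_tol: float = 0.05) -> bool:
    """True if the whole output or its first token is within rel_tol of answer."""
    out = clean_number_sketch(model_output)
    ans = clean_number_sketch(answer)
    candidates = [out]
    tokens = out.split()
    if tokens:
        candidates.append(tokens[0])  # e.g. "2.0" from "2.0 J"
    for cand in candidates:
        try:
            if isclose(float(cand), float(ans), rel_tol=rel_tol):
                return True
        except ValueError:
            continue  # not a parseable number; try the next candidate
    return False
```

Note that `math.isclose` measures relative tolerance against the larger of the two magnitudes, so a 5% tolerance accepts 1234.5 against 1200 but rejects 10 against 12.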
Results Processing
scibench_process_results(doc: Dict, result: List[str]) -> Dict[str, float] - Processes model results and computes accuracy
- Extracts prediction from result list
- Parses prediction using parse_math_answer
- Retrieves ground truth answer and unit from document
- Handles unit conversion if scientific notation differs between prediction and ground truth
  - Uses remove_not to detect notation differences
  - Applies cal_not to normalize both values
- Compares using equiv_with_unit
- Returns dictionary with "accuracy" key (1 or 0)
Design Characteristics
- Robust Parsing: Multiple fallback strategies for extracting answers from various response formats
- Flexible Notation: Handles multiple scientific notation formats and LaTeX conventions
- Tolerant Comparison: Uses 5% relative tolerance for numerical comparisons
- Error Handling: Graceful degradation with try-except blocks and None checks
- Unit Awareness: Automatically normalizes scientific notation in units for fair comparison
Dependencies
- re - Regular expression operations for pattern matching
- math.isclose - Numerical comparison with tolerance
- typing.Dict, List - Type annotations
Usage Context
This utility module is referenced in SciBench task YAML configurations to process mathematical and scientific questions that require numerical answers with units. It ensures fair comparison even when models express answers in different scientific notation formats.