Implementation:EvolvingLMMs Lab Lmms eval SciBench Utils

Source File: `lmms_eval/tasks/scibench/utils.py`

Principle: [[../principles/EvolvingLMMs_Lab_Lmms_eval_Task_Utility_Functions|Task_Utility_Functions]]

Overview

The SciBench Utils module provides utility functions for processing and evaluating scientific benchmark questions that involve mathematical answers with units and scientific notation. It handles parsing LaTeX-formatted answers, normalizing scientific notation, and comparing numerical results with tolerance.

Key Functions

Document Processing

scibench_doc_to_text(doc: Dict, lmms_eval_specific_kwargs: Dict) -> str

Formats a SciBench document into a question prompt

Extracts problem text and unit information
Handles unit normalization by removing scientific notation prefixes
Constructs prompt with pre/post prompts from kwargs
Appends unit information to question if available
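
The steps above can be sketched roughly as follows. This is a minimal illustration, not the module's code: the function name `build_prompt`, the document keys `problem_text` and `unit`, the kwargs keys `pre_prompt` and `post_prompt`, and the wording of the unit sentence are all assumptions made for the example.

```python
import re
from typing import Dict

# Matches a power-of-ten factor such as "10^{3}" or "$10^{-3}$".
_NOTATION = re.compile(r"\$?\s*10\^\{?\s*-?\d+\s*\}?\s*\$?")


def build_prompt(doc: Dict, kwargs: Dict) -> str:
    """Hypothetical sketch of prompt construction for a SciBench-style doc."""
    question = doc["problem_text"].strip()  # assumed key name
    unit = (doc.get("unit") or "").strip()  # assumed key name

    # Keep only the text after any power-of-ten factor, e.g. "10^{3} J" -> "J".
    unit = _NOTATION.split(unit)[-1].strip()

    if unit:
        # The exact sentence used here is an assumption.
        question = f"{question} The unit of the answer is {unit}."

    pre = kwargs.get("pre_prompt", "")    # assumed key name
    post = kwargs.get("post_prompt", "")  # assumed key name
    return f"{pre}{question}{post}"
```

For example, a unit of "10^{3} J" would be presented to the model simply as "J".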

Answer Parsing

parse_math_answer(raw_string)

Main entry point for extracting answers from model output

Chains together last_boxed_only_string and remove_boxed
Returns parsed mathematical answer

last_boxed_only_string(string)

Finds the last LaTeX boxed answer in a string

Searches for "oxed" or "\fbox" markers (note: uses "oxed" instead of "boxed")
Handles nested braces to extract complete boxed content
Returns the boxed substring or None if not found
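
A minimal sketch of the brace-matching idea described above, assuming a simple depth counter. The name `find_last_boxed` is hypothetical and the real function may differ in detail.

```python
from typing import Optional


def find_last_boxed(text: str) -> Optional[str]:
    """Return the last 'oxed{...}' (or '\\fbox{...}') substring, or None."""
    start = text.rfind("oxed")
    if start < 0:
        start = text.rfind("\\fbox")
        if start < 0:
            return None

    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                # Matching close brace found: return marker plus braces.
                return text[start:i + 1]
    return None  # braces never balanced
```

Note that in this sketch the returned substring starts at "oxed", not at the backslash, which is consistent with the "oxed{...}" format that remove_boxed validates.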

remove_boxed(s)

Extracts content from LaTeX boxed notation

Validates "oxed{...}" format
Handles equations by taking value after "=" sign
Returns the answer value or None on failure
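
As an illustration of the behavior listed above, the following sketch validates the wrapper and keeps only the right-hand side of an equation. The name `strip_boxed` and the whitespace trimming are assumptions of the example.

```python
from typing import Optional


def strip_boxed(s: str) -> Optional[str]:
    """Return the content of 'oxed{...}', or None if the format is wrong."""
    prefix = "oxed{"
    if not (s.startswith(prefix) and s.endswith("}")):
        return None
    answer = s[len(prefix):-1]
    if "=" in answer:
        # For "x = 3.5", keep only the value after the last "=".
        answer = answer.split("=")[-1].strip()
    return answer
```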

extract_boxed_answers(text)

Alternative extraction using regex

Finds all boxed{...} patterns
Filters for numeric values (integers or decimals with optional sign)
Returns first numeric match or None
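
A regex-based variant along these lines might look like the sketch below. The function name `first_numeric_boxed` and the exact patterns are assumptions; this version only considers boxed content without nested braces.

```python
import re
from typing import Optional


def first_numeric_boxed(text: str) -> Optional[str]:
    """Return the first boxed value that is a plain signed number, or None."""
    # Capture the content of every boxed{...} that has no nested braces.
    candidates = re.findall(r"boxed\{([^{}]*)\}", text)
    for candidate in candidates:
        value = candidate.strip()
        # Accept integers or decimals with an optional leading sign.
        if re.fullmatch(r"[-+]?\d+(?:\.\d+)?", value):
            return value
    return None
```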

Scientific Notation Handling

parse_not(inputs)

Parses scientific notation into coefficient and exponent

Recognizes the LaTeX "\times" marker and "*" as the multiplication separator
Splits input into coefficient and exponent parts
Returns tuple (coefficient, exponent) or empty strings on error
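
The splitting step can be sketched as below. The name `split_notation` is hypothetical, and two choices are assumptions of the example rather than documented behavior: the parts are stripped of whitespace, and input without a separator is returned as (value, "").

```python
from typing import Tuple


def split_notation(value: str) -> Tuple[str, str]:
    """Split '1.5 \\times 10^{3}' into ('1.5', '10^{3}')."""
    if not value:
        return "", ""
    for separator in ("\\times", "*"):
        if separator in value:
            parts = value.split(separator)
            if len(parts) != 2:
                return "", ""  # more than one separator: treat as an error
            return parts[0].strip(), parts[1].strip()
    return value, ""  # assumption: no separator means no exponent part
```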

cal_not(inputs)

Calculates actual value from scientific notation

Expects tuple of (coefficient, exponent_string)
Extracts exponent from "10^{...}" format
Computes coefficient × 10^exponent
Returns string representation of computed value
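
A minimal sketch of the calculation, assuming the exponent is an integer. The name `evaluate_notation` is hypothetical, and returning the coefficient unchanged when no "10^{...}" is present is an assumption of the example.

```python
import re


def evaluate_notation(coefficient: str, power: str) -> str:
    """Compute coefficient * 10^exponent and return it as a string."""
    match = re.search(r"10\^\{?\s*(-?\d+)\s*\}?", power)
    if match is None:
        return coefficient  # assumption: nothing to scale
    exponent = int(match.group(1))
    return str(float(coefficient) * 10.0 ** exponent)
```

With a coefficient of "1.5" and a power of "10^{3}", this sketch yields "1500.0".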

remove_not(x)

Strips a scientific notation factor from a string and keeps what follows it

Uses regex to find "10^{...}" patterns with optional "$" delimiters
Returns text after the scientific notation marker
Returns None if no notation found
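
The behavior above could be approximated as follows. The name `strip_notation_factor` and the exact regular expression are assumptions; the sketch also trims surrounding whitespace from the result.

```python
import re
from typing import Optional

# A power-of-ten factor such as "10^{-3}", optionally wrapped in "$".
_NOTATION = re.compile(r"\$?\s*10\^\{?\s*-?\d+\s*\}?\s*\$?")


def strip_notation_factor(unit: str) -> Optional[str]:
    """Return the text after the last power-of-ten factor, or None."""
    if _NOTATION.search(unit) is None:
        return None
    return _NOTATION.split(unit)[-1].strip()
```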

Answer Comparison

clean_number_string(s)

Normalizes number strings for comparison

Handles None input by returning empty string
Removes commas (thousands separators)
Converts the Unicode minus sign (−) to the standard ASCII hyphen (-)
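
These three rules fit in a few lines. The sketch below uses the hypothetical name `normalize_number` and writes the Unicode minus sign as the escape "\u2212" so that it is visible in the source.

```python
from typing import Optional


def normalize_number(s: Optional[str]) -> str:
    """Normalize a number string before numeric comparison."""
    if s is None:
        return ""
    # Drop thousands separators, then map U+2212 (minus sign) to "-".
    return s.replace(",", "").replace("\u2212", "-")
```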

equiv_with_unit(model_output, answer, unit)

Compares model output with ground truth answer

Uses clean_number_string for normalization
Performs two comparison strategies, each using isclose with rel_tol=0.05 (5% relative tolerance):
- Compares the full model output with the answer
- Compares the first token of the model output with the answer
Returns True if either strategy matches
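
A simplified sketch of the two-strategy comparison is shown below. The name `roughly_equal` is hypothetical, the `unit` argument of the real function is omitted, and treating non-numeric text as a non-match is an assumption of the example.

```python
from math import isclose
from typing import Optional


def _to_float(text: Optional[str]) -> Optional[float]:
    """Parse a number string, returning None when it is not numeric."""
    try:
        return float(text.replace(",", "").replace("\u2212", "-"))
    except (AttributeError, ValueError):
        return None


def roughly_equal(model_output: str, answer: str, rel_tol: float = 0.05) -> bool:
    """True if the full output or its first token is within rel_tol of answer."""
    target = _to_float(answer)
    if target is None:
        return False

    candidates = [model_output]
    tokens = (model_output or "").split()
    if tokens:
        candidates.append(tokens[0])  # e.g. "9.8" from "9.8 m/s^2"

    for candidate in candidates:
        value = _to_float(candidate)
        if value is not None and isclose(value, target, rel_tol=rel_tol):
            return True
    return False
```

The first-token strategy is what lets an output such as "9.8 m/s^2" match a numeric answer even though the full string is not a number.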

Results Processing

scibench_process_results(doc: Dict, result: List[str]) -> Dict[str, float]

Processes model results and computes accuracy

Extracts prediction from result list
Parses prediction using parse_math_answer
Retrieves ground truth answer and unit from document
Handles unit conversion if scientific notation differs between prediction and ground truth
- Uses remove_not to detect notation differences
- Applies cal_not to normalize both values
Compares using equiv_with_unit
Returns dictionary with "accuracy" key (1 or 0)
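
The overall flow can be illustrated end to end with a compact, simplified sketch. It is not the module's implementation: the name `score_prediction`, the document keys `answer_number` and `unit`, and the choice to always scale both values to plain numbers before comparing are assumptions made for the example. The boxed-answer regex here tolerates one level of nested braces only.

```python
import re
from math import isclose
from typing import Dict, List

_POWER = re.compile(r"10\^\{?\s*(-?\d+)\s*\}?")
_BOXED = re.compile(r"oxed\{((?:[^{}]|\{[^{}]*\})*)\}")


def _scale(value: str, power_text: str) -> float:
    """Multiply value by the 10^{n} factor found in power_text, if any."""
    match = _POWER.search(power_text)
    exponent = int(match.group(1)) if match else 0
    return float(value) * 10.0 ** exponent


def score_prediction(doc: Dict, result: List[str]) -> Dict[str, float]:
    """Hypothetical scoring sketch returning {"accuracy": 1 or 0}."""
    boxed = _BOXED.findall(result[0])
    if not boxed:
        return {"accuracy": 0}
    # Use the last boxed answer and keep the value after any "=".
    pred_text = boxed[-1].split("=")[-1].strip()
    try:
        if "\\times" in pred_text:
            coefficient, power = pred_text.split("\\times", 1)
            pred_value = _scale(coefficient, power)
        else:
            pred_value = float(pred_text)
        # Assumed keys: the unit may carry a power-of-ten factor.
        truth = _scale(str(doc["answer_number"]), doc.get("unit") or "")
    except ValueError:
        return {"accuracy": 0}
    return {"accuracy": 1 if isclose(pred_value, truth, rel_tol=0.05) else 0}
```

Under these assumptions, a ground truth of 1.5 with unit "10^{3} J" is matched by a prediction of either 1500 or 1.5 \times 10^{3}.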

Design Characteristics

Robust Parsing: Multiple fallback strategies for extracting answers from various response formats
Flexible Notation: Handles multiple scientific notation formats and LaTeX conventions
Tolerant Comparison: Uses 5% relative tolerance for numerical comparisons
Error Handling: Graceful degradation with try-except blocks and None checks
Unit Awareness: Automatically normalizes scientific notation in units for fair comparison

Dependencies

re - Regular expression operations for pattern matching
math.isclose - Numerical comparison with tolerance
typing.Dict, List - Type annotations

Usage Context

This utility module is referenced in SciBench task YAML configurations to process mathematical and scientific questions that require numerical answers with units. It ensures fair comparison even when models express answers in different scientific notation formats.
