Implementation:EvolvingLMMs Lab Lmms eval MathVision Eval Utils
| Knowledge Sources | |
|---|---|
| Domains | Mathematical_Reasoning, Model_Evaluation, LaTeX_Processing, Answer_Extraction |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
Evaluation utilities for mathematical reasoning tasks with LaTeX answer extraction and equivalence checking.
Description
This module provides utilities for scoring model answers on the MathVision benchmark. It extracts answers from LaTeX-formatted expressions, normalizes mathematical notation, checks equivalence through symbolic evaluation with latex2sympy, handles several answer formats (tuples, lists, fractions, equations), and filters results by mathematical subdomain. During normalization it strips units and other notation that does not affect the value.
Usage
Use this module when evaluating models on mathematical reasoning tasks that produce LaTeX-formatted answers. The utilities handle answer extraction from boxed LaTeX expressions, normalize different mathematical notations to canonical forms, and check equivalence considering numerical precision and symbolic equality.
Code Reference
Source Location
- Repository: EvolvingLMMs_Lab_Lmms_eval
- File: lmms_eval/tasks/mathvision/eval_utils.py
Signature
def is_equal(asw: str, gt_asw: str) -> bool:
    """Judge if asw is equivalent to gt_asw."""
    ...

def find_math_answer(s: str) -> str:
    """Extract and normalize answer from LaTeX string."""
    ...

def eval_tuple(s: str) -> str:
    """Evaluate mathematical expressions within tuples or lists."""
    ...

def in_area(id: str, area: str) -> bool:
    """Determine if a given ID falls within a specified area."""
    ...

def extract_nums(s: str) -> list:
    """Extract all numeric values from string."""
    ...

def is_number(s: str) -> bool:
    """Check if string represents a number."""
    ...

# File I/O
def save_jsonl(path: str, data: list, t_stamp: bool = True) -> None:
    """Save data to JSONL file with optional timestamp."""
    ...

def load_jsonl(path: str) -> list:
    """Load data from JSONL file."""
    ...
Import
from lmms_eval.tasks.mathvision.eval_utils import (
    is_equal,
    find_math_answer,
    eval_tuple,
    in_area,
    extract_nums,
    is_number,
    save_jsonl,
    load_jsonl,
)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| asw | str | Yes | Answer string to check (for is_equal) |
| gt_asw | str | Yes | Ground truth answer string (for is_equal) |
| s | str | Yes | LaTeX string containing answer (for find_math_answer) |
| id | str | Yes | Problem identifier (for in_area) |
| area | str | Yes | Mathematical area/domain (for in_area) |
Outputs
| Name | Type | Description |
|---|---|---|
| equal | bool | True if answers are mathematically equivalent |
| answer | str | Extracted and normalized answer string |
| in_domain | bool | True if ID belongs to specified area |
| numbers | list | List of extracted numeric values |
Core Functions
Answer Comparison
is_equal(asw, gt_asw)
- Compares two mathematical answer strings for equivalence
- Applies multiple comparison strategies:
  1. Exact string match after lowercasing
  2. Tuple/list evaluation and comparison
  3. LaTeX symbolic evaluation using latex2sympy
  4. Numerical comparison with 2 decimal precision
- Returns True if answers are equivalent, False otherwise
- Robust to formatting differences and mathematical notation variations
Answer Extraction
find_math_answer(s)
- Extracts answer from LaTeX boxed expressions
- Searches for pattern: \boxed{...}
- Falls back to entire string if pattern not found
- Handles nested braces and complex expressions
- Processes equals signs and approximations
- Removes LaTeX formatting and units
- Returns cleaned answer string
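The brace matching needed for nested `\boxed{...}` content can be sketched as below. `extract_boxed` is a hypothetical helper, and choosing the last boxed expression is an assumption; the real find_math_answer also normalizes the extracted text, which this sketch does not attempt.

```python
def extract_boxed(s: str) -> str:
    """Return the contents of the last \\boxed{...} in s, or s itself if absent."""
    marker = "\\boxed{"
    start = s.rfind(marker)
    if start == -1:
        return s  # fallback: no boxed expression found
    content_start = start + len(marker)
    depth = 1  # we are already inside the opening brace of \boxed{
    for i in range(content_start, len(s)):
        if s[i] == "{":
            depth += 1
        elif s[i] == "}":
            depth -= 1
            if depth == 0:
                return s[content_start:i]
    return s  # unbalanced braces: fall back to the whole string
```

Counting brace depth rather than using a regular expression is what lets an inner `\frac{3}{4}` survive intact.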
LaTeX Processing
_strip_string(string)
- Comprehensive LaTeX cleanup function
- Removes:
  * Line breaks and negative thin spaces (\!)
  * LaTeX commands (\left, \right, \text, etc.)
  * Units and percentage signs
  * Degree notation (^\circ)
  * Dollar signs
- Handles:
  * Fractions (converts to \frac format)
  * Square roots (ensures proper bracing)
  * Decimal numbers starting with a period
  * Equality and approximation signs
- Returns normalized string
_fix_fracs(string)
- Fixes improper fraction notation
- Ensures \frac commands have proper braces
- Handles cases like \frac ab → \frac{a}{b}
- Returns corrected string
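A regex-based sketch of this kind of repair is shown below. It is not the module's implementation: `fix_fracs_sketch` is a hypothetical name, and it assumes each unbraced argument is a single alphanumeric character.

```python
import re


def fix_fracs_sketch(string: str) -> str:
    r"""Brace single-character \frac arguments: \frac12 -> \frac{1}{2}."""
    # Case 1: neither argument is braced, e.g. \frac12 or \frac ab.
    string = re.sub(r"\\frac\s*([A-Za-z0-9])\s*([A-Za-z0-9])",
                    r"\\frac{\1}{\2}", string)
    # Case 2: only the numerator is braced, e.g. \frac{1}2.
    string = re.sub(r"\\frac(\{[^{}]*\})\s*([A-Za-z0-9])",
                    r"\\frac\1{\2}", string)
    return string
```

Already well-formed input such as `\frac{1}{2}` matches neither pattern and passes through unchanged.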
_fix_sqrt(string)
- Ensures square root arguments are properly braced
- Converts \sqrt a → \sqrt{a}
- Returns corrected string
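The same idea for square roots, as a minimal sketch. `fix_sqrt_sketch` is a hypothetical name, and bracing only the first character after `\sqrt` is an assumption about the intended behavior.

```python
import re


def fix_sqrt_sketch(string: str) -> str:
    r"""Brace a single-character \sqrt argument: \sqrt3 -> \sqrt{3}."""
    # Match \sqrt followed by one alphanumeric character that is not
    # already wrapped in braces; braced arguments are left alone.
    return re.sub(r"\\sqrt\s*([A-Za-z0-9])", r"\\sqrt{\1}", string)
```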
_fix_a_slash_b(string)
- Converts simple fractions to LaTeX format
- Transforms a/b → \frac{a}{b}
- Only processes simple integer fractions
- Returns LaTeX fraction string
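The integer-only restriction can be illustrated with a short sketch. `fix_a_slash_b_sketch` is a hypothetical name, and the exact-text check is an assumption about how strictly "simple" is interpreted.

```python
def fix_a_slash_b_sketch(string: str) -> str:
    """Rewrite 'a/b' as '\\frac{a}{b}' when a and b are both integers."""
    parts = string.split("/")
    if len(parts) != 2:
        return string
    try:
        a, b = int(parts[0]), int(parts[1])
    except ValueError:
        return string  # not a simple integer fraction; leave unchanged
    # Only rewrite when the input is exactly the canonical 'a/b' text,
    # so inputs like ' 1/2' or '01/2' pass through untouched.
    if string != f"{a}/{b}":
        return string
    return f"\\frac{{{a}}}{{{b}}}"
```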
Tuple/List Evaluation
eval_tuple(s)
- Evaluates mathematical expressions within tuples/lists
- Handles formats: (a,b,c) or [a,b,c]
- Uses latex2sympy for expression evaluation
- Rounds results to 2 decimal places
- Leaves special values unevaluated: infty, a, -a
- Returns evaluated tuple/list string
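The element-wise evaluation described above might look like the following sketch. It is not the module's code: `eval_tuple_sketch` and `_simple_value` are hypothetical names, and `_simple_value` is a deliberately small stand-in for latex2sympy that understands only plain numbers and `a/b`. Returning the input unchanged on failure is also an assumption.

```python
from fractions import Fraction
from typing import Callable


def _simple_value(expr: str) -> float:
    """Stand-in evaluator: handles plain numbers and 'a/b' only."""
    return float(Fraction(expr.replace(" ", "")))


def eval_tuple_sketch(s: str,
                      evaluate: Callable[[str], float] = _simple_value) -> str:
    """Evaluate each element of '(a,b,...)' or '[a,b,...]', rounding to 2 places."""
    if len(s) < 2 or (s[0], s[-1]) not in {("(", ")"), ("[", "]")}:
        return s
    parts = [p.strip() for p in s[1:-1].split(",")]
    if len(parts) < 2:
        return s  # assumption: a single element is not treated as a tuple
    out = []
    for p in parts:
        if "infty" in p or p in ("a", "-a"):
            out.append(p)  # special values are kept verbatim
            continue
        try:
            out.append(str(round(evaluate(p), 2)))
        except (ValueError, ZeroDivisionError):
            return s  # assumption: on failure, return the input unchanged
    return s[0] + ",".join(out) + s[-1]
```

Because the stand-in evaluator always returns a float, whole numbers print as `1.0` here; output formatting with a different evaluator may differ.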
Number Extraction
extract_nums(s)
- Extracts all numeric values from string
- Handles scientific notation: 1.5e10
- Handles decimals: 3.14
- Handles signed numbers: -42
- Removes commas from numbers
- Returns list of numeric values
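A regular expression covering signed numbers, decimals, and scientific notation is enough to sketch this behavior. The pattern and the name `extract_nums_sketch` are assumptions for illustration, not the module's own code.

```python
import re

# Optional sign, digits with optional fraction (or a bare fraction),
# then an optional exponent such as e10 or E-3.
_NUM = re.compile(r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?")


def extract_nums_sketch(s: str) -> list:
    """Return every number in s as int or float, ignoring thousands commas."""
    s = s.replace(",", "")  # '1,234.56' -> '1234.56'
    nums = []
    for token in _NUM.findall(s):
        # Keep integers as int and everything else as float.
        is_int = re.fullmatch(r"[+-]?\d+", token) is not None
        nums.append(int(token) if is_int else float(token))
    return nums
```

One consequence of stripping commas first is that a comma-separated pair written without a space, such as `1,2`, is read as the single number 12.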
is_number(s)
- Checks if string represents a valid number
- Handles commas in numbers
- Returns boolean
Domain Filtering
in_area(id, area)
- Checks if problem ID belongs to mathematical area
- Handles two ID formats:
  * Path format: test/precalculus/244.json
  * CSV format: abstract_algebra_test.csv_1
- Special case: area="all" always returns True
- Returns boolean
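Both ID formats can be covered by substring checks, as in this sketch. `in_area_sketch` is a hypothetical name, and the substring rule is an assumption chosen to match the documented examples (for instance, `abstract_algebra_test.csv_1` counting as "algebra").

```python
def in_area_sketch(id: str, area: str) -> bool:
    """Return True if id belongs to area under either ID format."""
    if area == "all":
        return True
    # Path format: the area appears as a directory component.
    # CSV format: the area immediately precedes '_test.csv'.
    return f"/{area}/" in id or f"{area}_test.csv" in id
```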
File I/O
save_jsonl(path, data, t_stamp=True)
- Saves list of dictionaries to JSONL file
- Appends a timestamp to the file name when t_stamp is True
- UTF-8 encoding
- Progress bar with tqdm
- Format: one JSON object per line
load_jsonl(path)
- Loads JSONL file into list of dictionaries
- UTF-8 encoding
- Skips empty lines
- Returns list of dicts
timestamp()
- Generates timestamp string
- Format: -YYYYMMDD-HHMM
- Example: -20260214-1430
- Returns timestamp string
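The file helpers above can be approximated with the standard library alone. The sketch below uses hypothetical `*_sketch` names, omits the tqdm progress bar, and returns the written file name for convenience, which is an assumption (the documented save_jsonl returns None).

```python
import json
import os
import time


def timestamp_sketch() -> str:
    """Return the current local time as '-YYYYMMDD-HHMM'."""
    return time.strftime("-%Y%m%d-%H%M", time.localtime())


def save_jsonl_sketch(path: str, data: list, t_stamp: bool = True) -> str:
    """Write one JSON object per line; return the file name actually used."""
    if t_stamp:
        # Insert the timestamp between the base name and the extension.
        root, ext = os.path.splitext(path)
        path = f"{root}{timestamp_sketch()}{ext}"
    with open(path, "w", encoding="utf-8") as f:
        for row in data:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
    return path


def load_jsonl_sketch(path: str) -> list:
    """Read a JSONL file, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Since the timestamp changes the file name, a caller that wants to reload the data needs the name that was actually written, which is why the sketch returns it.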
Utility Functions
delete_extra_zero(n)
- Removes unnecessary trailing zeros from numbers
- Converts to int if result is whole number
- Handles both int and float inputs
- Returns string representation
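A minimal sketch of this formatting rule follows. `delete_extra_zero_sketch` is a hypothetical name, and returning non-numeric input as plain text is an assumption.

```python
def delete_extra_zero_sketch(n) -> str:
    """Format a number without trailing zeros, e.g. 2.50 -> '2.5', 3.0 -> '3'."""
    try:
        value = float(n)
    except (TypeError, ValueError):
        return str(n)  # assumption: non-numeric input is returned as text
    if value.is_integer():
        return str(int(value))  # whole numbers drop the '.0'
    # Python's float-to-str conversion already omits padded zeros.
    return str(value)
```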
_remove_right_units(string)
- Removes unit text from LaTeX strings
- Splits on "\text{ " delimiter
- Returns string before last unit occurrence
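The split on the `\text{ ` delimiter can be sketched in a few lines. `remove_right_units_sketch` is a hypothetical name; the real helper may treat inputs with several occurrences differently.

```python
def remove_right_units_sketch(string: str) -> str:
    """Drop a trailing '\\text{ ...}' unit annotation, if present."""
    marker = "\\text{ "
    if marker not in string:
        return string
    # Keep everything before the last occurrence of the marker.
    return string.rsplit(marker, 1)[0]
```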
Usage Examples
# Example 1: Check answer equivalence
is_equal("3.14159", "3.14") # True (rounds to 2 decimals)
is_equal("\\frac{1}{2}", "0.5") # True (symbolic equivalence)
is_equal("(1,2,3)", "(1, 2, 3)") # True (tuple equivalence)
# Example 2: Extract answer from LaTeX
response = "The solution is \\boxed{42} as shown above."
answer = find_math_answer(response)
# Returns: "42"
response = "We get \\boxed{\\frac{3}{4}} from the calculation."
answer = find_math_answer(response)
# Returns: "\\frac{3}{4}"
# Example 3: Evaluate tuple expressions
eval_tuple("(2*3, 5+2)") # Returns: "(6,7)"
eval_tuple("[10/2, 3^2]") # Returns: "[5.0,9.0]"
eval_tuple("(infty, -a)") # Returns: "(infty,-a)" (preserves special values)
# Example 4: Check if ID in area
in_area("test/precalculus/244.json", "precalculus") # True
in_area("abstract_algebra_test.csv_1", "algebra") # True
in_area("geometry_test.csv_5", "precalculus") # False
in_area("anything", "all") # True
# Example 5: Extract numbers
extract_nums("The answer is 3.14 or maybe -2.5e10")
# Returns: [3.14, -2.5e10]
extract_nums("Values: 1,234.56 and 789")
# Returns: [1234.56, 789]
# Example 6: Save and load results
results = [
    {"problem_id": "1", "answer": "42", "correct": True},
    {"problem_id": "2", "answer": "3.14", "correct": False},
]
save_jsonl("results.jsonl", results, t_stamp=True)
# Saves to: results-20260214-1430.jsonl
loaded_results = load_jsonl("results-20260214-1430.jsonl")
# Returns: list of dicts
# Example 7: Complex equivalence checking
# Handles different fraction formats
is_equal("1/2", "\\frac{1}{2}") # True
# Handles different tuple formats
is_equal("(1, 2, 3)", "(1,2,3)") # True
# Handles mathematical expressions
is_equal("2+3", "5") # True (evaluates expressions)
# Example 8: LaTeX cleanup
answer = find_math_answer("\\boxed{3^{\\circ}\\text{ units}}")
# Returns: "3" (removes degree symbol and units)
answer = find_math_answer("\\boxed{\\frac{1}{2}\\%}")
# Returns: "\\frac{1}{2}" (removes percentage)
LaTeX Handling Details
Supported LaTeX Commands
- Fractions: \frac{a}{b}, \tfrac, \dfrac
- Roots: \sqrt{x}
- Brackets: \left, \right
- Text: \text{}, \mbox{}
- Angles: ^\circ
- Infinity: \infty
- Matrices: bmatrix, pmatrix
Removed Elements
- Units: m^3, km, units
- Percentages: \%, %
- Degree symbols: ^\circ, ^{\circ}
- Dollar signs: $, \$
- Spacing: \!, \\
- Line breaks: \n
Normalization Rules
- 0.5 → \frac{1}{2}
- a/b → \frac{a}{b} (for integers)
- .5 → 0.5
- Numbers rounded to 2 decimals for comparison
Mathematical Equivalence
The is_equal function checks equivalence through multiple strategies:
1. String equality - Direct comparison after lowercasing
2. Tuple/list equality - Element-wise comparison after evaluation
3. Symbolic equality - Using latex2sympy conversion
4. Numerical equality - Comparing evaluated results (2 decimal precision)
This multi-strategy approach lets answers that differ only in notation, such as 1/2 and \frac{1}{2}, compare as equal.