Implementation:EvolvingLMMs Lab Lmms eval MathVision Eval Utils

Knowledge Sources	EvolvingLMMs_Lab_Lmms_eval
Domains	Mathematical_Reasoning, Model_Evaluation, LaTeX_Processing, Answer_Extraction
Last Updated	2026-02-14 00:00 GMT

Overview

Evaluation utilities for mathematical reasoning tasks with LaTeX answer extraction and equivalence checking.

Description

This module provides utilities for scoring model answers on the MathVision benchmark. It extracts answers from LaTeX-formatted responses, normalizes mathematical notation, and checks equivalence using symbolic evaluation (latex2sympy) and numeric comparison rounded to two decimals. It also handles tuples, lists, fractions, and equations, and filters results by mathematical subdomain. Normalization strips LaTeX formatting commands, units, and other notation that does not affect the value of the answer.

Usage

Use this module when evaluating models on mathematical reasoning tasks that produce LaTeX-formatted answers. The utilities handle answer extraction from boxed LaTeX expressions, normalize different mathematical notations to canonical forms, and check equivalence considering numerical precision and symbolic equality.

Code Reference

Source Location

Repository: EvolvingLMMs_Lab_Lmms_eval
File: lmms_eval/tasks/mathvision/eval_utils.py

Signature

def is_equal(asw: str, gt_asw: str) -> bool:
    """Judge if asw is equivalent to gt_asw."""
    ...

def find_math_answer(s: str) -> str:
    """Extract and normalize answer from LaTeX string."""
    ...

def eval_tuple(s: str) -> str:
    """Evaluate mathematical expressions within tuples or lists."""
    ...

def in_area(id: str, area: str) -> bool:
    """Determine if a given ID falls within a specified area."""
    ...

def extract_nums(s: str) -> list:
    """Extract all numeric values from string."""
    ...

def is_number(s: str) -> bool:
    """Check if string represents a number."""
    ...

# File I/O
def save_jsonl(path: str, data: list, t_stamp: bool = True) -> None:
    """Save data to JSONL file with optional timestamp."""
    ...

def load_jsonl(path: str) -> list:
    """Load data from JSONL file."""
    ...

Import

from lmms_eval.tasks.mathvision.eval_utils import (
    is_equal,
    find_math_answer,
    eval_tuple,
    in_area,
    extract_nums,
    is_number,
    save_jsonl,
    load_jsonl,
)

I/O Contract

Inputs

Name	Type	Required	Description
asw	str	Yes	Answer string to check (for is_equal)
gt_asw	str	Yes	Ground truth answer string (for is_equal)
s	str	Yes	LaTeX string containing answer (for find_math_answer)
id	str	Yes	Problem identifier (for in_area)
area	str	Yes	Mathematical area/domain (for in_area)

Outputs

Name	Type	Description
equal	bool	True if answers are mathematically equivalent
answer	str	Extracted and normalized answer string
in_domain	bool	True if ID belongs to specified area
numbers	list	List of extracted numeric values

Core Functions

Answer Comparison

is_equal(asw, gt_asw)

Compares two mathematical answer strings for equivalence
Handles multiple comparison strategies:

 1. Exact string match after lowercasing
 2. Tuple/list evaluation and comparison
 3. LaTeX symbolic evaluation using latex2sympy
 4. Numerical comparison with 2 decimal precision

Returns True if answers are equivalent, False otherwise
Robust to formatting differences and mathematical notation variations
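
The cascade above can be sketched without the symbolic step. This is a minimal illustration, not the module's code: is_equal_sketch and its evaluate parameter are hypothetical names, plain float() stands in for latex2sympy, the tuple/list step is left out, and treating an empty answer as unequal is an assumption.

```python
def is_equal_sketch(asw: str, gt_asw: str, evaluate=float) -> bool:
    """Cascade: exact match first, then numeric match at 2 decimals.

    `evaluate` is a hypothetical hook; the real module evaluates LaTeX
    with latex2sympy, which this sketch replaces with float().
    """
    asw, gt_asw = asw.lower().strip(), gt_asw.lower().strip()
    if not asw or not gt_asw:
        # Assumption: an empty answer never counts as correct.
        return False
    if asw == gt_asw:
        # Strategy 1: exact string match after lowercasing.
        return True
    try:
        # Strategy 4: numeric comparison rounded to 2 decimals.
        return round(evaluate(gt_asw), 2) == round(evaluate(asw), 2)
    except (ValueError, TypeError):
        # Anything that cannot be evaluated compares as unequal.
        return False


print(is_equal_sketch("3.14159", "3.14"))
print(is_equal_sketch("x", "y"))
```

Passing a different evaluate callable is one way to plug a symbolic evaluator back in while keeping the same fall-through order.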

Answer Extraction

find_math_answer(s)

Extracts answer from LaTeX boxed expressions
Searches for pattern: \boxed{...}
Falls back to entire string if pattern not found
Handles nested braces and complex expressions
Processes equals signs and approximations
Removes LaTeX formatting and units
Returns cleaned answer string
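
Handling nested braces needs brace counting rather than a simple regular expression. The sketch below shows only that extraction step, under assumptions: extract_boxed_sketch is a hypothetical name, it picks the last \boxed{...} in the string (the source does not say which occurrence is used), and it skips the later cleanup of units and formatting.

```python
def extract_boxed_sketch(s: str) -> str:
    """Return the contents of the last \\boxed{...}, or s if none is found."""
    marker = "\\boxed{"
    start = s.rfind(marker)  # assumption: the last box holds the final answer
    if start == -1:
        return s  # fall back to the entire string
    begin = start + len(marker)
    depth = 0  # braces opened inside the box and not yet closed
    for i in range(begin, len(s)):
        ch = s[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            if depth == 0:
                return s[begin:i]  # this brace closes \boxed{
            depth -= 1
    return s[begin:]  # unbalanced braces: return the remainder


print(extract_boxed_sketch("We get \\boxed{\\frac{3}{4}} from the calculation."))
```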

LaTeX Processing

_strip_string(string)

Comprehensive LaTeX cleanup function
Removes:

 * Line breaks and negative-space commands (\!)
 * LaTeX commands (\left, \right, \text, etc.)
 * Units and percentage signs
 * Degree notation (^\circ)
 * Dollar signs

Handles:

 * Fractions (converts to \frac format)
 * Square roots (ensures proper bracing)
 * Decimal numbers starting with period
 * Equality and approximation signs

Returns normalized string
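
A few of the removal steps listed above can be sketched as a chain of plain string replacements. This is an illustrative subset, not the module's implementation: strip_latex_sketch is a hypothetical name, the order of replacements is an assumption, and fraction, square-root, and unit handling are omitted.

```python
def strip_latex_sketch(string: str) -> str:
    """Remove a subset of formatting tokens and fix a leading decimal point."""
    # Longer tokens come first so that "\\$" is removed before "$",
    # and "^{\\circ}" before "^\\circ".
    # Note: a plain replace of "\\left" would also hit "\\leftarrow".
    tokens = (
        "\n", "\\!", "\\left", "\\right",
        "^{\\circ}", "^\\circ", "\\$", "$", "\\%", "%",
    )
    for token in tokens:
        string = string.replace(token, "")
    if string.startswith("."):
        string = "0" + string  # .5 -> 0.5
    return string


print(strip_latex_sketch("\\left(3^\\circ\\right)"))
```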

_fix_fracs(string)

Fixes improper fraction notation
Ensures \frac commands have proper braces
Handles cases like \frac ab → \frac{a}{b}
Returns corrected string
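
The \frac ab → \frac{a}{b} repair can be approximated with two regular expressions. This is a simplified sketch under assumptions: fix_fracs_sketch is a hypothetical name, each unbraced argument is taken to be a single character, and the form \frac{a}b is not handled.

```python
import re


def fix_fracs_sketch(string: str) -> str:
    """Add braces around single-character \\frac arguments."""
    # \frac ab or \frac12 -> \frac{a}{b}: two unbraced one-character arguments.
    string = re.sub(
        r"\\frac\s*([^\s{}\\])\s*([^\s{}\\])", r"\\frac{\1}{\2}", string
    )
    # \frac1{72} -> \frac{1}{72}: only the numerator is unbraced.
    string = re.sub(r"\\frac\s*([^\s{}\\])\s*\{", r"\\frac{\1}{", string)
    return string


print(fix_fracs_sketch("\\frac ab"))
```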

_fix_sqrt(string)

Ensures square root arguments are properly braced
Converts \sqrt a → \sqrt{a}
Returns corrected string
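
A one-line regular expression is enough to illustrate the bracing rule. In this sketch, fix_sqrt_sketch is a hypothetical name and only the first character after \sqrt is braced, so \sqrt 12 would become \sqrt{1}2; roots with an index such as \sqrt[3]{x} are left untouched.

```python
import re


def fix_sqrt_sketch(string: str) -> str:
    """Wrap a single unbraced character after \\sqrt in braces."""
    # The excluded characters keep \sqrt{x} and \sqrt[3]{x} unchanged.
    return re.sub(r"\\sqrt\s*([^\s{}\\\[])", r"\\sqrt{\1}", string)


print(fix_sqrt_sketch("\\sqrt a"))
```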

_fix_a_slash_b(string)

Converts simple fractions to LaTeX format
Transforms a/b → \frac{a}{b}
Only processes simple integer fractions
Returns LaTeX fraction string
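
The "simple integer fractions only" rule can be sketched as a parse-and-verify check. The name fix_a_slash_b_sketch is hypothetical, and returning the input unchanged when the check fails is an assumption about the fallback behavior.

```python
def fix_a_slash_b_sketch(string: str) -> str:
    """Convert 'a/b' with integer a and b to \\frac{a}{b}; otherwise no-op."""
    parts = string.split("/")
    if len(parts) != 2:
        return string  # not exactly one slash
    try:
        a, b = int(parts[0]), int(parts[1])
    except ValueError:
        return string  # numerator or denominator is not an integer
    if string != f"{a}/{b}":
        return string  # extra whitespace, signs, or leading zeros
    return "\\frac{" + str(a) + "}{" + str(b) + "}"


print(fix_a_slash_b_sketch("1/2"))
```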

Tuple/List Evaluation

eval_tuple(s)

Evaluates mathematical expressions within tuples/lists
Handles formats: (a,b,c) or [a,b,c]
Uses latex2sympy for expression evaluation
Rounds results to 2 decimal places
Skips special values: infty, a, -a
Returns evaluated tuple/list string
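
The element-wise evaluation described above can be sketched with a pluggable evaluator. Assumptions are labeled in the code: eval_tuple_sketch and evaluate are hypothetical names, float() stands in for latex2sympy, and returning single-element or unparseable inputs unchanged is a guess at the fallback.

```python
def eval_tuple_sketch(s: str, evaluate=float) -> str:
    """Evaluate each element of '(a,b,...)' or '[a,b,...]' to 2 decimals."""
    if len(s) < 2 or (s[0], s[-1]) not in {("(", ")"), ("[", "]")}:
        return s  # not a tuple or list
    parts = s[1:-1].split(",")
    if len(parts) < 2:
        return s  # assumption: single elements are left alone
    out = []
    try:
        for part in parts:
            token = part.strip()
            if "infty" in token or token in ("a", "-a"):
                out.append(token)  # special values are skipped
            else:
                # float() replaces latex2sympy in this sketch.
                out.append(str(round(evaluate(token), 2)))
    except (ValueError, TypeError):
        return s  # assumption: unparseable input is returned unchanged
    return s[0] + ",".join(out) + s[-1]


print(eval_tuple_sketch("(3.14159, 2)"))
```

Because the elements are re-joined without spaces, "(1, 2)" and "(1,2)" map to the same string, which is what makes the later tuple comparison insensitive to spacing.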

Number Extraction

extract_nums(s)

Extracts all numeric values from string
Handles scientific notation: 1.5e10
Handles decimals: 3.14
Handles signed numbers: -42
Removes commas from numbers
Returns list of numeric values
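
One way to implement this kind of extraction is a single regular expression applied after commas are removed. The sketch is an assumption about the approach, not the module's code: extract_nums_sketch is a hypothetical name, and returning int for plain integers and float otherwise is a choice made here.

```python
import re

# Optional sign, digits, optional decimal part, optional exponent.
_NUM_RE = re.compile(r"-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?")


def extract_nums_sketch(s: str) -> list:
    """Return every number found in s, treating commas as digit separators."""
    s = s.replace(",", "")  # caveat: "1,2" is read as 12
    nums = []
    for match in _NUM_RE.findall(s):
        if any(c in match for c in ".eE"):
            nums.append(float(match))
        else:
            nums.append(int(match))
    return nums


print(extract_nums_sketch("Values: 1,234.56 and 789"))
```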

is_number(s)

Checks if string represents a valid number
Handles commas in numbers
Returns boolean
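
A check of this kind is commonly written as a guarded float() conversion. The sketch below assumes that approach; is_number_sketch is a hypothetical name, and note that float() also accepts strings such as "inf" and "nan".

```python
def is_number_sketch(s: str) -> bool:
    """True if s parses as a float once commas are removed."""
    try:
        float(s.replace(",", ""))
        return True
    except ValueError:
        return False


print(is_number_sketch("1,234.5"))
```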

Domain Filtering

in_area(id, area)

Checks if problem ID belongs to mathematical area
Handles two ID formats:

 * Path format: test/precalculus/244.json
 * CSV format: abstract_algebra_test.csv_1

Special case: area="all" always returns True
Returns boolean
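
Both ID formats can be covered by two substring tests. The sketch assumes substring matching, which is consistent with the usage examples on this page (where "algebra" matches abstract_algebra_test.csv_1); in_area_sketch and problem_id are hypothetical names.

```python
def in_area_sketch(problem_id: str, area: str) -> bool:
    """Check whether problem_id belongs to area under either ID format."""
    if area == "all":
        return True  # special case: every ID matches
    # Path format: .../<area>/...   CSV format: ...<area>_test.csv...
    return f"/{area}/" in problem_id or f"{area}_test.csv" in problem_id


print(in_area_sketch("test/precalculus/244.json", "precalculus"))
```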

File I/O

save_jsonl(path, data, t_stamp=True)

Saves list of dictionaries to JSONL file
Optional timestamp appending
UTF-8 encoding
Progress bar with tqdm
Format: one JSON object per line

load_jsonl(path)

Loads JSONL file into list of dictionaries
UTF-8 encoding
Skips empty lines
Returns list of dicts

timestamp()

Generates timestamp string
Format: -YYYYMMDD-HHMM
Example: -20260214-1430
Returns timestamp string
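
The three file helpers fit together as in the sketch below. It is an approximation under assumptions: the *_sketch names are hypothetical, the tqdm progress bar is omitted, the timestamp is inserted before the file extension (consistent with results-20260214-1430.jsonl in the usage examples), and the save function returns the final path for convenience even though the documented signature returns None.

```python
import json
import os
import time


def timestamp_sketch() -> str:
    """Return the current local time as -YYYYMMDD-HHMM."""
    return time.strftime("-%Y%m%d-%H%M", time.localtime())


def save_jsonl_sketch(path: str, data: list, t_stamp: bool = True) -> str:
    """Write one JSON object per line; return the path actually written."""
    if t_stamp:
        root, ext = os.path.splitext(path)
        path = f"{root}{timestamp_sketch()}{ext}"
    with open(path, "w", encoding="utf-8") as f:
        for row in data:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
    return path


def load_jsonl_sketch(path: str) -> list:
    """Read a JSONL file into a list of dicts, skipping empty lines."""
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Returning the written path avoids having to reconstruct the timestamped filename before loading the file again.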

Utility Functions

delete_extra_zero(n)

Removes unnecessary trailing zeros from numbers
Converts to int if result is whole number
Handles both int and float inputs
Returns string representation
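
The behavior described above can be reproduced with a float round-trip. This is a sketch rather than the module's code: delete_extra_zero_sketch is a hypothetical name, and passing non-numeric input through as a string is an assumption.

```python
def delete_extra_zero_sketch(n) -> str:
    """Return n as a string without trailing zeros (3.0 -> '3')."""
    try:
        value = float(n)
    except (TypeError, ValueError):
        return str(n)  # assumption: non-numeric input is passed through
    if value.is_integer():
        return str(int(value))  # whole number: drop the decimal part
    return str(value)  # Python's float repr carries no trailing zeros


print(delete_extra_zero_sketch(2.50))
```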

_remove_right_units(string)

Removes unit text from LaTeX strings
Splits on "\text{ " delimiter
Returns string before last unit occurrence
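
Splitting on the last "\text{ " can be done with rsplit. In this sketch, remove_right_units_sketch is a hypothetical name, and returning the input unchanged when the delimiter is absent is an assumption.

```python
def remove_right_units_sketch(string: str) -> str:
    """Drop a trailing '\\text{ ...' unit annotation, if present."""
    marker = "\\text{ "
    if marker not in string:
        return string
    # Keep everything before the last occurrence of the delimiter.
    return string.rsplit(marker, 1)[0]


print(remove_right_units_sketch("5\\text{ cm}"))
```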

Usage Examples

# Example 1: Check answer equivalence
is_equal("3.14159", "3.14")  # True (rounds to 2 decimals)
is_equal("\\frac{1}{2}", "0.5")  # True (symbolic equivalence)
is_equal("(1,2,3)", "(1, 2, 3)")  # True (tuple equivalence)

# Example 2: Extract answer from LaTeX
response = "The solution is \\boxed{42} as shown above."
answer = find_math_answer(response)
# Returns: "42"

response = "We get \\boxed{\\frac{3}{4}} from the calculation."
answer = find_math_answer(response)
# Returns: "\\frac{3}{4}"

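# Example 3: Evaluate tuple expressions
# Each element is evaluated and rounded to 2 decimals; whether a whole
# number prints as 6 or 6.0 depends on the type the evaluation produces.
eval_tuple("(2*3, 5+2)")  # Returns a string such as "(6,7)"
eval_tuple("[10/2, 3^2]")  # Returns a string such as "[5.0,9.0]"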
eval_tuple("(infty, -a)")  # Returns: "(infty,-a)" (preserves special values)

# Example 4: Check if ID in area
in_area("test/precalculus/244.json", "precalculus")  # True
in_area("abstract_algebra_test.csv_1", "algebra")  # True
in_area("geometry_test.csv_5", "precalculus")  # False
in_area("anything", "all")  # True

# Example 5: Extract numbers
extract_nums("The answer is 3.14 or maybe -2.5e10")
# Returns: [3.14, -2.5e10]

extract_nums("Values: 1,234.56 and 789")
# Returns: [1234.56, 789]

# Example 6: Save and load results
results = [
    {"problem_id": "1", "answer": "42", "correct": True},
    {"problem_id": "2", "answer": "3.14", "correct": False}
]
save_jsonl("results.jsonl", results, t_stamp=True)
# Saves to: results-20260214-1430.jsonl

loaded_results = load_jsonl("results-20260214-1430.jsonl")
# Returns: list of dicts

# Example 7: Complex equivalence checking
# Handles different fraction formats
is_equal("1/2", "\\frac{1}{2}")  # True

# Handles different tuple formats
is_equal("(1, 2, 3)", "(1,2,3)")  # True

# Handles mathematical expressions
is_equal("2+3", "5")  # True (evaluates expressions)

# Example 8: LaTeX cleanup
answer = find_math_answer("\\boxed{3^{\\circ}\\text{ units}}")
# Returns: "3" (removes degree symbol and units)

answer = find_math_answer("\\boxed{\\frac{1}{2}\\%}")
# Returns: "\\frac{1}{2}" (removes percentage)

LaTeX Handling Details

Supported LaTeX Commands

Fractions: \frac{a}{b}, \tfrac, \dfrac
Roots: \sqrt{x}
Brackets: \left, \right
Text: \text{}, \mbox{}
Angles: ^\circ
Infinity: \infty
Matrices: bmatrix, pmatrix

Removed Elements

Units: m^3, km, units
Percentages: \%, %
Degree symbols: ^\circ, ^{\circ}
Dollar signs: $, \$
Spacing: \!, \\
Line breaks: \n

Normalization Rules

0.5 → \frac{1}{2}
a/b → \frac{a}{b} (for integers)
.5 → 0.5
Numbers rounded to 2 decimals for comparison

Mathematical Equivalence

The is_equal function checks equivalence through multiple strategies:

 1. String equality - Direct comparison after lowercasing
 2. Tuple/list equality - Element-wise comparison after evaluation
 3. Symbolic equality - Using latex2sympy conversion
 4. Numerical equality - Comparing evaluated results (2 decimal precision)

Because several strategies are applied, answers written in different notations (for example a decimal and the equivalent fraction, or tuples with different spacing) can still compare as equal.
