Implementation:Sail sg LongSpec LaTeX Normalization Utils
| Knowledge Sources | |
|---|---|
| Domains | NLP, Evaluation, Mathematics |
| Last Updated | 2026-02-14 05:00 GMT |
Overview
A concrete tool for normalizing LaTeX math strings and checking whether two of them are equivalent. It provides boxed-answer extraction and string-based math comparison, both adapted from MetaMath.
Description
The math_util.py module provides LaTeX string normalization utilities that originate from the MetaMath repository. Its key functions are:
- last_boxed_only_string: extracts the last \boxed{...} or \fbox{...} expression from a solution string.
- strip_string: normalizes a LaTeX string (fraction fixing, sqrt normalization, unit removal, whitespace cleanup).
- is_equiv: checks whether two strings are identical after normalization.
- _clean_numbers: formats large numbers with comma separators.
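The brace-matching idea behind boxed-answer extraction can be sketched as follows. This is a minimal illustration and not the module's code: find_last_boxed is a hypothetical name, and the sketch assumes the answer is delimited by balanced braces following the last \boxed (or, failing that, the last \fbox).

```python
from typing import Optional


def find_last_boxed(text: str) -> Optional[str]:
    """Return the last \\boxed{...} or \\fbox{...} expression, wrapper included."""
    start = text.rfind("\\boxed")
    if start < 0:
        start = text.rfind("\\fbox")
    if start < 0:
        return None  # No boxed answer present.

    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                # This brace balances the first one opened after the command.
                return text[start : i + 1]
    return None  # Braces never balanced.


print(find_last_boxed("So x = \\boxed{3}, hence \\boxed{\\frac{1}{2}}."))
# prints \boxed{\frac{1}{2}}
```

Counting brace depth, rather than searching for the first closing brace, is what lets nested arguments such as \frac{1}{2} survive intact.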
Usage
Import these utilities when you need to compare LaTeX math expressions by string normalization. They serve as the equivalence engine for MetaMath-style evaluation in OpenAIMATHCallBack, and math_gold_answer_extractor uses them to extract gold answers.
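To make the "equivalence engine" role concrete, here is a hedged sketch of a scoring loop that takes the comparison function as a parameter. The names accuracy and whitespace_insensitive are hypothetical and do not come from the repository; how OpenAIMATHCallBack actually wires things together is not shown in this page.

```python
from typing import Callable, Iterable, Optional, Tuple


def accuracy(
    pairs: Iterable[Tuple[Optional[str], str]],
    equiv: Callable[[str, str], bool],
) -> float:
    """Fraction of (prediction, gold) pairs accepted by `equiv`."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    # A missing prediction (None) counts as wrong without calling `equiv`.
    correct = sum(
        1 for pred, gold in pairs if pred is not None and equiv(pred, gold)
    )
    return correct / len(pairs)


# Stand-in comparison for demonstration only (an assumption);
# in practice one would pass is_equiv from math_util instead.
def whitespace_insensitive(a: str, b: str) -> bool:
    return a.replace(" ", "") == b.replace(" ", "")


samples = [
    ("\\frac{1}{2}", "\\frac{1}{2}"),  # match
    (None, "3"),                        # extraction failed
    ("x + 1", "x+1"),                   # match after removing spaces
    ("2", "3"),                         # mismatch
]
print(accuracy(samples, whitespace_insensitive))  # prints 0.5
```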
Code Reference
Source Location
- Repository: Sail_sg_LongSpec
- File: longspec/train/data/math_util.py
- Lines: 1-259
Signature
def last_boxed_only_string(string: str) -> Optional[str]:
"""Find the last \\boxed{} or \\fbox{} in string and return its full content."""
def strip_string(string: str) -> str:
"""Normalize LaTeX string: remove units, fix fracs/sqrt, standardize formatting."""
def is_equiv(str1: str, str2: str, verbose: bool = False) -> bool:
"""Check if two math strings are equivalent after normalization."""
def fix_fracs(string: str) -> str:
"""Convert \\frac1b -> \\frac{1}{b} shorthand notation."""
def fix_a_slash_b(string: str) -> str:
"""Convert a/b -> \\frac{a}{b} for simple integer fractions."""
def fix_sqrt(string: str) -> str:
"""Convert \\sqrt3 -> \\sqrt{3} shorthand notation."""
class NotEqual:
"""Sentinel object that is never equal to anything."""
Import
from data.math_util import is_equiv, last_boxed_only_string, strip_string
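The NotEqual sentinel listed among the signatures can be illustrated with a few lines. This is a sketch of the general pattern and is not copied from the module. The suggested use, as a placeholder for a missing answer so that any comparison fails, is an assumption about intent.

```python
class NotEqual:
    """Sentinel that compares unequal to every object, including itself."""

    def __eq__(self, other: object) -> bool:
        # Always report inequality, whatever the other operand is.
        return False


sentinel = NotEqual()
print(sentinel == sentinel)  # prints False
print(sentinel == "42")      # prints False
print("42" == sentinel)      # prints False (Python falls back to the reflected __eq__)
```

Because Python derives != from __eq__ by default, sentinel != anything evaluates to True.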
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| string / str1 / str2 | str | Yes | LaTeX math expression string |
| verbose | bool | No | Whether to print normalized strings for debugging |
Outputs
| Name | Type | Description |
|---|---|---|
| result | bool | Whether two strings are equivalent after normalization |
| boxed_content | str or None | Last \boxed{...} or \fbox{...} expression, wrapper included (None if not found) |
| normalized | str | Normalized LaTeX string |
Usage Examples
from data.math_util import is_equiv, last_boxed_only_string, strip_string
# Extract boxed answer
boxed = last_boxed_only_string("The answer is \\boxed{\\frac{1}{2}}")
# boxed = "\\boxed{\\frac{1}{2}}"
# Normalize LaTeX
normalized = strip_string("\\dfrac12")
# normalized = "\\frac{1}{2}"
# Equivalence check
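assert is_equiv("\\frac{1}{2}", "\\frac12")
# String-based only: numerically equal forms are not matched in general.
# "0.25" is used because MATH-style normalizers may special-case the literal "0.5";
# verify that behavior against this module before relying on it.
assert not is_equiv("0.25", "\\frac{1}{4}")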
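
The normalize-then-compare approach used in these examples can be approximated with a toy normalizer. The sketch below is an assumption-laden simplification, not the module's implementation: normalize_sketch and equiv_sketch are hypothetical names, and only a few of the rewrites attributed to strip_string are covered (dfrac/tfrac folding, whitespace removal, \frac and \sqrt shorthand, and a/b for plain integers).

```python
import re


def normalize_sketch(expr: str) -> str:
    """Toy normalizer covering a handful of LaTeX shorthand rewrites."""
    expr = expr.replace("\\dfrac", "\\frac").replace("\\tfrac", "\\frac")
    expr = expr.replace(" ", "")
    # \frac12 -> \frac{1}{2}: two unbraced single-character arguments.
    expr = re.sub(r"\\frac([0-9a-zA-Z])([0-9a-zA-Z])", r"\\frac{\1}{\2}", expr)
    # \sqrt3 -> \sqrt{3}: one unbraced single-character argument.
    expr = re.sub(r"\\sqrt([0-9a-zA-Z])", r"\\sqrt{\1}", expr)
    # a/b -> \frac{a}{b}, only when the whole string is integer/integer.
    match = re.fullmatch(r"(-?\d+)/(\d+)", expr)
    if match:
        expr = "\\frac{%s}{%s}" % match.groups()
    return expr


def equiv_sketch(a: str, b: str) -> bool:
    """Two strings are treated as equivalent if they normalize identically."""
    return normalize_sketch(a) == normalize_sketch(b)


print(normalize_sketch("\\dfrac12"))                # prints \frac{1}{2}
print(equiv_sketch("3/4", "\\frac{3}{4}"))          # prints True
print(equiv_sketch("0.25", "\\frac{1}{4}"))         # prints False
```

The last line shows the main limitation of any purely string-based check: values that are numerically equal but written differently are rejected unless a rewrite rule happens to map one form onto the other.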