Implementation:Sail sg LongSpec OCW Courses Eval
| Knowledge Sources | |
|---|---|
| Domains | NLP, Evaluation, Mathematics, Symbolic_Computation |
| Last Updated | 2026-02-14 05:00 GMT |
Overview
Concrete tool for evaluating mathematical answers on OCW Courses benchmarks using numeric normalization, symbolic equation parsing, and TeX expression equivalence via SymPy.
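The numeric-normalization step mentioned above can be illustrated with a small sketch: strip a few unit strings, then convert what remains to a float. The unit list, the INVALID_ANSWER value, and the name normalize_numeric_sketch are hypothetical. The LaTeX fallback is replaced by a tiny regex that only understands integer \frac{a}{b}, whereas the module performs a full LaTeX parse.

```python
import re
from fractions import Fraction
from typing import Union

INVALID_ANSWER = "[invalidanswer]"  # hypothetical sentinel value

# Hypothetical, abbreviated unit list ("years" precedes "year" so no stray "s" is left).
UNITS = ["eV", "kg m/s", "m/s", "g/mol", "cm", "years", "year"]


def normalize_numeric_sketch(s: str) -> Union[float, str]:
    """Strip known units, then read the remainder as a number."""
    for unit in UNITS:
        s = s.replace(unit, "")
    s = s.strip().strip("$").strip()
    try:
        return float(s)
    except ValueError:
        pass
    # Stand-in for the LaTeX fallback: only integer \frac{a}{b} is understood.
    m = re.fullmatch(r"\\frac\{(-?\d+)\}\{(-?\d+)\}", s)
    if m and int(m.group(2)) != 0:
        return float(Fraction(int(m.group(1)), int(m.group(2))))
    return INVALID_ANSWER
```

Naive substring removal is fragile (a unit string can occur inside another token), which is one reason the module keeps a LaTeX-parsing fallback instead of relying on string cleanup alone.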
Description
The ocwcourses_eval_utils.py module provides specialized evaluation utilities for the MIT OpenCourseWare (OCW) math benchmark:
- normalize_numeric strips units and converts the remaining string to a number, falling back to LaTeX parsing.
- numeric_equality compares two floats against a threshold.
- normalize_symbolic_equation parses a LaTeX equation into a SymPy object.
- The SymbolicMathMixin class provides normalize_tex (following the Lewkowycz et al. 2022 methodology), parse_tex, is_exp_equiv, and is_tex_equiv for full symbolic equivalence checking with timeout protection.
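Lewkowycz et al. (2022) normalize a final answer with string substitutions and regex rewrites before comparing it. The following is an abbreviated sketch of that style of normalization. The SUBSTITUTIONS list and the name normalize_tex_sketch are hypothetical, and the module's own rule set is longer and may differ in detail.

```python
import re

# Hypothetical, illustrative subset of substitutions; not the module's full list.
SUBSTITUTIONS = [("\\$", ""), ("\\ ", ""), (" ", ""), ("mbox", "text")]


def normalize_tex_sketch(final_answer: str) -> str:
    """Abbreviated Minerva-style normalization of a final answer string."""
    # Keep only what follows the last "=" (e.g. "x = 5" -> " 5").
    final_answer = final_answer.split("=")[-1]
    for before, after in SUBSTITUTIONS:
        final_answer = final_answer.replace(before, after)
    # Unwrap \text{...} and \boxed{...}.
    final_answer = re.sub(r"\\text\{(.*?)\}", r"\1", final_answer)
    final_answer = re.sub(r"\\boxed\{(.*)\}", r"\1", final_answer)
    # Expand shorthand: \frac12 -> \frac{1}{2}, \sqrt2 -> \sqrt{2}.
    final_answer = re.sub(r"\\frac([^{])(.)", r"\\frac{\1}{\2}", final_answer)
    final_answer = re.sub(r"\\sqrt([^{])", r"\\sqrt{\1}", final_answer)
    final_answer = final_answer.replace("$", "")
    # Drop thousands separators in plain integers: "100,000" -> "100000".
    if final_answer.replace(",", "").isdigit():
        final_answer = final_answer.replace(",", "")
    return final_answer
```

Normalizing both the prediction and the reference this way lets many equivalent answers match on the cheap string comparison before any SymPy work is needed.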
Usage
Import these utilities when evaluating model outputs on OCW Courses or similar benchmarks requiring symbolic math equivalence. Used by eval_ocwcourses in the eval_script module.
Code Reference
Source Location
- Repository: Sail_sg_LongSpec
- File: longspec/train/data/deepseek_math_utils/ocwcourses_eval_utils.py
- Lines: 1-269
Signature
```python
from typing import Optional, Union

import sympy

def normalize_numeric(s: str) -> Union[float, str]:
    """Strip units and convert string to float, with LaTeX parse fallback."""

def numeric_equality(n1: float, n2: float, threshold: float = 0.01) -> bool:
    """Threshold-based numeric comparison using np.isclose."""

def normalize_symbolic_equation(s: str) -> Union[sympy.Equality, str]:
    """Parse LaTeX equation string into SymPy Equality object."""

class SymbolicMathMixin:
    """Methods for parsing math expressions and determining equivalence."""

    def normalize_tex(self, final_answer: str) -> str:
        """Normalize TeX expression following Lewkowycz et al. (2022) methodology."""

    def parse_tex(self, text: str, time_limit: int = 5) -> Optional[sympy.Basic]:
        """Parse normalized TeX string into SymPy expression with timeout."""

    def is_exp_equiv(self, x1: sympy.Basic, x2: sympy.Basic, time_limit: int = 5) -> bool:
        """Determine if two SymPy expressions are equal via simplify(x1 - x2) == 0."""

    def is_tex_equiv(self, x1: str, x2: str, time_limit: int = 5) -> bool:
        """Check TeX equivalence: string match first, then symbolic comparison."""
```
Import
```python
from data.deepseek_math_utils.ocwcourses_eval_utils import (
    normalize_numeric, numeric_equality, SymbolicMathMixin,
    normalize_symbolic_equation,
)
```
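The control flow documented for is_tex_equiv (a cheap exact string match first, then a symbolic comparison of parsed expressions) can be illustrated without SymPy. In this sketch, parse_tex_sketch is a hypothetical stand-in that understands only plain numbers and integer \frac{a}{b}, and exact Fraction arithmetic takes the place of the module's simplify(x1 - x2) == 0 check. Treating an unparseable side as "not equivalent" is also an assumption.

```python
import re
from fractions import Fraction
from typing import Optional


def parse_tex_sketch(text: str) -> Optional[Fraction]:
    """Hypothetical stand-in for parse_tex: understands only numbers and \\frac{a}{b}."""
    m = re.fullmatch(r"\\frac\{(-?\d+)\}\{(-?\d+)\}", text)
    if m:
        den = int(m.group(2))
        return Fraction(int(m.group(1)), den) if den != 0 else None
    try:
        return Fraction(text)  # accepts "3", "0.5", "-1/4"
    except (ValueError, ZeroDivisionError):
        return None


def is_exp_equiv_sketch(x1: Fraction, x2: Fraction) -> bool:
    # Mirrors the simplify(x1 - x2) == 0 idea using exact rational arithmetic.
    return x1 - x2 == 0


def is_tex_equiv_sketch(x1: str, x2: str) -> bool:
    if x1 == x2:  # cheap exact string match first
        return True
    p1, p2 = parse_tex_sketch(x1), parse_tex_sketch(x2)
    if p1 is None or p2 is None:  # unparseable side: treated as not equivalent
        return False
    return is_exp_equiv_sketch(p1, p2)
```

Because the stand-in parser rejects anything symbolic, x+1 and 1+x compare as unequal here, whereas a SymPy-backed parse_tex would be expected to recognize them as equal.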
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
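| s / final_answer / text | str | Yes | Math expression string (LaTeX or numeric) |
| n1, n2 | float | Yes (numeric_equality) | Numbers to compare |
| x1, x2 | str or sympy.Basic | Yes (is_tex_equiv, is_exp_equiv) | TeX strings for is_tex_equiv; SymPy expressions for is_exp_equiv |
| time_limit | int | No | Timeout in seconds for SymPy operations (default 5) |
| threshold | float | No | Numeric equality threshold (default 0.01) |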
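The time_limit parameter exists because SymPy parsing and simplification can run for a very long time on some inputs. One common way to enforce such a limit in Python is a SIGALRM-based context manager, sketched below. Whether the module uses this exact mechanism is an assumption, and time_limit_sketch and guarded are hypothetical names. This approach works only on Unix and only in the main thread.

```python
import signal
from contextlib import contextmanager


@contextmanager
def time_limit_sketch(seconds: int):
    """Raise TimeoutError if the body runs longer than `seconds` (Unix, main thread only)."""

    def _handler(signum, frame):
        raise TimeoutError(f"timed out after {seconds}s")

    previous = signal.signal(signal.SIGALRM, _handler)
    signal.alarm(seconds)  # deliver SIGALRM after `seconds` whole seconds
    try:
        yield
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the previous handler


def guarded(fn, *args, seconds: int = 5, default=None):
    """Call fn(*args) under the time limit; return `default` if it times out."""
    try:
        with time_limit_sketch(seconds):
            return fn(*args)
    except TimeoutError:
        return default
```

For example, guarded(sympy.simplify, expr, seconds=5) would give up on a runaway simplification and return None. Since SymPy is pure Python, the signal handler can interrupt it between bytecode instructions.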
Outputs
| Name | Type | Description |
|---|---|---|
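| normalized | float, sympy.Basic, sympy.Equality, or str | Normalized value; normalize_numeric and normalize_symbolic_equation return INVALID_ANSWER on failure, and parse_tex may return None (its return type is Optional) |
| is_equal | bool | Whether two values or expressions are equivalent (numeric_equality, is_exp_equiv, is_tex_equiv) |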
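normalize_symbolic_equation turns a TeX equation into a SymPy Equality. Its first step, separating the two sides and returning a sentinel for malformed input instead of raising, can be sketched with plain strings. The name split_equation_sketch, the sentinel value, and the rule that exactly one "=" is required are assumptions. In the real module each side would then be parsed into SymPy and combined, for example with sympy.Eq(lhs, rhs).

```python
from typing import Tuple, Union

INVALID_ANSWER = "[invalidanswer]"  # hypothetical sentinel value


def split_equation_sketch(s: str) -> Union[Tuple[str, str], str]:
    """Split 'lhs = rhs' into its two sides, or return the sentinel."""
    s = s.strip().strip("$").strip()
    parts = s.split("=")
    if len(parts) != 2:  # no "=" or more than one "=": reject
        return INVALID_ANSWER
    lhs, rhs = (p.strip() for p in parts)
    if not lhs or not rhs:  # one side is empty: reject
        return INVALID_ANSWER
    return lhs, rhs
```

Callers should compare the result against the sentinel before using it, since a failed normalization is signaled by a return value rather than an exception.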
Usage Examples
```python
from data.deepseek_math_utils.ocwcourses_eval_utils import (
    normalize_numeric, numeric_equality, SymbolicMathMixin,
)

# Numeric normalization (strips units such as "eV")
val = normalize_numeric("3.14 eV")
# val == 3.14

# Numeric equality with the default threshold (0.01)
same = numeric_equality(3.14, 3.1401)

# Symbolic TeX equivalence
mixin = SymbolicMathMixin()
tex1 = mixin.normalize_tex("\\frac{1}{2}")
tex2 = mixin.normalize_tex("0.5")
# Exact string match first, then SymPy comparison
equivalent = mixin.is_tex_equiv(tex1, tex2)
```
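To make the threshold semantics concrete, here is a sketch that treats threshold as a relative tolerance. It uses math.isclose from the standard library in place of np.isclose, and reading threshold as a relative bound is an assumption; the module's exact rule (for example, how it handles values near zero) may differ. The name numeric_equality_sketch is hypothetical.

```python
import math
from typing import Optional


def numeric_equality_sketch(
    n1: Optional[float], n2: Optional[float], threshold: float = 0.01
) -> bool:
    """True if n1 and n2 differ by at most `threshold` relative to the larger magnitude."""
    if n1 is None or n2 is None:  # a failed normalization never matches
        return False
    # math.isclose checks |n1 - n2| <= rel_tol * max(|n1|, |n2|) (abs_tol defaults to 0).
    return math.isclose(n1, n2, rel_tol=threshold)
```

With a purely relative tolerance, a reference answer of 0 only matches an exact 0, which is one reason a real implementation may special-case values near zero.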