Implementation:Sail sg LongSpec OCW Courses Eval

From Leeroopedia
Knowledge Sources
Domains: NLP, Evaluation, Mathematics, Symbolic_Computation
Last Updated: 2026-02-14 05:00 GMT

Overview

Concrete tool for evaluating mathematical answers on OCW Courses benchmarks using numeric normalization, symbolic equation parsing, and TeX expression equivalence via SymPy.

Description

The ocwcourses_eval_utils.py module provides evaluation utilities for the MIT OpenCourseWare (OCW) math benchmark. The function normalize_numeric strips units and converts the result to a number, falling back to LaTeX parsing. The function numeric_equality performs a threshold-based floating-point comparison, and normalize_symbolic_equation parses LaTeX equations into SymPy objects. The SymbolicMathMixin class provides normalize_tex (following the methodology of Lewkowycz et al. 2022), parse_tex, is_exp_equiv, and is_tex_equiv, which together implement symbolic equivalence checking with timeout protection.
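
As a rough illustration of the unit-stripping step, the sketch below removes one trailing unit and converts the remainder to a float. It is not the module's code: the function name strip_units_to_float, the unit list, and the sentinel string are assumptions made for this example, and the LaTeX parsing fallback is omitted.

```python
import re
from typing import Union

# Hypothetical sentinel; the module's actual INVALID_ANSWER value is not shown here.
INVALID_ANSWER = "[invalidanswer]"

# Hypothetical unit list chosen for this example only.
_TRAILING_UNIT = re.compile(r"\s*(?:eV|kg|cm|m|s)\s*$")


def strip_units_to_float(s: str) -> Union[float, str]:
    """Drop surrounding '$' and one trailing unit, then try float()."""
    text = s.strip().strip("$").strip()
    text = _TRAILING_UNIT.sub("", text)
    try:
        return float(text)
    except ValueError:
        # A fuller version would try a LaTeX parser here before giving up.
        return INVALID_ANSWER
```

Returning a sentinel string instead of raising keeps a batch evaluation running when a single answer cannot be parsed.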

Usage

Import these utilities when evaluating model outputs on OCW Courses or similar benchmarks that require symbolic math equivalence. They are used by eval_ocwcourses in the eval_script module.
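
One plausible way an evaluator could choose among these utilities is to inspect the reference answer first. The sketch below is an assumption for illustration, not the documented behavior of eval_ocwcourses; classify_answer is a hypothetical helper and the three labels are invented here.

```python
def classify_answer(reference: str) -> str:
    """Guess which comparison a reference answer calls for.

    Hypothetical rule: plain numbers are "numeric", strings containing "="
    are "equation", and everything else is "expression".
    """
    try:
        float(reference)
        return "numeric"
    except ValueError:
        return "equation" if "=" in reference else "expression"
```

Under this rule a numeric reference would be paired with normalize_numeric and numeric_equality, an equation with normalize_symbolic_equation, and any other expression with the SymbolicMathMixin methods.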

Code Reference

Source Location

Signature

def normalize_numeric(s: str) -> Union[float, str]:
    """Strip units and convert string to float, with LaTeX parse fallback."""

def numeric_equality(n1: float, n2: float, threshold: float = 0.01) -> bool:
    """Threshold-based numeric comparison using np.isclose."""

def normalize_symbolic_equation(s: str) -> Union[sympy.Equality, str]:
    """Parse LaTeX equation string into SymPy Equality object."""

class SymbolicMathMixin:
    """Methods for parsing math expressions and determining equivalence."""

    def normalize_tex(self, final_answer: str) -> str:
        """Normalize TeX expression following Lewkowycz et al. (2022) methodology."""

    def parse_tex(self, text: str, time_limit: int = 5) -> Optional[sympy.Basic]:
        """Parse normalized TeX string into SymPy expression with timeout."""

    def is_exp_equiv(self, x1: sympy.Basic, x2: sympy.Basic, time_limit: int = 5) -> bool:
        """Determine if two SymPy expressions are equal via simplify(x1-x2)==0."""

    def is_tex_equiv(self, x1: str, x2: str, time_limit: int = 5) -> bool:
        """Check TeX equivalence: string match first, then symbolic comparison."""
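
The kind of clean-up that normalize_tex performs can be sketched with a few regular expressions. The version below is a simplified subset written for illustration, in the spirit of the normalization described by Lewkowycz et al. (2022). The function name normalize_tex_sketch and the exact set of rules are assumptions, and the module's own substitution lists are not reproduced.

```python
import re


def normalize_tex_sketch(answer: str) -> str:
    """Apply a small, illustrative set of TeX normalization rules."""
    # Keep only the right-hand side of the last "=".
    answer = answer.split("=")[-1]
    # Unwrap \text{...}, \textbf{...} and \boxed{...}.
    answer = re.sub(r"\\text\{(.*?)\}", r"\1", answer)
    answer = re.sub(r"\\textbf\{(.*?)\}", r"\1", answer)
    answer = re.sub(r"\\boxed\{(.*)\}", r"\1", answer)
    # Expand shorthand: \frac12 -> \frac{1}{2}, \sqrt2 -> \sqrt{2}.
    answer = re.sub(r"frac([^{])(.)", r"frac{\1}{\2}", answer)
    answer = re.sub(r"sqrt([^{])", r"sqrt{\1}", answer)
    # Drop math-mode delimiters.
    answer = answer.replace("$", "")
    # Remove thousands separators from plain integers: 100,000 -> 100000.
    if answer.replace(",", "").isdigit():
        answer = answer.replace(",", "")
    return answer.strip()
```

Normalizing both the prediction and the reference this way lets many equivalent answers match as plain strings before any symbolic parsing is attempted.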

Import

from data.deepseek_math_utils.ocwcourses_eval_utils import (
    normalize_numeric, numeric_equality, SymbolicMathMixin,
    normalize_symbolic_equation
)

I/O Contract

Inputs

Name | Type | Required | Description
s / final_answer | str | Yes | Math expression string (LaTeX or numeric)
time_limit | int | No | Timeout in seconds for SymPy operations (default 5)
threshold | float | No | Numeric equality threshold (default 0.01)
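
To make the threshold input concrete, here is a minimal sketch of a tolerance-based comparison. It uses math.isclose from the standard library as a stand-in for np.isclose and assumes the threshold acts as a relative tolerance. How the module actually applies the threshold is not shown on this page, and approx_equal is a hypothetical name.

```python
import math
from typing import Optional


def approx_equal(n1: Optional[float], n2: Optional[float],
                 threshold: float = 0.01) -> bool:
    """Compare two numbers, treating `threshold` as a relative tolerance."""
    if n1 is None or n2 is None:
        return False
    # abs_tol keeps the comparison meaningful when both values are near zero.
    return math.isclose(n1, n2, rel_tol=threshold, abs_tol=1e-9)
```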

Outputs

Name | Type | Description
normalized | float, sympy.Basic, or str | Normalized expression (INVALID_ANSWER on failure)
is_equal | bool | Whether two expressions are equivalent
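
The time_limit input bounds how long a SymPy call may run before the result is treated as a failure. One common way to get this behavior on Unix is a signal-based timer, sketched below. The module's own timeout mechanism is not shown on this page, so time_limit_ctx, TimeLimitExceeded, and run_with_limit are hypothetical names, and the approach only works in the main thread on platforms that provide SIGALRM.

```python
import signal
import time
from contextlib import contextmanager


class TimeLimitExceeded(Exception):
    """Raised when the guarded block runs longer than allowed."""


@contextmanager
def time_limit_ctx(seconds: float):
    def _handler(signum, frame):
        raise TimeLimitExceeded()

    previous = signal.signal(signal.SIGALRM, _handler)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        # Cancel the timer and restore the earlier handler.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)


def run_with_limit(fn, seconds: float, default=None):
    """Call fn(); return `default` if it does not finish in time."""
    try:
        with time_limit_ctx(seconds):
            return fn()
    except TimeLimitExceeded:
        return default
```

For example, run_with_limit(lambda: time.sleep(2), 0.1, default=False) gives up after roughly a tenth of a second and returns False.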

Usage Examples

from data.deepseek_math_utils.ocwcourses_eval_utils import (
    normalize_numeric, numeric_equality, SymbolicMathMixin
)

# Numeric normalization (strips units)
val = normalize_numeric("3.14 eV")
# val = 3.14

# Numeric equality (returns a bool; default threshold is 0.01)
is_close = numeric_equality(3.14, 3.1401)

# Symbolic TeX equivalence
mixin = SymbolicMathMixin()
tex1 = mixin.normalize_tex("\\frac{1}{2}")
tex2 = mixin.normalize_tex("0.5")
# is_tex_equiv tries a string match first, then a symbolic comparison
equiv = mixin.is_tex_equiv(tex1, tex2)
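
The symbolic check described for is_exp_equiv, simplify(x1 - x2) == 0, can be tried directly on SymPy expressions. The sketch below is a standalone illustration that assumes SymPy is installed. It leaves out the timeout and TeX parsing, and is_exp_equiv_sketch is a hypothetical name rather than the mixin method.

```python
import sympy


def is_exp_equiv_sketch(x1: sympy.Expr, x2: sympy.Expr) -> bool:
    """Return True when simplify(x1 - x2) reduces to zero."""
    try:
        return sympy.simplify(x1 - x2) == 0
    except Exception:
        # Treat expressions SymPy cannot subtract or simplify as not equivalent.
        return False


x = sympy.Symbol("x")
expanded = x**2 + 2 * x + 1
factored = (x + 1) ** 2

same = is_exp_equiv_sketch(expanded, factored)    # different forms of one polynomial
different = is_exp_equiv_sketch(x + 1, x + 2)     # differ by a constant
```

Because simplification can be slow on complicated expressions, a check like this is usually combined with a time limit, as the time_limit parameters in the signatures suggest.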
