
Implementation:Sail sg LongSpec MathScale Answer Utils

From Leeroopedia
Knowledge Sources
Domains NLP, Evaluation, Mathematics
Last Updated 2026-02-14 05:00 GMT

Overview

Concrete tool for extracting and comparing mathematical answers using MathScale-style heuristics including boxed extraction, multi-strategy fallback, and LaTeX-normalized equivalence checking.

Description

The mathscale/util.py module provides answer extraction and equivalence checking utilities originally from Microsoft's MathScale benchmark. It includes strip_string for LaTeX normalization, unbox_and_extract for extracting the contents of \boxed{}, mathscale_is_equiv for comparing a predicted answer against a reference, is_correct for end-to-end answer checking with multiple extraction strategies, and several versioned answer extraction functions (v2, v3, v4) of increasing sophistication.
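To make the boxed-extraction step concrete, here is a minimal sketch of brace-balanced \boxed{} extraction. The function name extract_boxed_sketch is hypothetical and the logic is an assumption about how such a helper could work; it is not the code of unbox_and_extract, only an illustration that mirrors its documented return shape (unboxed text, list of extracted contents).

```python
from typing import List, Tuple


def extract_boxed_sketch(text: str) -> Tuple[str, List[str]]:
    """Hypothetical stand-in, NOT the module's unbox_and_extract."""
    marker = "\\boxed{"
    contents: List[str] = []
    out: List[str] = []
    i = 0
    while i < len(text):
        if text.startswith(marker, i):
            # Walk forward, tracking brace depth so nested braces
            # such as \frac{1}{2} stay inside the box.
            depth = 1
            j = i + len(marker)
            while j < len(text) and depth > 0:
                if text[j] == "{":
                    depth += 1
                elif text[j] == "}":
                    depth -= 1
                j += 1
            if depth == 0:
                inner = text[i + len(marker): j - 1]
                contents.append(inner)
                out.append(inner)  # keep the content, drop the wrapper
                i = j
                continue
        # Not a (balanced) box: copy the character through unchanged.
        out.append(text[i])
        i += 1
    return "".join(out), contents
```

An unbalanced box is left untouched in this sketch, so malformed model output is passed through rather than truncated.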

Usage

Import these functions when evaluating model outputs on MathScale or general math benchmarks where flexible answer extraction with fallback strategies is needed. They are used by MathScaleCallBack in the evaluation callback pipeline.
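As a rough illustration of how a checker with the (judge, prediction, reference) return shape could feed an accuracy metric, consider the sketch below. Both accuracy and toy_checker are hypothetical names invented for this example, and nothing here reflects how MathScaleCallBack is actually implemented; in practice is_correct would be passed in place of the toy checker.

```python
from typing import Callable, Iterable, Tuple

# Same return shape as is_correct: (judge, clean_prediction, clean_reference).
Checker = Callable[[str, str], Tuple[bool, str, str]]


def accuracy(pairs: Iterable[Tuple[str, str]], checker: Checker) -> float:
    """Fraction of (completion, answer) pairs the checker judges correct."""
    total = 0
    correct = 0
    for completion, answer in pairs:
        judge, _pred, _ref = checker(completion, answer)
        total += 1
        correct += int(judge)
    return correct / total if total else 0.0


def toy_checker(completion: str, answer: str) -> Tuple[bool, str, str]:
    """Hypothetical checker: compares the last whitespace-separated token."""
    pred = completion.strip().split()[-1].rstrip(".")
    ref = answer.strip()
    return pred == ref, pred, ref
```

Keeping the checker as a parameter makes it easy to swap extraction strategies without touching the scoring loop.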

Code Reference

Source Location

Signature

from typing import Callable, Tuple

def strip_string(string: str) -> str:
    """Normalize LaTeX string: fix fracs, sqrt, remove units, convert fraction notation."""

def unbox_and_extract(text: str) -> Tuple[str, list]:
    """Extract all \\boxed{} contents and return (unboxed_text, extracted_contents)."""

def mathscale_is_equiv(prediction_ans: str, reference_ans: str, verbose: bool = False) -> Tuple[bool, str, str]:
    """Compare prediction and reference using number, inline math, and substring heuristics."""

def is_correct(completion: str, answer: str, verbose: bool = False) -> Tuple[bool, str, str]:
    """Full answer extraction + equivalence check with multi-strategy fallback."""

def mathscale_extract_answer_v2(completion: str) -> str:
    """Extract answer using boxed, numbers, 'answer is', 'is' patterns with strip_string."""

def mathscale_extract_answer_fn_v3(completion_field: str = "response") -> Callable:
    """Return list-processing function using v2 extraction."""

def extract_pure_prompt_aligner() -> Callable:
    """Extract question text from '### Instruction:' / '### Response:' format."""

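The prompt-alignment helper is described as pulling the question text out of the '### Instruction:' / '### Response:' format. A minimal sketch of that idea follows; extract_instruction_sketch is a hypothetical name, and the fallback behavior when a tag is missing is an assumption rather than the module's documented behavior.

```python
def extract_instruction_sketch(prompt: str) -> str:
    """Hypothetical helper, NOT the module's extract_pure_prompt_aligner."""
    start_tag = "### Instruction:"
    end_tag = "### Response:"

    start = prompt.find(start_tag)
    if start == -1:
        # Assumption: with no instruction tag, treat the whole prompt as the question.
        return prompt.strip()
    start += len(start_tag)

    end = prompt.find(end_tag, start)
    if end == -1:
        # Assumption: with no response tag, take everything after the instruction tag.
        end = len(prompt)
    return prompt[start:end].strip()
```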
Import

from data.mathscale.util import (
    mathscale_is_equiv, mathscale_is_equiv_proxy, is_correct,
    mathscale_extract_answer_v2, strip_string
)

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| completion | str | Yes | Model completion text to extract the answer from |
| answer | str | For is_correct | Ground truth answer for comparison |
| prediction_ans | str | For mathscale_is_equiv | Predicted answer string |
| reference_ans | str | For mathscale_is_equiv | Reference answer string |

Outputs

| Name | Type | Description |
|------|------|-------------|
| judge | bool | Whether prediction matches reference |
| clean_prediction | str | Normalized prediction string |
| clean_reference | str | Normalized reference string |
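One way to picture a check that returns this triple is a comparison that first tries to read both sides as exact numbers and falls back to string equality. The sketch below is an assumption-laden illustration: to_number and is_equiv_sketch are hypothetical names, and the real mathscale_is_equiv applies its own number, inline math, and substring heuristics, which are not reproduced here.

```python
import re
from fractions import Fraction
from typing import Optional, Tuple


def to_number(s: str) -> Optional[Fraction]:
    """Parse an integer, decimal, a/b, or simple \\frac{a}{b}; None otherwise."""
    s = s.strip().replace(",", "")  # assumption: commas are thousands separators
    m = re.fullmatch(r"\\d?frac\{(-?\d+)\}\{(-?\d+)\}", s)
    if m:
        num, den = int(m.group(1)), int(m.group(2))
        return Fraction(num, den) if den != 0 else None
    try:
        return Fraction(s)
    except (ValueError, ZeroDivisionError):
        return None


def is_equiv_sketch(pred: str, ref: str) -> Tuple[bool, str, str]:
    """Hypothetical stand-in, NOT the module's mathscale_is_equiv."""
    p, r = pred.strip(), ref.strip()
    pn, rn = to_number(p), to_number(r)
    if pn is not None and rn is not None:
        # Exact rational comparison avoids floating-point tolerance issues.
        return pn == rn, p, r
    # Fallback: plain string equality on the stripped answers.
    return p == r, p, r
```

Using Fraction rather than float keeps comparisons such as 1/2 versus 0.5 exact.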

Usage Examples

from data.mathscale.util import (
    is_correct, mathscale_is_equiv, mathscale_extract_answer_v2
)

# Full extraction + comparison
judge, pred, ref = is_correct("The answer is \\boxed{42}", "42")
# judge = True

# Direct equivalence check
judge, pred, ref = mathscale_is_equiv("\\frac{1}{2}", "0.5")
# judge = True (both are numbers)

# Extraction from model output
answer = mathscale_extract_answer_v2("Therefore, the answer is 42.")
# answer = "42"
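The multi-strategy fallback idea behind the extraction functions can be sketched as an ordered chain of patterns, each tried only if the previous one found nothing. The function extract_answer_sketch is hypothetical, and both the specific regular expressions and the order of strategies are assumptions for illustration; the module's own functions may differ and also apply strip_string.

```python
import re
from typing import Optional


def extract_answer_sketch(completion: str) -> Optional[str]:
    """Hypothetical fallback chain, NOT the module's mathscale_extract_answer_v2."""
    # Strategy 1: the last \boxed{...} with no nested braces.
    boxed = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    if boxed:
        return boxed[-1].strip()

    # Strategy 2: whatever follows the phrase "answer is" on the same line.
    m = re.search(r"answer is[:\s]*([^\n]+)", completion, flags=re.IGNORECASE)
    if m:
        return m.group(1).strip().rstrip(".").strip()

    # Strategy 3: the last integer or decimal number in the text.
    nums = re.findall(r"-?\d+(?:\.\d+)?", completion)
    if nums:
        return nums[-1]

    # Nothing matched: signal that no answer could be extracted.
    return None
```

Ordering the strategies from most to least specific keeps an explicit boxed answer from being overridden by an incidental number earlier in the reasoning.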
