Implementation:Sail sg LongSpec MathScale Answer Utils
| Knowledge Sources | |
|---|---|
| Domains | NLP, Evaluation, Mathematics |
| Last Updated | 2026-02-14 05:00 GMT |
Overview
A concrete tool for extracting and comparing mathematical answers using MathScale-style heuristics, including boxed-answer extraction, multi-strategy fallback extraction, and LaTeX-normalized equivalence checking.
Description
The mathscale/util.py module provides answer-extraction and equivalence-checking utilities that originate from Microsoft's MathScale benchmark. It includes strip_string for LaTeX normalization, unbox_and_extract for extracting \\boxed{} contents, mathscale_is_equiv for answer-equivalence comparison, is_correct for end-to-end answer checking with multiple extraction strategies, and several versioned answer-extraction helpers (v2, v3, v4).
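A plain regex tends to break on \\boxed{} extraction because answers such as \\frac{1}{2} contain nested braces, so the content has to be found by brace counting. The following is a minimal sketch of that idea, not the module's code: unbox_and_extract_sketch is a hypothetical name, and replacing each box with its content in the returned text is an assumption, since the page does not document what unbox_and_extract leaves in the unboxed text.

```python
from typing import List, Tuple


def unbox_and_extract_sketch(text: str) -> Tuple[str, List[str]]:
    """Hypothetical sketch: collect every \\boxed{...} body (nested braces allowed)
    and return (text with each box replaced by its body, list of bodies)."""
    marker = "\\boxed{"
    contents: List[str] = []
    pieces: List[str] = []
    pos = 0
    while True:
        start = text.find(marker, pos)
        if start == -1:
            break
        i = start + len(marker)
        depth = 1
        # Walk forward until the brace opened by \boxed{ is closed again.
        while i < len(text) and depth > 0:
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
            i += 1
        if depth != 0:
            # Unbalanced braces: stop and keep the remainder untouched.
            break
        inner = text[start + len(marker): i - 1]
        contents.append(inner)
        pieces.append(text[pos:start])  # text before the box
        pieces.append(inner)            # assumption: the box is replaced by its body
        pos = i
    pieces.append(text[pos:])
    return "".join(pieces), contents


unboxed, boxes = unbox_and_extract_sketch("So \\boxed{\\frac{1}{2}} and \\boxed{3}.")
print(boxes)  # ['\\frac{1}{2}', '3']
```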
Usage
Import these functions when evaluating model outputs on MathScale or other math benchmarks that need flexible answer extraction with fallback strategies. They are used by MathScaleCallBack in the evaluation callback pipeline.
Code Reference
Source Location
- Repository: Sail_sg_LongSpec
- File: longspec/train/data/mathscale/util.py
- Lines: 1-547
Signature
from typing import Callable, Tuple
def strip_string(string: str) -> str:
    """Normalize a LaTeX answer string: fix fracs and sqrt, remove units, convert fraction notation."""
def unbox_and_extract(text: str) -> Tuple[str, list]:
    """Extract all \\boxed{} contents and return (unboxed_text, extracted_contents)."""
def mathscale_is_equiv(prediction_ans: str, reference_ans: str, verbose: bool = False) -> Tuple[bool, str, str]:
    """Compare prediction and reference using number, inline-math, and substring heuristics."""
def is_correct(completion: str, answer: str, verbose: bool = False) -> Tuple[bool, str, str]:
    """Full answer extraction plus equivalence check with multi-strategy fallback."""
def mathscale_extract_answer_v2(completion: str) -> str:
    """Extract the answer using boxed, number, 'answer is', and 'is' patterns, then apply strip_string."""
def mathscale_extract_answer_fn_v3(completion_field: str = "response") -> Callable:
    """Return a list-processing function that applies v2 extraction."""
def extract_pure_prompt_aligner() -> Callable:
    """Return a function that extracts the question text from the '### Instruction:' / '### Response:' format."""
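The strip_string docstring lists the kinds of rules it applies (fracs, sqrt, unit removal, fraction notation). A simplified, hypothetical subset of such rules might look like the sketch below; strip_string_sketch is an illustrative name, unit removal is omitted, and the exact rules and their order inside strip_string may differ.

```python
import re


def strip_string_sketch(s: str) -> str:
    """Hypothetical, simplified LaTeX answer normalizer (not the module's strip_string)."""
    s = s.replace("\n", "").replace("\\!", "")
    # \dfrac and \tfrac render the same value as \frac.
    s = s.replace("\\dfrac", "\\frac").replace("\\tfrac", "\\frac")
    # Sizing commands, degree marks, and dollar signs do not change the value.
    s = s.replace("\\left", "").replace("\\right", "")
    s = s.replace("^{\\circ}", "").replace("^\\circ", "")
    s = s.replace("\\$", "").replace("$", "")
    s = s.replace(" ", "")
    # Brace single-character operands: \sqrt3 -> \sqrt{3}, \frac12 -> \frac{1}{2}.
    s = re.sub(r"\\sqrt(\w)", r"\\sqrt{\1}", s)
    s = re.sub(r"\\frac(\d)(\d)", r"\\frac{\1}{\2}", s)
    # A bare leading decimal point gets a zero: ".5" -> "0.5".
    if s.startswith("."):
        s = "0" + s
    # Special-case 0.5 so it compares equal to \frac{1}{2}.
    if s == "0.5":
        s = "\\frac{1}{2}"
    # Plain integer ratio written with a slash: "3/4" -> "\frac{3}{4}".
    m = re.fullmatch(r"(-?\d+)/(\d+)", s)
    if m:
        s = f"\\frac{{{m.group(1)}}}{{{m.group(2)}}}"
    return s
```

Normalizing both sides to one canonical LaTeX form lets a plain string comparison succeed for surface variants such as "3/4" versus "\\frac{3}{4}".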
Import
from data.mathscale.util import (
mathscale_is_equiv, mathscale_is_equiv_proxy, is_correct,
mathscale_extract_answer_v2, strip_string
)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
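| completion | str | For is_correct and the extraction functions | Model completion text to extract an answer from |
| answer | str | For is_correct | Ground-truth answer for comparison |
| prediction_ans | str | For mathscale_is_equiv | Predicted answer string |
| reference_ans | str | For mathscale_is_equiv | Reference answer string |
| verbose | bool | No (default False) | Verbosity flag accepted by is_correct and mathscale_is_equiv |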
Outputs
| Name | Type | Description |
|---|---|---|
| judge | bool | Whether prediction matches reference |
| clean_prediction | str | Normalized prediction string |
| clean_reference | str | Normalized reference string |
Usage Examples
from data.mathscale.util import is_correct, mathscale_extract_answer_v2, mathscale_is_equiv
# Full extraction + comparison
judge, pred, ref = is_correct("The answer is \\boxed{42}", "42")
# judge = True
# Direct equivalence check
judge, pred, ref = mathscale_is_equiv("\\frac{1}{2}", "0.5")
# judge = True (\frac{1}{2} and 0.5 denote the same value)
# Extraction from model output
answer = mathscale_extract_answer_v2("Therefore, the answer is 42.")
# answer = "42"
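Two helpers in the signature list, mathscale_extract_answer_fn_v3 and extract_pure_prompt_aligner, are factories that return callables rather than results. A hypothetical sketch of that pattern follows; make_list_extractor, make_prompt_aligner, and last_word are illustrative names, and the record layout (a list of dicts keyed by completion_field) is an assumption, since the page does not document the input format of the returned functions.

```python
from typing import Callable, Dict, List


def make_list_extractor(
    extract: Callable[[str], str], completion_field: str = "response"
) -> Callable[[List[Dict[str, str]]], List[str]]:
    """Hypothetical v3-style factory: apply a single-string extractor to the
    completion_field of every record in a batch."""

    def process(records: List[Dict[str, str]]) -> List[str]:
        return [extract(record[completion_field]) for record in records]

    return process


def make_prompt_aligner() -> Callable[[str], str]:
    """Hypothetical aligner factory: the returned function keeps only the text
    between '### Instruction:' and '### Response:'."""

    def align(prompt: str) -> str:
        _, sep, rest = prompt.partition("### Instruction:")
        if not sep:
            # Assumption: prompts without the template are returned stripped.
            return prompt.strip()
        question, _, _ = rest.partition("### Response:")
        return question.strip()

    return align


def last_word(text: str) -> str:
    """Toy extractor used only for the demo below."""
    words = text.split()
    return words[-1] if words else ""


batch = [{"response": "so it is 7"}, {"response": "answer 12"}]
print(make_list_extractor(last_word)(batch))  # ['7', '12']
print(make_prompt_aligner()("### Instruction:\nWhat is 2+3?\n\n### Response:\n"))  # What is 2+3?
```

Returning a closure lets a caller fix the configuration (here, the field name) once and then pass a plain one-argument function through a data pipeline.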