Implementation:Sail sg LongSpec MathScale Answer Utils

Knowledge Sources	Sail_sg_LongSpec MathScale
Domains	NLP, Evaluation, Mathematics
Last Updated	2026-02-14 05:00 GMT

Overview

A concrete tool for extracting and comparing mathematical answers using MathScale-style heuristics, including boxed extraction, multi-strategy fallback, and LaTeX-normalized equivalence checking.

Description

The mathscale/util.py module provides answer extraction and equivalence checking utilities that originate from Microsoft's MathScale benchmark. It includes strip_string for LaTeX normalization, unbox_and_extract for extracting the contents of \boxed{}, mathscale_is_equiv for comparing a predicted answer against a reference, is_correct for end-to-end answer checking with multiple extraction strategies, and several versioned answer extraction functions (v2, v3, v4) of increasing sophistication.
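
The boxed-extraction step can be illustrated with a small brace-matching routine. This is a minimal sketch, not the module's implementation: the name extract_boxed_sketch is hypothetical, and the choice to replace each \boxed{X} with X in the returned text is an assumption based only on the (unboxed_text, extracted_contents) return shape documented for unbox_and_extract.

```python
from typing import List, Tuple


def extract_boxed_sketch(text: str) -> Tuple[str, List[str]]:
    """Hypothetical sketch: return (text with \\boxed{} wrappers removed, contents)."""
    marker = "\\boxed{"
    out_parts: List[str] = []
    contents: List[str] = []
    i = 0
    while i < len(text):
        start = text.find(marker, i)
        if start == -1:
            # No further boxes: keep the remaining text as-is.
            out_parts.append(text[i:])
            break
        out_parts.append(text[i:start])
        j = start + len(marker)
        depth, k = 1, j
        # Walk forward until the brace opened by \boxed{ is closed.
        while k < len(text) and depth > 0:
            if text[k] == "{":
                depth += 1
            elif text[k] == "}":
                depth -= 1
            k += 1
        if depth != 0:
            # Unbalanced braces: keep the rest verbatim and stop.
            out_parts.append(text[start:])
            break
        inner = text[j:k - 1]
        contents.append(inner)
        out_parts.append(inner)  # assumption: the wrapper is dropped, the content kept
        i = k
    return "".join(out_parts), contents


unboxed, found = extract_boxed_sketch("The answer is \\boxed{\\frac{1}{2}}.")
```

Counting brace depth rather than using a single regular expression lets the sketch handle nested braces such as \boxed{\frac{1}{2}}.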

Usage

Import these functions when evaluating model outputs on MathScale or general math benchmarks where flexible answer extraction with fallback strategies is needed. They are used by MathScaleCallBack in the evaluation callback pipeline.
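
How a checker with the (judge, prediction, reference) return shape might be applied across a batch of records can be sketched as follows. Everything here is illustrative: score_records, exact_match_checker, and the "answer" field name are hypothetical, and the stand-in checker only mimics the call shape of is_correct so that the example runs on its own. It does not reproduce the behavior of MathScaleCallBack.

```python
from typing import Callable, Dict, List, Tuple

# Same call shape as is_correct(completion, answer) -> (judge, pred, ref).
Checker = Callable[[str, str], Tuple[bool, str, str]]


def score_records(
    records: List[Dict[str, str]],
    checker: Checker,
    completion_field: str = "response",
    answer_field: str = "answer",  # hypothetical field name
) -> Tuple[float, List[Dict[str, object]]]:
    """Run the checker on every record and return (accuracy, annotated records)."""
    results: List[Dict[str, object]] = []
    for rec in records:
        judge, pred, ref = checker(rec[completion_field], rec[answer_field])
        results.append({**rec, "judge": judge, "pred": pred, "ref": ref})
    correct = sum(1 for r in results if r["judge"])
    accuracy = correct / len(results) if results else 0.0
    return accuracy, results


def exact_match_checker(completion: str, answer: str) -> Tuple[bool, str, str]:
    """Stand-in for is_correct: compares the last whitespace-separated token."""
    tokens = completion.strip().split()
    pred = tokens[-1].rstrip(".") if tokens else ""
    ref = answer.strip()
    return pred == ref, pred, ref


records = [
    {"response": "The answer is 42.", "answer": "42"},
    {"response": "It equals 7", "answer": "8"},
]
accuracy, annotated = score_records(records, exact_match_checker)
```

Passing the checker as a parameter means the real is_correct could be substituted for the stand-in without changing the loop.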

Code Reference

Source Location

Repository: Sail_sg_LongSpec
File: longspec/train/data/mathscale/util.py
Lines: 1-547

Signature

from typing import Callable, Tuple

def strip_string(string: str) -> str:
    """Normalize LaTeX string: fix fracs, sqrt, remove units, convert fraction notation."""

def unbox_and_extract(text: str) -> Tuple[str, list]:
    """Extract all \\boxed{} contents and return (unboxed_text, extracted_contents)."""

def mathscale_is_equiv(prediction_ans: str, reference_ans: str, verbose: bool = False) -> Tuple[bool, str, str]:
    """Compare prediction and reference using number, inline math, and substring heuristics."""

def is_correct(completion: str, answer: str, verbose: bool = False) -> Tuple[bool, str, str]:
    """Full answer extraction + equivalence check with multi-strategy fallback."""

def mathscale_extract_answer_v2(completion: str) -> str:
    """Extract answer using boxed, numbers, 'answer is', 'is' patterns with strip_string."""

def mathscale_extract_answer_fn_v3(completion_field: str = "response") -> Callable:
    """Return list-processing function using v2 extraction."""

def extract_pure_prompt_aligner() -> Callable:
    """Extract question text from '### Instruction:' / '### Response:' format."""
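
The multi-strategy fallback named in the extraction docstrings can be pictured as an ordered list of extractors tried in turn. The sketch below is an assumption-laden illustration, not the logic of mathscale_extract_answer_v2: the helper names, the regular expressions, and the order (boxed, then "answer is", then last number) are all hypothetical.

```python
import re
from typing import Callable, List, Optional


def _last_boxed(text: str) -> Optional[str]:
    # Simplification: only matches \boxed{...} without nested braces.
    found = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return found[-1].strip() if found else None


def _after_answer_is(text: str) -> Optional[str]:
    # Takes the rest of the line following "answer is", minus a trailing period.
    m = re.search(r"answer is\s*:?\s*([^\n]+)", text, flags=re.IGNORECASE)
    if not m:
        return None
    return m.group(1).strip().rstrip(".").strip()


def _last_number(text: str) -> Optional[str]:
    nums = re.findall(r"-?\d+(?:\.\d+)?", text)
    return nums[-1] if nums else None


# Assumed priority order; the real module may order or combine these differently.
STRATEGIES: List[Callable[[str], Optional[str]]] = [
    _last_boxed,
    _after_answer_is,
    _last_number,
]


def extract_answer_sketch(completion: str) -> str:
    """Return the first non-empty result among the strategies, else ''."""
    for strategy in STRATEGIES:
        result = strategy(completion)
        if result:
            return result
    return ""
```

Each strategy returns None when it does not apply, so the loop falls through to the next, less specific pattern.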

Import

from data.mathscale.util import (
    mathscale_is_equiv, mathscale_is_equiv_proxy, is_correct,
    mathscale_extract_answer_v2, strip_string
)

I/O Contract

Inputs

Name	Type	Required	Description
completion	str	Yes	Model completion text to extract answer from
answer	str	For is_correct	Ground truth answer for comparison
prediction_ans	str	For mathscale_is_equiv	Predicted answer string
reference_ans	str	For mathscale_is_equiv	Reference answer string

Outputs

Name	Type	Description
judge	bool	Whether prediction matches reference
clean_prediction	str	Normalized prediction string
clean_reference	str	Normalized reference string
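
The (judge, clean_prediction, clean_reference) triple can be illustrated with a small numeric comparison. This is a sketch under stated assumptions rather than the logic of mathscale_is_equiv: to_fraction_sketch and numeric_equiv_sketch are hypothetical names, only simple \frac{a}{b} and plain numeric strings are parsed, and equality is exact, whereas the real function may apply other heuristics or tolerances.

```python
import re
from fractions import Fraction
from typing import Optional, Tuple

# Matches \frac{a}{b} or \dfrac{a}{b} with integer numerator and denominator.
_FRAC = re.compile(r"^\\d?frac\{(-?\d+)\}\{(-?\d+)\}$")


def to_fraction_sketch(s: str) -> Optional[Fraction]:
    """Parse a simple LaTeX fraction or numeric string; return None on failure."""
    s = s.strip().replace(" ", "").replace(",", "")
    m = _FRAC.match(s)
    if m:
        den = int(m.group(2))
        return Fraction(int(m.group(1)), den) if den != 0 else None
    try:
        return Fraction(s)
    except (ValueError, ZeroDivisionError):
        return None


def numeric_equiv_sketch(pred: str, ref: str) -> Tuple[bool, str, str]:
    """Return (judge, clean_prediction, clean_reference)."""
    p, r = to_fraction_sketch(pred), to_fraction_sketch(ref)
    if p is not None and r is not None:
        # Both sides parsed as numbers: compare exactly as rationals.
        return p == r, str(p), str(r)
    # Otherwise fall back to a whitespace-trimmed string comparison.
    clean_p, clean_r = pred.strip(), ref.strip()
    return clean_p == clean_r, clean_p, clean_r


result = numeric_equiv_sketch("\\frac{1}{2}", "0.5")
```

Converting both sides to Fraction avoids floating-point rounding when a fraction is compared with its decimal form.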

Usage Examples

from data.mathscale.util import is_correct, mathscale_is_equiv

# Full extraction + comparison
judge, pred, ref = is_correct("The answer is \\boxed{42}", "42")
# judge = True

# Direct equivalence check
judge, pred, ref = mathscale_is_equiv("\\frac{1}{2}", "0.5")
# judge = True (both are numbers)

# Extraction from model output
from data.mathscale.util import mathscale_extract_answer_v2
answer = mathscale_extract_answer_v2("Therefore, the answer is 42.")
# answer = "42"

Related Pages

Environment:Sail_sg_LongSpec_Training_Environment
