Principle:SqueezeAILab ETS Answer Extraction

Knowledge Sources	ETS
Domains	Evaluation, Text_Processing, Mathematical_Reasoning
Last Updated	2026-02-14 02:00 GMT

Overview

A multi-strategy text parsing system that extracts mathematical answers from free-form model-generated reasoning trajectories.

Description

Language models produce reasoning trajectories as free-form text. To evaluate correctness, the final answer must be extracted from this text. Different model families use different answer formatting conventions, requiring model-specific extraction strategies:

Shepherd/Llemma format: Answers delimited by "The answer is:" ... "ки" (Cyrillic step token)
Boxed format: LaTeX \boxed{answer} notation (common in the MATH benchmark)
Natural language: "The answer is" followed by the answer text
Program output: Code output blocks delimited by triple backticks
Numeric fallback: Last number in the text
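Of these formats, \boxed{...} is the one a plain regular expression handles poorly, because answers such as \boxed{\frac{1}{2}} contain nested braces. The sketch below shows brace-counting extraction. The function name extract_last_boxed is hypothetical, and taking the last \boxed occurrence is an assumption (a common convention when a solution also boxes intermediate results), not something this page confirms about the ETS code.

```python
from typing import Optional


def extract_last_boxed(text: str) -> Optional[str]:
    r"""Hypothetical helper: contents of the last \boxed{...} in text, or None.

    Scans forward from the last "\boxed{" and counts braces so that nested
    groups such as \frac{1}{2} are kept intact.
    """
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1  # we are already inside the opening brace of \boxed{
    for j in range(i, len(text)):
        if text[j] == "{":
            depth += 1
        elif text[j] == "}":
            depth -= 1
            if depth == 0:
                return text[i:j]
    return None  # unbalanced braces: treat as no match
```

For example, extract_last_boxed(r"so the result is \boxed{\frac{1}{2}}.") returns \frac{1}{2}, whereas a non-nested pattern such as \\boxed\{([^{}]*)\} would fail on this input.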

The extraction pipeline tries strategies in priority order, falling back to less specific patterns when earlier ones fail. The extracted answer is then normalized (whitespace and commas are stripped, and malformed LaTeX fractions and square roots are repaired) before it is passed to the grading function.
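The normalization step can be sketched as follows. This is a hedged illustration: normalize_answer, _fix_sqrt and _fix_fracs are hypothetical names, and the specific repairs (rewriting \frac12 as \frac{1}{2} and \sqrt2 as \sqrt{2}), as well as removing all internal whitespace, are assumptions modeled on common MATH-grading practice rather than a copy of the ETS code.

```python
import re


def _fix_sqrt(s: str) -> str:
    # \sqrt2 -> \sqrt{2}: brace a single bare character argument.
    return re.sub(r"\\sqrt(\w)", r"\\sqrt{\1}", s)


def _fix_fracs(s: str) -> str:
    # Brace bare single-character arguments of \frac:
    # \frac12 -> \frac{1}{2}, \frac1{2} -> \frac{1}{2}, \frac{1}2 -> \frac{1}{2}.
    s = re.sub(r"\\frac(\w)(\w)", r"\\frac{\1}{\2}", s)
    s = re.sub(r"\\frac(\w)\{", r"\\frac{\1}{", s)
    s = re.sub(r"\\frac(\{[^{}]*\})(\w)", r"\\frac\1{\2}", s)
    return s


def normalize_answer(answer: str) -> str:
    """Hypothetical normalizer; the exact ETS rules may differ."""
    s = re.sub(r"\s+", "", answer)  # drop all whitespace (assumption)
    s = s.replace(",", "")  # "1,000" -> "1000" (also merges tuples like "1,2")
    s = _fix_sqrt(s)  # run after whitespace removal so "\sqrt 2" is caught
    return _fix_fracs(s)
```

Running the LaTeX repairs after whitespace removal means inputs like "3\sqrt 2" and "\frac 1 2" are caught by the same single-character patterns. Already well-formed expressions such as \frac{\sqrt{3}}{2} pass through unchanged.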

Usage

Apply answer extraction after tree search completes and before grading. The extraction function is selected by model type: extract_shepherd_answer for Llemma and Mistral models, and extract_answer for general-purpose extraction.
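A minimal sketch of a Shepherd-format extractor and a model-based dispatch. The names shepherd_extract, last_number and pick_extractor, the substring check on the model name, and the use of a bare last-number extractor as a stand-in for extract_answer are all illustrative assumptions; the real extract_shepherd_answer and extract_answer may behave differently.

```python
import re
from typing import Callable, Optional

ANSWER_PREFIX = "The answer is:"
STEP_TAG = "ки"  # Cyrillic token that closes each Shepherd-style step


def shepherd_extract(text: str) -> Optional[str]:
    # Hypothetical: text after the last "The answer is:" up to the next step tag.
    start = text.rfind(ANSWER_PREFIX)
    if start == -1:
        return None
    tail = text[start + len(ANSWER_PREFIX):]
    answer = tail.split(STEP_TAG, 1)[0].strip()
    return answer or None


def last_number(text: str) -> Optional[str]:
    # Stand-in for a general-purpose extractor: the last number in the text.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return numbers[-1] if numbers else None


def pick_extractor(model_name: str) -> Callable[[str], Optional[str]]:
    # Assumed rule: Llemma/Mistral checkpoints emit the Shepherd format.
    name = model_name.lower()
    if "llemma" in name or "mistral" in name:
        return shepherd_extract
    return last_number
```

Usage: pick_extractor("llemma-7b") returns shepherd_extract, which maps "Step 1: 6 * 3 = 18 ки\nThe answer is: 18 ки" to "18". Splitting on the step tag stops the answer at the step boundary instead of swallowing the trailing token.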

Theoretical Basis

Answer extraction from free-form text is a pattern matching problem with a cascading priority strategy:

# Abstract extraction strategy
def extract(text):
    # Priority 1: Structured format ("final answer is $...$. I hope")
    if structured_match := match_structured(text):
        return structured_match
    # Priority 2: LaTeX boxed format
    if boxed := extract_boxed(text):
        return boxed
    # Priority 3: Natural language ("the answer is")
    if nl_match := match_natural_language(text):
        return nl_match
    # Priority 4: Program output
    if program_output := extract_code_output(text):
        return program_output
    # Priority 5: Last number fallback
    return extract_last_number(text)

The cascading design ensures robustness across different model output formats while preferring more specific patterns that are less likely to produce false positives.
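The pseudocode above leaves its helpers abstract. The following is a self-contained, runnable sketch of the same five-level cascade. Every helper body and regular expression here is a hypothetical stand-in chosen for illustration, not the ETS implementation: for instance, the boxed pattern ignores nested braces for brevity, the natural-language matcher keeps only the first token, and program output is assumed to sit in a fenced block tagged "output".

```python
import re
from typing import Callable, List, Optional

Extractor = Callable[[str], Optional[str]]
FENCE = "`" * 3  # three backticks, the code-fence delimiter


def _last_group(pattern: str, text: str, flags: int = 0) -> Optional[str]:
    # Capture group of the last match, stripped; None if nothing matches.
    matches = re.findall(pattern, text, flags)
    return matches[-1].strip() if matches else None


def match_structured(text: str) -> Optional[str]:
    # Priority 1: "... final answer is $X$. I hope it is correct."
    return _last_group(r"final answer is \$(.+?)\$\. I hope", text, re.IGNORECASE)


def extract_boxed(text: str) -> Optional[str]:
    # Priority 2: \boxed{X}; this short regex does not handle nested braces.
    return _last_group(r"\\boxed\{([^{}]*)\}", text)


def match_natural_language(text: str) -> Optional[str]:
    # Priority 3: first token after "the answer is", trailing period dropped.
    answer = _last_group(r"the answer is:?\s*(\S+)", text, re.IGNORECASE)
    return answer.rstrip(".") if answer else None


def extract_code_output(text: str) -> Optional[str]:
    # Priority 4: body of the last fenced block tagged "output" (assumed tag).
    return _last_group(FENCE + r"output\n(.*?)" + FENCE, text, re.DOTALL)


def extract_last_number(text: str) -> Optional[str]:
    # Priority 5: last integer or decimal, ignoring thousands separators.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return numbers[-1] if numbers else None


STRATEGIES: List[Extractor] = [
    match_structured,
    extract_boxed,
    match_natural_language,
    extract_code_output,
    extract_last_number,
]


def extract(text: str) -> Optional[str]:
    # The first strategy that returns a non-empty string wins.
    for strategy in STRATEGIES:
        answer = strategy(text)
        if answer:
            return answer
    return None
```

Usage: extract(r"so \boxed{7}. The answer is 8") returns "7", because the boxed pattern outranks "the answer is", while extract("3 + 4 = 7") matches none of the structured patterns and falls through to the last-number fallback, also returning "7". Keeping the strategies in an ordered list makes the priority explicit and lets a format be added or reordered without touching the loop.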

Related Pages

Implemented By

Implementation:SqueezeAILab_ETS_Extract_Answer
