Principle:SqueezeAILab ETS Answer Extraction
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Text_Processing, Mathematical_Reasoning |
| Last Updated | 2026-02-14 02:00 GMT |
Overview
A multi-strategy text parsing system that extracts mathematical answers from free-form model-generated reasoning trajectories.
Description
Language models produce reasoning trajectories as free-form text. To evaluate correctness, the final answer must be extracted from this text. Different model families use different answer formatting conventions, requiring model-specific extraction strategies:
- Shepherd/Llemma format: Answers delimited by "The answer is:" ... "ки" (Cyrillic step token)
- Boxed format: LaTeX \boxed{answer} notation (common in MATH benchmark)
- Natural language: "The answer is" followed by the answer text
- Program output: Code output blocks delimited by triple backticks
- Numeric fallback: Last number in the text
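As a rough illustration of the first two formats in the list, the sketch below pulls an answer out of a Shepherd-style trajectory and out of a \boxed{...} expression. The function names `shepherd_answer_sketch` and `boxed_answer_sketch` are hypothetical and are not the repository's own functions; the choice to read the last occurrence of each marker is also an assumption.

```python
from typing import Optional

STEP_TOKEN = "ки"  # Cyrillic step token that closes a Shepherd-style answer


def shepherd_answer_sketch(text: str) -> Optional[str]:
    """Return the text between the last 'The answer is:' and the next step token."""
    marker = "The answer is:"
    start = text.rfind(marker)
    if start == -1:
        return None
    tail = text[start + len(marker):]
    # Cut at the step token if present; otherwise keep the rest of the text.
    answer = tail.split(STEP_TOKEN, 1)[0].strip()
    return answer or None


def boxed_answer_sketch(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...}, honoring nested braces."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    begin = start + len(marker)
    depth = 1  # we are already inside the opening brace of \boxed{
    for i in range(begin, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[begin:i]
    return None  # unbalanced braces: treat as no match
```

Brace counting is used instead of a single regular expression because boxed answers often contain nested braces, as in \boxed{\frac{1}{2}}.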
The extraction pipeline tries the strategies in priority order, falling back to less specific patterns when earlier ones fail. The extracted answer is then normalized (whitespace and commas stripped, LaTeX fraction and square-root notation repaired) before being passed to the grading function.
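The normalization step might look roughly like the following. This is a minimal sketch, not the project's normalizer: the name `normalize_answer_sketch` is hypothetical, and the specific rules (treating commas as thousands separators, rewriting \dfrac and \tfrac to \frac, bracing single-character arguments) are assumptions about what "fixing" means here.

```python
import re


def normalize_answer_sketch(answer: str) -> str:
    """Canonicalize an extracted answer string before grading (illustrative only)."""
    s = answer.strip()
    # Assumption: commas between digits are thousands separators (1,234 -> 1234).
    s = re.sub(r"(?<=\d),(?=\d{3})", "", s)
    # Map display/text fraction variants onto plain \frac.
    s = s.replace("\\dfrac", "\\frac").replace("\\tfrac", "\\frac")
    # Add the braces LaTeX allows omitting: \frac12 -> \frac{1}{2}.
    s = re.sub(r"\\frac(\d)(\d)", r"\\frac{\1}{\2}", s)
    # Likewise for square roots: \sqrt2 -> \sqrt{2}.
    s = re.sub(r"\\sqrt(\w)", r"\\sqrt{\1}", s)
    # Drop all remaining whitespace so "1 / 2" and "1/2" compare equal.
    s = re.sub(r"\s+", "", s)
    return s
```

Normalizing both the extracted answer and the reference answer with the same function keeps purely cosmetic differences from being graded as errors.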
Usage
Apply answer extraction after tree search completes, before grading. The extraction function is selected based on the model type: extract_shepherd_answer for llemma/mistral models, extract_answer for general-purpose extraction.
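The selection by model type can be pictured as a small dispatch function. In the sketch below, `shepherd_placeholder` and `general_placeholder` are simplified stand-ins for extract_shepherd_answer and extract_answer, and `select_extractor` is a hypothetical helper; how the real code identifies the model type is not shown in this page, so exact string matching is an assumption.

```python
import re
from typing import Callable, Optional

Extractor = Callable[[str], Optional[str]]


def shepherd_placeholder(text: str) -> Optional[str]:
    # Stand-in for extract_shepherd_answer: text after "The answer is:" up to "ки".
    _, found, tail = text.rpartition("The answer is:")
    return tail.split("ки", 1)[0].strip() if found else None


def general_placeholder(text: str) -> Optional[str]:
    # Stand-in for extract_answer: here reduced to "last number in the text".
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text)
    return numbers[-1] if numbers else None


def select_extractor(model_type: str) -> Extractor:
    # Assumption: model_type is a plain string such as "llemma" or "mistral".
    if model_type.lower() in {"llemma", "mistral"}:
        return shepherd_placeholder
    return general_placeholder


# Usage: pick the extractor once, then apply it to each finished trajectory.
extractor = select_extractor("llemma")
answer = extractor("Step 1: 6*7=42 ки Step 2: The answer is: 42 ки")
```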
Theoretical Basis
Answer extraction from free-form text is a pattern-matching problem, solved here with a cascading priority strategy:
    # Abstract extraction strategy
    def extract(text):
        # Priority 1: Structured format ("final answer is $...$. I hope")
        if structured_match := match_structured(text):
            return structured_match
        # Priority 2: LaTeX boxed format
        if boxed := extract_boxed(text):
            return boxed
        # Priority 3: Natural language ("the answer is")
        if nl_match := match_natural_language(text):
            return nl_match
        # Priority 4: Program output
        if program_output := extract_code_output(text):
            return program_output
        # Priority 5: Last number fallback
        return extract_last_number(text)
The cascading design ensures robustness across different model output formats while preferring more specific patterns that are less likely to produce false positives.
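To make the false-positive argument concrete, here is a small runnable cascade with three of the five strategies (boxed, natural language, last number). The helper names and regular expressions are illustrative assumptions and are far simpler than a production extractor; the structured and program-output strategies are omitted for brevity.

```python
import re
from typing import Callable, List, Optional, Tuple


def boxed(text: str) -> Optional[str]:
    # Simplification: only matches \boxed{...} without nested braces.
    found = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return found[-1] if found else None


def natural_language(text: str) -> Optional[str]:
    # Takes the first token after "the answer is" and drops a trailing period.
    found = re.findall(r"[Tt]he answer is:?\s*(\S+)", text)
    return found[-1].rstrip(".") if found else None


def last_number(text: str) -> Optional[str]:
    found = re.findall(r"-?\d+(?:\.\d+)?", text)
    return found[-1] if found else None


# Most specific pattern first, generic fallback last.
CASCADE: List[Callable[[str], Optional[str]]] = [boxed, natural_language, last_number]


def extract_cascade(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (strategy name, answer) from the first strategy that matches."""
    for strategy in CASCADE:
        result = strategy(text)
        if result is not None:
            return strategy.__name__, result
    return None, None


trajectory = "There are 12 apples, so \\boxed{7} remain after step 3."
print(last_number(trajectory))      # the fallback alone picks up the step index
print(extract_cascade(trajectory))  # the cascade stops at the boxed answer
```

On this trajectory the numeric fallback alone would return the step index rather than the answer, which is why it is placed last in the priority order.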