Principle:Iamhankai Forest of Thought Answer Equivalence Checking
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Mathematics |
| Last Updated | 2026-02-14 03:00 GMT |
Overview
A multi-layer comparison strategy for determining whether a predicted mathematical answer is equivalent to the ground truth, handling diverse formats including numeric, symbolic, and LaTeX representations.
Description
Answer equivalence checking addresses a fundamental challenge in evaluating mathematical reasoning: the same answer can be written in many different ways (e.g., "1/2", "0.5", "\frac{1}{2}", "50%"). The pattern implements a cascade of increasingly sophisticated comparison methods, stopping at the first one that establishes a match:
- Direct string match: Simple string equality after normalization
- Numeric comparison: Float conversion with tolerance
- LaTeX parsing: Extract and compare boxed/formatted answers
- Symbolic equivalence: SymPy-based algebraic simplification and comparison
- Vector/set comparison: Parse and compare mathematical structures
This is critical for accurate benchmark evaluation, where naive string matching would undercount correct answers.
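The vector/set stage is the least obvious item in the list above, so here is a minimal sketch of how it could work. It is not the FoT implementation: the function names (`split_top_level`, `parse_structure`, `elements_equal`, `structures_equal`) and the tolerance value are assumptions made for illustration. Tuples and lists are compared element by element in order, while sets are compared without regard to order.

```python
import math


def split_top_level(body):
    """Split on commas that are not nested inside brackets or braces."""
    parts, depth, current = [], 0, []
    for ch in body:
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def parse_structure(text):
    """Return (kind, elements) for '(a, b)', '[a, b]' or '{a, b}', else None."""
    text = text.strip()
    kinds = {"(": ("tuple", ")"), "[": ("list", "]"), "{": ("set", "}")}
    if len(text) < 2 or text[0] not in kinds:
        return None
    kind, closer = kinds[text[0]]
    if text[-1] != closer:
        return None
    return kind, split_top_level(text[1:-1])


def elements_equal(a, b, tol=1e-6):  # tol is an assumed value
    """Compare two elements numerically when possible, else as strings."""
    try:
        return math.isclose(float(a), float(b), rel_tol=tol, abs_tol=tol)
    except ValueError:
        return a.replace(" ", "") == b.replace(" ", "")


def structures_equal(gt, predicted):
    """Ordered comparison for tuples and lists, unordered for sets."""
    left, right = parse_structure(gt), parse_structure(predicted)
    if left is None or right is None or left[0] != right[0]:
        return False
    (kind, xs), (_, ys) = left, right
    if len(xs) != len(ys):
        return False
    if kind != "set":
        return all(elements_equal(x, y) for x, y in zip(xs, ys))
    # Sets: greedily pair each element of xs with an unused element of ys.
    remaining = list(ys)
    for x in xs:
        match = next((y for y in remaining if elements_equal(x, y)), None)
        if match is None:
            return False
        remaining.remove(match)
    return True
```

With this sketch, `structures_equal("(1, 2)", "(1.0, 2.00)")` and `structures_equal("{1, 2}", "{2, 1}")` both hold, while `structures_equal("(1, 2)", "(2, 1)")` does not. A fuller version would presumably hand each element to the symbolic stage instead of falling back to string comparison.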
Usage
Used throughout FoT for evaluating predictions against ground truth. Called by the result logging step in benchmark evaluation and by the CGDM post-processing pipeline for accuracy reporting.
Theoretical Basis
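The equivalence check is a cascade of comparisons of decreasing strictness. Each stage runs only if the stricter stage before it did not find a match, and the answers count as different only when every stage fails: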
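    # Pseudo-code for answer equivalence cascade (normalize() is defined elsewhere)
    import math
    import sympy

    def check(gt, predicted, tol=1e-6):  # tol is an illustrative value
        gt, predicted = normalize(gt), normalize(predicted)
        if gt == predicted:  # 1. string match after normalization
            return True
        try:  # 2. numeric comparison with tolerance
            if math.isclose(float(gt), float(predicted), rel_tol=tol):
                return True
        except ValueError:
            pass  # not plain numbers; fall through to the symbolic stage
        try:  # 3. symbolic equivalence: the difference simplifies to zero
            return sympy.simplify(sympy.sympify(gt) - sympy.sympify(predicted)) == 0
        except (sympy.SympifyError, TypeError):
            return False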
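The numeric stage deserves a closer look, because exact `==` on floats rejects answers that differ only by rounding (in Python, `0.1 + 0.2 == 0.3` is false). The following is a minimal, standard-library sketch of a tolerance-based numeric comparison. The helper names `to_number` and `numeric_equal`, the tolerance values, and the treatment of commas as thousands separators are all assumptions for illustration, not details taken from FoT.

```python
import math
from fractions import Fraction


def to_number(text):
    """Parse '0.5', '1e3', '1/2' or '1,000' into a float; None if not numeric."""
    # Assumption: commas are thousands separators, not decimal marks.
    cleaned = text.strip().replace(",", "")
    try:
        return float(Fraction(cleaned))
    except (ValueError, ZeroDivisionError):
        return None


def numeric_equal(gt, predicted, rel_tol=1e-6, abs_tol=1e-9):
    """True when both strings parse as numbers that agree within tolerance."""
    a, b = to_number(gt), to_number(predicted)
    if a is None or b is None:
        return False  # not numeric: leave the decision to a later stage
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
```

Here `numeric_equal("1/2", "0.5")` holds and `numeric_equal("0.5", "0.51")` does not. Returning `False` for non-numeric input means "no match at this stage" rather than "not equivalent", so a symbolic expression such as "x+1" still reaches the later stages of the cascade.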
Key normalization steps include:
- Removing LaTeX formatting (\text{}, \mathrm{}, etc.)
- Standardizing fractions, square roots, and operators
- Converting units and percentages to base form
- Handling multiple answer formats (boxed, inline, ####-delimited)
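The normalization steps above can be sketched as follows. This is a hedged illustration, not the project's actual normalizer: `extract_final_answer`, `normalize`, and `_frac` are hypothetical names, the set of rewritten LaTeX commands is a small assumed subset, wrappers such as `\text{}` are unwrapped and their content kept, and unit conversion and inline-answer detection are left out.

```python
import re


def extract_final_answer(text):
    """Return the content of the last \\boxed{...}, or the text after '####'."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start != -1:
        depth, i = 1, start + len(marker)
        begin = i
        # Walk forward until the brace opened by \boxed{ is closed.
        while i < len(text) and depth > 0:
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
            i += 1
        if depth == 0:
            return text[begin:i - 1].strip()
    if "####" in text:
        return text.rsplit("####", 1)[1].strip()
    return text.strip()


def _frac(match):
    """Rewrite \\frac{a}{b} as a/b, adding parentheses around compound parts."""
    def wrap(term):
        return term if re.fullmatch(r"\w+", term) else "(" + term + ")"
    return wrap(match.group(1)) + "/" + wrap(match.group(2))


def normalize(answer):
    s = extract_final_answer(answer).strip("$")
    # Unwrap formatting commands, keeping their content.
    s = re.sub(r"\\(?:text|mathrm|textbf)\{([^{}]*)\}", r"\1", s)
    # Standardize fractions, square roots and operators.
    s = s.replace("\\dfrac", "\\frac").replace("\\tfrac", "\\frac")
    s = re.sub(r"\\frac\{([^{}]+)\}\{([^{}]+)\}", _frac, s)
    s = re.sub(r"\\sqrt\{([^{}]+)\}", r"sqrt(\1)", s)
    s = re.sub(r"\\(?:left|right)(?![A-Za-z])", "", s)
    s = re.sub(r"\\(?:cdot|times)(?![A-Za-z])", "*", s)
    # Convert a bare percentage to its decimal form.
    percent = re.fullmatch(r"(-?\d+(?:\.\d+)?)\s*\\?%", s.strip())
    if percent:
        s = repr(float(percent.group(1)) / 100)
    return s.replace(" ", "")
```

Under these assumptions, `normalize(r"\boxed{\frac{1}{2}}")` gives `"1/2"` and `normalize("50%")` gives `"0.5"`. The two strings still differ, which is why the cascade follows normalization with a numeric comparison instead of relying on string equality alone.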