Principle:Iamhankai Forest of Thought Answer Equivalence Checking

From Leeroopedia
Knowledge Sources
Domains Evaluation, Mathematics
Last Updated 2026-02-14 03:00 GMT

Overview

A multi-layer comparison strategy for determining whether a predicted mathematical answer is equivalent to the ground truth, handling diverse formats including numeric, symbolic, and LaTeX representations.

Description

Answer Equivalence Checking addresses the fundamental challenge of evaluating math reasoning: the same answer can be expressed in many different ways (e.g., "1/2", "0.5", "\frac{1}{2}", "50%"). The pattern implements a cascade of increasingly sophisticated comparison methods:

  1. Direct string match: Simple string equality after normalization
  2. Numeric comparison: Float conversion with tolerance
  3. LaTeX parsing: Extract and compare boxed/formatted answers
  4. Symbolic equivalence: SymPy-based algebraic simplification and comparison
  5. Vector/set comparison: Parse and compare mathematical structures

This is critical for accurate benchmark evaluation, where naive string matching would undercount correct answers.
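As an illustration of the LaTeX parsing and vector/set comparison layers, the following is a minimal sketch and not the FoT implementation. The helper names (extract_boxed, parse_collection, items_equal, collections_equal) and the tolerance value are assumptions made for this example.

```python
import math


def extract_boxed(text):
    # Return the contents of the last \boxed{...} in text, or None if absent.
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # braces never closed


def parse_collection(answer):
    # Parse "(1, 2)", "[1, 2]" or "{1, 2}" into (opening bracket, items).
    s = answer.strip()
    closers = {"(": ")", "[": "]", "{": "}"}
    if len(s) < 2 or s[0] not in closers or s[-1] != closers[s[0]]:
        return None
    return s[0], [part.strip() for part in s[1:-1].split(",")]


def items_equal(a, b, rel_tol=1e-4):
    # Compare numerically when both items are numbers, else as strings.
    try:
        return math.isclose(float(a), float(b), rel_tol=rel_tol)
    except ValueError:
        return a == b


def collections_equal(gt, predicted):
    g, p = parse_collection(gt), parse_collection(predicted)
    if g is None or p is None or g[0] != p[0] or len(g[1]) != len(p[1]):
        return False
    if g[0] == "{":
        # Sets: order does not matter, so match items greedily.
        remaining = list(p[1])
        for item in g[1]:
            match = next((r for r in remaining if items_equal(item, r)), None)
            if match is None:
                return False
            remaining.remove(match)
        return True
    # Tuples and lists: compare position by position.
    return all(items_equal(x, y) for x, y in zip(g[1], p[1]))
```

Brace counting is used for the boxed extraction because the boxed content often contains nested braces (for example a fraction), which a simple regular expression would cut short.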

Usage

This check is used throughout FoT (Forest of Thought) to evaluate predictions against ground truth. It is called by the result logging step in benchmark evaluation and by the CGDM post-processing pipeline for accuracy reporting.

Theoretical Basis

The equivalence check implements a cascaded comparison strategy with decreasing strictness:

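# Sketch of the answer equivalence cascade; normalize() stands for the steps listed below
import math
import sympy

def check(gt, predicted, rel_tol=1e-4):  # rel_tol is an illustrative default
    gt, predicted = normalize(gt), normalize(predicted)
    if gt == predicted:  # layer 1: exact match after normalization
        return True
    try:  # layer 2: numeric comparison with tolerance
        if math.isclose(float(gt), float(predicted), rel_tol=rel_tol):
            return True
    except ValueError:
        pass  # not plain numbers, fall through to the symbolic layer
    try:  # layer 3: symbolic equivalence
        return sympy.simplify(sympy.sympify(gt) - sympy.sympify(predicted)) == 0
    except (sympy.SympifyError, TypeError):
        return False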
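A plain float() conversion rejects inputs such as "1/2" or "50%", so a numeric layer usually needs a small parser in front of the tolerance check. The sketch below shows one way to do this with the standard library only; the function names to_number and numeric_equal and the tolerance defaults are assumptions for illustration, not values taken from FoT.

```python
import math
from fractions import Fraction


def to_number(text):
    # Parse "0.5", "1/2" or "50%" into a float; return None if not numeric.
    s = text.strip()
    scale = 1
    if s.endswith("%"):
        s, scale = s[:-1].strip(), 100
    try:
        return float(Fraction(s) / scale)
    except (ValueError, ZeroDivisionError):
        return None


def numeric_equal(gt, predicted, rel_tol=1e-4, abs_tol=1e-9):
    # True when both strings parse as numbers and agree within tolerance.
    a, b = to_number(gt), to_number(predicted)
    if a is None or b is None:
        return False
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
```

The abs_tol argument matters when the expected answer is zero, since a purely relative tolerance never accepts a nonzero prediction against 0.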

Key normalization steps include:

  • Removing LaTeX formatting (\text{}, \mathrm{}, etc.)
  • Standardizing fractions, square roots, and operators
  • Converting units and percentages to base form
  • Handling multiple answer formats (boxed, inline, ####-delimited)
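
The normalization steps above could be sketched roughly as follows. This is an illustrative normalize function written for this page, not the FoT code: the specific rewrite rules and their order are assumptions, unit conversion is left out, and the regular expressions only handle braces without further nesting inside them.

```python
import re


def normalize(answer):
    # Reduce an answer string to a canonical plain-text form (illustrative).
    s = answer.strip()
    # Keep only what follows a "####" delimiter, if present.
    if "####" in s:
        s = s.split("####")[-1]
    # Drop inline math dollar signs.
    s = s.replace("$", "").strip()
    # Unwrap \text{...} and \mathrm{...}, keeping their contents.
    s = re.sub(r"\\(?:text|mathrm)\{([^{}]*)\}", r"\1", s)
    # Rewrite \sqrt{a} as sqrt(a).
    s = re.sub(r"\\sqrt\{([^{}]*)\}", r"sqrt(\1)", s)
    # Treat \dfrac and \tfrac as \frac, then rewrite \frac{a}{b} as (a)/(b).
    s = re.sub(r"\\[dt]frac", r"\\frac", s)
    s = re.sub(r"\\frac\{([^{}]*)\}\{([^{}]*)\}", r"(\1)/(\2)", s)
    # Standardize common multiplication operators.
    s = s.replace("\\cdot", "*").replace("\\times", "*")
    # Convert a bare percentage such as 50% or 50\% to (50)/100.
    m = re.fullmatch(r"(-?\d+(?:\.\d+)?)\s*\\?%", s.strip())
    if m:
        s = f"({m.group(1)})/100"
    # Remove all remaining spaces.
    return s.replace(" ", "")
```

After this step, "\frac{1}{2}" and "\dfrac{1}{2}" both become "(1)/(2)", so they pass the direct string match without needing the later layers.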

Related Pages

Implemented By
