Environment:SqueezeAILab ETS Evaluation Python Stack

From Leeroopedia
Knowledge Sources
Domains Infrastructure, Evaluation
Last Updated 2026-02-14 02:30 GMT

Overview

CPU-based Python environment with SymPy, pylatexenc, and regex for math answer evaluation and grading.

Description

This environment provides the runtime context for the ETS evaluation pipeline. Unlike the tree search environment, it does not require GPUs. It uses SymPy for symbolic math expression simplification and equivalence checking, pylatexenc for LaTeX-to-text conversion, and the `regex` library (not the standard `re`) for advanced pattern matching in answer extraction. The evaluation code reads JSON output files from the search phase and produces accuracy metrics.

Usage

Use this environment for all answer evaluation tasks: extracting answers from model output, normalizing mathematical expressions, grading answers against ground truth, and computing accuracy via best-of-N or majority voting. It is a prerequisite for the Score_Aggregation_Functions, Extract_Answer, Grade_Answer, and Evaluate_And_Majority_Vote implementations.
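
The two aggregation strategies named above can be illustrated with a minimal, standard-library-only sketch. The function names (`majority_vote`, `best_of_n`, `accuracy`) and the plain string comparison are assumptions made for illustration; they are not the repository's implementations, which grade answers through `grader.py` rather than by exact string match.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def best_of_n(answers, scores):
    """Return the answer of the candidate with the highest score."""
    best_index = max(range(len(answers)), key=lambda i: scores[i])
    return answers[best_index]


def accuracy(predictions, ground_truths):
    """Fraction of predictions that exactly equal the ground truth (illustrative)."""
    correct = sum(p == g for p, g in zip(predictions, ground_truths))
    return correct / len(ground_truths)


# Hypothetical candidates for one problem, with hypothetical scores.
candidates = ["4", "5", "4"]
candidate_scores = [0.2, 0.9, 0.3]
voted = majority_vote(candidates)                 # most common answer
picked = best_of_n(candidates, candidate_scores)  # highest-scored answer
```

The two strategies can disagree, as they do here: voting favors the answer that appears most often, while best-of-N trusts the single highest score.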

System Requirements

| Category | Requirement | Notes |
|----------|-------------|-------|
| OS | Linux/macOS/Windows | No GPU or OS-specific requirements |
| Hardware | CPU only | No GPU needed for evaluation |
| Disk | Minimal | Only reads JSON result files and writes text accuracy reports |

Dependencies

Python Packages

  • `sympy` (symbolic math simplification in `grader.py`)
  • `pylatexenc` (LaTeX-to-text parsing in `grader.py`)
  • `regex` (advanced regex for answer extraction in `answer_extraction.py`, `process_utils.py`)
  • `tqdm` (progress display)
  • Standard library: `json`, `re`, `os`, `sys`, `argparse`, `collections`

Credentials

No credentials are required. The evaluation pipeline operates entirely on local JSON files.

Quick Install

pip install sympy pylatexenc regex tqdm
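
After installing, one way to confirm that the four third-party packages are importable is a short check like the one below. This is a sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of the ETS code.

```python
import importlib.util

# Third-party packages listed under Dependencies.
REQUIRED = ("sympy", "pylatexenc", "regex", "tqdm")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", ", ".join(missing) if missing else "none")
```

Any name it reports can be installed individually with `pip install <name>`.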

Code Evidence

SymPy usage for expression equivalence from `grader.py:8-10`:

import sympy
from pylatexenc import latex2text
from sympy.parsing import sympy_parser

SymPy simplification in `grader.py:202-213`:

def are_equal_under_sympy(ground_truth_normalized: str, given_normalized: str):
    are_equal = False
    try:
        expr = f"({ground_truth_normalized})-({given_normalized})"
        if should_allow_eval(expr):
            sympy_diff = _sympy_parse(expr)
            simplified = sympy.simplify(sympy_diff)
            if simplified == 0:
                are_equal = True
    except:
        pass
    return are_equal
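
The excerpt above depends on helpers (`should_allow_eval`, `_sympy_parse`) that are not shown here. The same subtract-and-simplify idea can be sketched in a self-contained form, assuming SymPy's standard `parse_expr` as the parser and inputs already written in Python-style syntax. The function name `equal_by_simplification` is hypothetical, and this sketch omits the safety filtering that the original performs before parsing.

```python
import sympy
from sympy.parsing.sympy_parser import parse_expr


def equal_by_simplification(ground_truth: str, given: str) -> bool:
    """Treat two expressions as equal if their difference simplifies to zero."""
    try:
        difference = parse_expr(f"({ground_truth})-({given})")
        return sympy.simplify(difference) == 0
    except Exception:
        # Unparseable input is treated as "not equal" rather than raised.
        return False


same = equal_by_simplification("(x+1)**2", "x**2 + 2*x + 1")
different = equal_by_simplification("x+1", "x+2")
```

Catching `Exception` instead of using a bare `except:` keeps `KeyboardInterrupt` usable, which matters when a simplification runs long and needs to be interrupted.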

Advanced regex library usage from `answer_extraction.py:1-2`:

import re
import regex
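
To illustrate the kind of nested matching that `regex` supports and `re` does not, the sketch below uses a recursive subpattern to pull a balanced `\boxed{...}` argument out of model output. The pattern and the helper name `extract_last_boxed` are assumptions for illustration and are not copied from `answer_extraction.py`.

```python
import regex

# Group 1 matches one balanced {...} group; (?1) recurses into group 1,
# so nested braces such as \frac{1}{2} are handled.
BOXED = regex.compile(r"\\boxed(\{(?:[^{}]|(?1))*\})")


def extract_last_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None."""
    matches = [m.group(0) for m in BOXED.finditer(text)]
    if not matches:
        return None
    # Strip the leading "\boxed{" and the trailing "}".
    return matches[-1][len("\\boxed{"):-1]


answer = extract_last_boxed(r"Thus the result is \boxed{\frac{1}{2}}.")
```

The standard `re` module has no recursion construct, so the same extraction would need a manual brace counter.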

Common Errors

| Error Message | Cause | Solution |
|---------------|-------|----------|
| `ModuleNotFoundError: No module named 'sympy'` | SymPy not installed | `pip install sympy` |
| `ModuleNotFoundError: No module named 'pylatexenc'` | pylatexenc not installed | `pip install pylatexenc` |
| `ModuleNotFoundError: No module named 'regex'` | regex library not installed (different from `re`) | `pip install regex` |
| `NameError: name 'extract_func' is not defined` | `model_type` arg not set to `mistral_7b` or `llemma` | Pass `--model_type llemma` or `--model_type mistral_7b` to `math_evaluate.py` |
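
The `NameError` in the last row is the typical symptom of a variable that is assigned only inside `if`/`elif` branches. One possible way to surface the problem earlier is to let `argparse` reject unsupported values, as in the sketch below. This is a hypothetical guard, not what `math_evaluate.py` currently does; only the flag name and the two accepted values are taken from the table above.

```python
import argparse

# The two values the table above lists as valid.
SUPPORTED_MODEL_TYPES = ("mistral_7b", "llemma")


def build_parser():
    """Build a parser that fails fast on a missing or unknown --model_type."""
    parser = argparse.ArgumentParser(description="Illustrative evaluation CLI")
    parser.add_argument(
        "--model_type",
        required=True,
        choices=SUPPORTED_MODEL_TYPES,
        help="selects which answer extraction function to use",
    )
    return parser


args = build_parser().parse_args(["--model_type", "llemma"])
```

With this guard, an unsupported value exits with a usage message listing the valid choices instead of failing later with a `NameError`.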

Compatibility Notes

  • SymPy hang guard: The `grader.py` code filters out expressions with `BAD_SUBSTRINGS` (`^{`, `^(`) and `BAD_REGEXES` before attempting SymPy simplification to prevent infinite hangs on complex expressions.
  • SymPy equality disabled: The `are_equal_under_sympy` function is defined, but its use at `grader.py:284` is commented out, so the current grading logic falls through to `is_correct = False` for non-trivial cases. The grader is therefore intentionally strict: answers that would match only after symbolic simplification are graded as incorrect.
  • regex vs re: The `answer_extraction.py` file uses both `re` (standard library) and `regex` (third-party). The `regex` library provides additional features used for Unicode and nested pattern matching.
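
The hang guard described in the first note can be sketched as a simple pre-filter. The `BAD_SUBSTRINGS` values come from the note above; the `BAD_REGEXES` patterns shown here are illustrative assumptions (this page does not list them), and the function name `looks_safe_to_simplify` is hypothetical.

```python
import re

# From the compatibility note: exponent forms that are skipped outright.
BAD_SUBSTRINGS = ["^{", "^("]

# Assumed patterns for illustration only: chained or multi-digit exponents.
BAD_REGEXES = [r"\^[0-9]+\^", r"\^[0-9][0-9]+"]


def looks_safe_to_simplify(expr: str) -> bool:
    """Return False for expressions that should not be sent to SymPy."""
    if any(bad in expr for bad in BAD_SUBSTRINGS):
        return False
    if any(re.search(pattern, expr) for pattern in BAD_REGEXES):
        return False
    return True
```

A filter like this trades recall for safety: some gradable expressions are skipped so that a single pathological input cannot stall the whole evaluation run.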
