Environment:SqueezeAILab ETS Evaluation Python Stack
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Evaluation |
| Last Updated | 2026-02-14 02:30 GMT |
Overview
CPU-based Python environment with SymPy, pylatexenc, and regex for math answer evaluation and grading.
Description
This environment provides the runtime context for the ETS evaluation pipeline. Unlike the tree search environment, it does not require GPUs. It uses three libraries:

- SymPy for symbolic simplification and equivalence checking of math expressions. The SymPy equivalence check is currently disabled in the grader; see Compatibility Notes.
- pylatexenc for LaTeX-to-text conversion.
- The third-party `regex` library, not the standard-library `re`, for pattern matching that `re` does not support, such as nested patterns, during answer extraction.

The evaluation code reads the JSON output files produced by the search phase and computes accuracy metrics.
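The read-grade-report flow described above can be sketched as follows. This is a minimal illustration, not the ETS code. The record keys `answer` and `gt`, the JSON layout, and the toy `normalize` helper are all hypothetical stand-ins for the real logic in `answer_extraction.py` and `grader.py`.

```python
import json
from pathlib import Path
from typing import Dict, List

# Hypothetical schema: each record holds a model "answer" and a ground-truth "gt".
Record = Dict[str, str]


def load_records(path: str) -> List[Record]:
    """Read a JSON list of records written by the search phase (assumed layout)."""
    return json.loads(Path(path).read_text())


def normalize(ans: str) -> str:
    """Toy normalizer: drops all whitespace and a trailing period. Not grader.py's logic."""
    return "".join(ans.split()).rstrip(".")


def accuracy(records: List[Record]) -> float:
    """Fraction of records whose normalized answer equals the normalized ground truth."""
    if not records:
        return 0.0
    correct = sum(normalize(r["answer"]) == normalize(r["gt"]) for r in records)
    return correct / len(records)


def write_report(acc: float, out_path: str) -> None:
    """Write a one-line plain-text accuracy report."""
    Path(out_path).write_text(f"accuracy: {acc:.4f}\n")
```

In the real pipeline, the toy `normalize` comparison is where answer extraction and `grade_answer`-style grading would plug in.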
Usage
Use this environment for all answer-evaluation tasks: extracting answers from model output, normalizing mathematical expressions, grading answers against ground truth, and computing accuracy via best-of-N selection or majority voting. It is required by the Score_Aggregation_Functions, Extract_Answer, Grade_Answer, and Evaluate_And_Majority_Vote implementations.
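The two selection strategies named above differ only in how one answer is picked from N candidates. The sketch below assumes each candidate already carries a single aggregated score, for example from a score aggregation function. The function names are illustrative and are not the ETS API.

```python
from collections import Counter
from typing import Sequence


def best_of_n(answers: Sequence[str], scores: Sequence[float]) -> str:
    """Return the candidate answer with the highest score (first one wins ties)."""
    if len(answers) != len(scores) or not answers:
        raise ValueError("need one score per answer and at least one answer")
    best_idx = max(range(len(answers)), key=lambda i: scores[i])
    return answers[best_idx]


def majority_vote(answers: Sequence[str]) -> str:
    """Return the most frequent answer; on ties, the one encountered first."""
    return Counter(answers).most_common(1)[0][0]
```

For `answers = ["12", "15", "12", "15", "12"]` and `scores = [0.2, 0.9, 0.3, 0.4, 0.1]`, the two functions disagree. `best_of_n` picks `"15"` (the single highest score), while `majority_vote` picks `"12"` (three votes out of five).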
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux/macOS/Windows | No GPU or OS-specific requirements |
| Hardware | CPU only | No GPU needed for evaluation |
| Disk | Minimal | Only reads JSON result files and writes text accuracy reports |
Dependencies
Python Packages
- `sympy` (symbolic math simplification in `grader.py`)
- `pylatexenc` (LaTeX-to-text parsing in `grader.py`)
- `regex` (advanced regex for answer extraction in `answer_extraction.py`, `process_utils.py`)
- `tqdm` (progress display)
- Standard library: `json`, `re`, `os`, `sys`, `argparse`, `collections`
Credentials
No credentials are required. The evaluation pipeline operates entirely on local JSON files.
Quick Install
```bash
pip install sympy pylatexenc regex tqdm
```
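To confirm the install before running an evaluation, a small check like the following can report any missing package. It is a sketch that uses only the standard library's `importlib.util.find_spec`. The helper name `missing_packages` is hypothetical.

```python
from __future__ import annotations

import importlib.util

# Third-party packages the evaluation code imports.
REQUIRED = ("sympy", "pylatexenc", "regex", "tqdm")


def missing_packages(names: tuple[str, ...] = REQUIRED) -> list[str]:
    """Return the names from `names` that cannot be found on the import path."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", " ".join(missing))
        print("Run: pip install " + " ".join(missing))
    else:
        print("All evaluation dependencies are installed.")
```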
Code Evidence
SymPy usage for expression equivalence from `grader.py:8-10`:
```python
import sympy
from pylatexenc import latex2text
from sympy.parsing import sympy_parser
```
SymPy simplification in `grader.py:202-213`:
```python
def are_equal_under_sympy(ground_truth_normalized: str, given_normalized: str):
    are_equal = False
    try:
        expr = f"({ground_truth_normalized})-({given_normalized})"
        if should_allow_eval(expr):
            sympy_diff = _sympy_parse(expr)
            simplified = sympy.simplify(sympy_diff)
            if simplified == 0:
                are_equal = True
    except:
        pass
    return are_equal
```
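The excerpt depends on `grader.py` helpers (`should_allow_eval`, `_sympy_parse`) that are not shown. The self-contained sketch below shows the same difference-simplifies-to-zero idea. It calls SymPy's public `parse_expr` directly and skips the hang guard, so it is an illustration rather than a drop-in replacement. Like the original, it treats any parse or simplification error as "not equal".

```python
import sympy
from sympy.parsing.sympy_parser import parse_expr


def equal_under_sympy_sketch(a: str, b: str) -> bool:
    """True if simplify((a) - (b)) is exactly zero; any failure counts as not equal."""
    try:
        diff = parse_expr(f"({a})-({b})")
        return bool(sympy.simplify(diff) == 0)
    except Exception:
        # Mirrors the bare `except: pass` above: parse errors mean "not equal".
        return False
```

For example, `equal_under_sympy_sketch("x**2-1", "(x-1)*(x+1)")` is `True`, and `equal_under_sympy_sketch("x+1", "x+2")` is `False`. Note that `parse_expr` reads `^` as Python XOR unless a `convert_xor` transformation is supplied.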
Advanced regex library usage from `answer_extraction.py:1-2`:
```python
import re
import regex
```
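One feature that the standard `re` module lacks and `regex` provides is recursive patterns. These can match balanced braces such as the contents of `\boxed{\frac{1}{2}}`. The sketch below assumes answers are wrapped in `\boxed{...}`. The pattern and the helper name `last_boxed` are illustrative, not taken from `answer_extraction.py`.

```python
from typing import Optional

import regex

# (?1) recurses into capture group 1, so each nested {...} is consumed as a
# balanced unit and the outer group ends at the matching closing brace.
BOXED = regex.compile(r"\\boxed(\{(?:[^{}]|(?1))*\})")


def last_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, without outer braces."""
    matches = BOXED.findall(text)
    return matches[-1][1:-1] if matches else None
```

For example, `last_boxed(r"so the answer is \boxed{\frac{1}{2}}.")` returns `\frac{1}{2}` with its inner braces intact.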
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ModuleNotFoundError: No module named 'sympy'` | SymPy not installed | `pip install sympy` |
| `ModuleNotFoundError: No module named 'pylatexenc'` | pylatexenc not installed | `pip install pylatexenc` |
| `ModuleNotFoundError: No module named 'regex'` | regex library not installed (different from `re`) | `pip install regex` |
| `NameError: name 'extract_func' is not defined` | `--model_type` was not set to `mistral_7b` or `llemma` | Pass `--model_type llemma` or `--model_type mistral_7b` to `math_evaluate.py` |
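The `NameError` is consistent with `extract_func` being assigned only when `model_type` matches one of the two supported values. The sketch below shows one way such a script could fail fast instead, assuming a dispatch table keyed by model type. The extractor functions are hypothetical placeholders, not the real ETS extractors.

```python
import argparse
from typing import Callable, Dict, List, Optional


# Hypothetical placeholder extractors; the real ones are model-specific.
def extract_llemma(text: str) -> str:
    return text.strip()


def extract_mistral_7b(text: str) -> str:
    return text.strip()


# Map each supported --model_type value to its extractor.
EXTRACTORS: Dict[str, Callable[[str], str]] = {
    "llemma": extract_llemma,
    "mistral_7b": extract_mistral_7b,
}


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Sketch of model-type dispatch")
    # required=True plus choices: argparse exits with a usage error up front
    # instead of letting a NameError surface later in the script.
    parser.add_argument("--model_type", required=True, choices=sorted(EXTRACTORS))
    return parser.parse_args(argv)
```

A script would then bind the extractor with `extract_func = EXTRACTORS[parse_args().model_type]`, which always succeeds once argument parsing has passed.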
Compatibility Notes
- SymPy hang guard: Before attempting SymPy simplification, `grader.py` rejects any expression that contains one of the `BAD_SUBSTRINGS` (`^{`, `^(`) or matches one of the `BAD_REGEXES`. This prevents SymPy from hanging on complex expressions.
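- SymPy equality disabled: The `are_equal_under_sympy` function is still defined, but its call at `grader.py:284` is commented out. Cases that would otherwise reach the SymPy check fall through to `is_correct = False`. The grader is therefore intentionally strict: an answer that is mathematically equivalent to the ground truth but does not match it after normalization is graded as incorrect.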
- regex vs re: `answer_extraction.py` imports both the standard-library `re` and the third-party `regex`. It relies on `regex` for features `re` lacks, such as extended Unicode support and recursive (nested) pattern matching.
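The hang guard reduces to cheap string checks that run before any SymPy call. Below is a minimal sketch under that assumption. `BAD_SUBSTRINGS` uses the two values listed above. The single `BAD_REGEXES` pattern (rejecting exponents of two or more digits) and the function name `should_allow_eval_sketch` are hypothetical and are not copied from `grader.py`.

```python
import re

# Values documented above.
BAD_SUBSTRINGS = ["^{", "^("]
# Hypothetical example pattern: rejects multi-digit exponents such as 2^100.
BAD_REGEXES = [r"\^[0-9]{2,}"]


def should_allow_eval_sketch(expr: str) -> bool:
    """Return False if expr looks risky to hand to sympy.simplify."""
    if any(bad in expr for bad in BAD_SUBSTRINGS):
        return False
    if any(re.search(pattern, expr) for pattern in BAD_REGEXES):
        return False
    return True
```

Because these checks are plain substring and regex scans, they take time proportional to the input length. This is why they can safely run before the potentially unbounded `sympy.simplify` call.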