Implementation:Iamhankai Forest of Thought Extract Label
| Knowledge Sources | |
|---|---|
| Domains | NLP, Evaluation |
| Last Updated | 2026-02-14 03:00 GMT |
Overview
A concrete tool, provided by the Forest-of-Thought repository, for extracting answer labels from LLM output text.
Description
The extract_label function extracts a clean answer label from raw LLM output. For datasets other than GSM8K, it looks for \boxed{} content and falls back to plain-text extraction when no boxed answer is found. For GSM8K, it extracts the number that follows "####" or "The answer is". The function uses regex patterns to locate and clean the answer, handling LaTeX formatting and outputs that mix several answer formats.
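The dispatch described above can be illustrated with a minimal sketch. This is not the repository's code: the names `find_boxed`, `GSM_PATTERN`, and `sketch_extract_label` are hypothetical, the regex is an assumption about what "the number after #### or The answer is" might look like, and the plain-text fallback here simply returns the stripped text.

```python
import re
from typing import Optional

# Assumed pattern: a number following "####" or "The answer is".
GSM_PATTERN = re.compile(r"(?:####|The answer is)\s*\$?\s*(-?\d[\d,]*(?:\.\d+)?)")


def find_boxed(text: str) -> Optional[str]:
    """Return the content of the last \\boxed{...}, honoring nested braces."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    out = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out).strip()
        out.append(ch)
    return None  # braces never balanced


def sketch_extract_label(data_name: str, text: str) -> Optional[str]:
    """Hypothetical stand-in for extract_label, for illustration only."""
    if data_name == "gsm8k":
        matches = GSM_PATTERN.findall(text)
        # Take the last match and drop thousands separators.
        return matches[-1].replace(",", "") if matches else None
    boxed = find_boxed(text)
    if boxed is not None:
        return boxed
    # Assumed fallback: return the stripped text, or None if it is empty.
    return text.strip() or None
```

Counting braces rather than using a single regex lets the sketch recover nested LaTeX such as \boxed{\frac{1}{2}}; whether the repository handles nesting the same way is not stated in this page.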
Usage
The function is called throughout the codebase: in monte_carlo_tree() for per-iteration answer tracking, in process_answer() for CGDM post-processing, and in check() for preprocessing answers before comparison.
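As a rough illustration of per-iteration answer tracking, the sketch below tallies the labels extracted from several model outputs. The helpers `tally_labels` and `majority_label` are hypothetical and do not appear in the repository, and the extractor is passed in as a callable so the example stays self-contained. With the real function, one might pass `lambda t: extract_label("gsm8k", t)`.

```python
from collections import Counter
from typing import Callable, Iterable, Optional


def tally_labels(
    outputs: Iterable[str],
    extract: Callable[[str], Optional[str]],
) -> Counter:
    """Count how often each extracted label occurs, skipping failed extractions."""
    counts: Counter = Counter()
    for text in outputs:
        label = extract(text)
        if label is not None:
            counts[label] += 1
    return counts


def majority_label(counts: Counter) -> Optional[str]:
    """Return the most frequent label, or None if nothing was extracted."""
    return counts.most_common(1)[0][0] if counts else None


def stub_extract(text: str) -> Optional[str]:
    """Toy extractor used only for this demo: text after the last '####'."""
    return text.split("####")[-1].strip() if "####" in text else None
```

Skipping None labels keeps outputs without a recognizable answer from counting toward the tally.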
Code Reference
Source Location
- Repository: Forest-of-Thought
- File: utils/utils.py
- Lines: L199-225
Signature
def extract_label(DATA_NAME, text: str, type='') -> str:
    """
    Extract answer label from model output text.
    Args:
        DATA_NAME (str): Dataset identifier (gsm8k, math, aime).
        text (str): Raw model output containing the answer.
        type (str): Answer type hint ('digit', 'option', 'yesorno',
            'formula'). Default: ''.
    Returns:
        str: Extracted answer label, or None if not found.
    """
Import
from utils.utils import extract_label
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| DATA_NAME | str | Yes | Dataset name for format-aware extraction |
| text | str | Yes | Raw LLM output text |
| type | str | No | Answer type hint for targeted extraction |
Outputs
| Name | Type | Description |
|---|---|---|
| label | str or None | Extracted answer label, cleaned of formatting |
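Because the label may be None, callers generally need to guard before comparing. The following sketch shows one way to do that; `normalize` and `labels_match` are hypothetical helpers, and the normalization rules (trimming whitespace, dropping commas and a trailing period, lowercasing) are assumptions rather than the behavior of the repository's check().

```python
from typing import Optional


def normalize(label: str) -> str:
    """Light, assumed normalization applied before comparing two labels."""
    return label.strip().replace(",", "").rstrip(".").lower()


def labels_match(pred: Optional[str], ref: Optional[str]) -> bool:
    """Compare two extracted labels, treating a missing label as a mismatch."""
    if pred is None or ref is None:
        return False
    return normalize(pred) == normalize(ref)
```

Treating None as a mismatch, rather than raising, lets an evaluation loop score a failed extraction as an incorrect answer.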
Usage Examples
from utils.utils import extract_label
# MATH format with boxed answer
label = extract_label("math", "The solution is \\boxed{42}.")
# label = "42"
# GSM8K format with ####
label = extract_label("gsm8k", "Step 1: ... Step 2: ... #### 120")
# label = "120"
# Fallback to last number
label = extract_label("math", "The answer is 7 because...")
# label = "7"