Implementation:Volcengine Verl Extract Solution Regex
| Field | Value |
|---|---|
| Knowledge Sources | verl source code, data preprocessing examples |
| Domains | Answer Extraction, Regex Parsing, Reward Computation |
| Last Updated | 2026-02-07 |
Overview
Description
Answer extraction functions parse the ground-truth solution string from a dataset to isolate the final numeric or symbolic answer. These extracted answers serve as the ground_truth value in the reward_model configuration dict, which is later compared against the model's generated output during reward computation.
Two extraction strategies are implemented across the verl codebase:
- GSM8K extraction -- Uses the regex pattern r"#### (\-?[0-9\.\,]+)" to find the final answer after the "#### " marker. The match is then cleaned by removing commas. This is the standard GSM8K answer format, where chain-of-thought reasoning is followed by "#### {answer}".
- MATH extraction -- Uses last_boxed_only_string(solution_str) to find the last \boxed{...} expression in the solution, then remove_boxed() to extract the content inside the braces. This handles the MATH dataset convention of enclosing final answers in LaTeX boxed notation.
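To make the MATH strategy concrete without importing verl, the brace-matching idea can be sketched as below. This is a minimal illustration, not verl's helper code: the function name last_boxed_content is hypothetical, and it only covers the plain \boxed{...} form described above.

```python
def last_boxed_content(text):
    """Return the content of the last \\boxed{...} in text, or None if absent.

    Hypothetical stand-in for the last_boxed_only_string + remove_boxed pair;
    it is an illustration only and does not reproduce verl's implementation.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    open_idx = start + len(marker) - 1  # index of the opening brace
    depth = 0
    for i in range(open_idx, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                # Matching close brace found: return what lies between.
                return text[open_idx + 1 : i]
    return None  # braces never balanced


print(last_boxed_content("so $x = \\boxed{\\frac{1}{2}}$"))  # \frac{1}{2}
```

Counting brace depth, rather than using a regex, is what lets nested LaTeX such as \frac{1}{2} survive intact inside the box.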
Usage
These functions are called during data preprocessing to transform raw dataset answer strings into clean ground-truth values stored in parquet files. They are not called at training time; the extracted values are pre-computed and stored.
Code Reference
| Field | Value |
|---|---|
| GSM8K Source | examples/data_preprocess/gsm8k.py, Lines 27-32 |
| MATH Source | examples/data_preprocess/math_dataset.py, Lines 28-29 |
| GSM8K Signature | def extract_solution(solution_str) -> str |
| MATH Signature | def extract_solution(solution_str) -> str |
| MATH Import | from verl.utils.reward_score.math_reward import last_boxed_only_string, remove_boxed |
I/O Contract
Inputs
| Parameter | Type | Description |
|---|---|---|
| solution_str | str | The raw answer/solution string from the dataset. For GSM8K, this is the full chain-of-thought answer ending with "#### {number}". For MATH, this is a LaTeX solution containing \boxed{answer}. |
Outputs
| Return | Type | Description |
|---|---|---|
| solution | str | The extracted final answer as a clean string. For GSM8K, a numeric string with commas removed. For MATH, the content inside the last \boxed{}. |
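The GSM8K side of this contract can be probed with a small sketch. It reuses the pattern quoted in the Description; the helper name try_extract_gsm8k and its None-on-miss behavior are assumptions made for illustration (the function shown under Usage Examples asserts instead of returning None).

```python
import re

# Pattern as quoted in the Description section.
GSM8K_PATTERN = re.compile(r"#### (\-?[0-9\.\,]+)")


def try_extract_gsm8k(solution_str):
    """Return the cleaned answer string, or None when the marker is missing.

    Hypothetical variant for exploring the contract; not part of verl.
    """
    match = GSM8K_PATTERN.search(solution_str)
    if match is None:
        return None
    # The output stays a string; only the thousands separators are removed.
    return match.group(1).replace(",", "")


print(try_extract_gsm8k("Total cost is 1,234.5 dollars. #### 1,234.5"))  # 1234.5
print(try_extract_gsm8k("No marker here"))  # None
```

Note that the return value is a string such as "1234.5", not a float, so any numeric comparison is left to the reward function.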
Usage Examples
GSM8K answer extraction:
import re


def extract_solution(solution_str):
    """Extract the numeric answer after '#### ' from a GSM8K solution string."""
    solution = re.search("#### (\\-?[0-9\\.\\,]+)", solution_str)
    assert solution is not None
    final_solution = solution.group(0)
    final_solution = final_solution.split("#### ")[1].replace(",", "")
    return final_solution


# Example usage
raw_answer = (
    "First, calculate 15 * 4 = 60. "
    "Then add 10 to get 70. "
    "#### 70"
)
solution = extract_solution(raw_answer)
print(solution)  # "70"

# Handles negative numbers and commas
raw_answer_negative = "The net loss is #### -1,250"
solution = extract_solution(raw_answer_negative)
print(solution)  # "-1250"
MATH answer extraction:
from verl.utils.reward_score.math_reward import last_boxed_only_string, remove_boxed


def extract_solution(solution_str):
    """Extract the answer from the last \\boxed{} in a MATH solution string."""
    return remove_boxed(last_boxed_only_string(solution_str))


# Example usage
raw_solution = (
    "We need to find $x$ such that $x^2 = 16$. "
    "Therefore $x = \\pm 4$, but since $x > 0$, "
    "we have $\\boxed{4}$."
)
solution = extract_solution(raw_solution)
print(solution)  # "4"
Integration with data preprocessing:
# The extracted solution becomes the ground_truth in the reward config
answer_raw = example.pop("answer")
solution = extract_solution(answer_raw)
data = {
    "data_source": "openai/gsm8k",
    "prompt": [{"role": "user", "content": question}],
    "reward_model": {"style": "rule", "ground_truth": solution},
}
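The fragment above depends on example and question being defined by the surrounding preprocessing script. A self-contained sketch of the same step follows, run on a single toy record. The make_record helper and the "question" key are assumptions for illustration, and writing the records to parquet is left out.

```python
import re


def extract_solution(solution_str):
    """GSM8K extraction as described on this page."""
    match = re.search(r"#### (\-?[0-9\.\,]+)", solution_str)
    assert match is not None, "no '#### ' marker found"
    return match.group(1).replace(",", "")


def make_record(example):
    """Build one preprocessed record from a raw dict (hypothetical helper)."""
    example = dict(example)  # copy so the caller's dict is not mutated
    question = example.pop("question")  # assumed key name
    answer_raw = example.pop("answer")
    return {
        "data_source": "openai/gsm8k",
        "prompt": [{"role": "user", "content": question}],
        "reward_model": {
            "style": "rule",
            "ground_truth": extract_solution(answer_raw),
        },
    }


toy = {
    "question": "What is 15 * 4 + 10?",
    "answer": "15 * 4 = 60. 60 + 10 = 70. #### 70",
}
record = make_record(toy)
print(record["reward_model"])  # {'style': 'rule', 'ground_truth': '70'}
```

Because extraction happens once here, the stored ground_truth is already clean by the time reward computation reads it.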