Implementation:Volcengine Verl Extract Solution Regex

Field	Value
Knowledge Sources	verl source code, data preprocessing examples
Domains	Answer Extraction, Regex Parsing, Reward Computation
Last Updated	2026-02-07

Overview

Description

Answer extraction functions parse the ground-truth solution string from a dataset to isolate the final numeric or symbolic answer. These extracted answers serve as the ground_truth value in the reward_model configuration dict, which is later compared against the model's generated output during reward computation.

Two extraction strategies are implemented across the verl codebase:

GSM8K extraction -- Uses the regex pattern r"#### (\-?[0-9\.\,]+)" to find the final answer after the "#### " marker. The pattern captures an optional minus sign followed by one or more digits, decimal points, or commas; commas are then stripped from the captured text. This is the standard GSM8K answer format, where chain-of-thought reasoning is followed by "#### {answer}".

MATH extraction -- Uses last_boxed_only_string(solution_str) to find the last \boxed{...} expression in the solution, then remove_boxed() to extract the content inside the braces. This handles the MATH dataset convention of enclosing final answers in LaTeX boxed notation.
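The boxed-answer strategy comes down to brace matching. The following is a minimal sketch of that idea, not the verl implementation: `find_last_boxed` and `strip_boxed` are hypothetical stand-ins for `last_boxed_only_string` and `remove_boxed`, and the real helpers may handle more notational variants than this simplified version does.

```python
def find_last_boxed(text):
    """Return the last '\\boxed{...}' substring with balanced braces, or None."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                # Matching close brace found for the opening '\boxed{'
                return text[start : i + 1]
    return None  # braces never closed


def strip_boxed(boxed):
    """Drop the leading '\\boxed{' and the trailing '}' from a balanced expression."""
    prefix = "\\boxed{"
    assert boxed.startswith(prefix) and boxed.endswith("}")
    return boxed[len(prefix):-1]


nested = "So the answer is $\\boxed{\\frac{1}{2}}$."
print(strip_boxed(find_last_boxed(nested)))  # \frac{1}{2}
```

Counting brace depth, rather than stopping at the first closing brace, is what keeps nested LaTeX such as `\frac{1}{2}` intact.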

Usage

These functions are called during data preprocessing to transform raw dataset answer strings into clean ground-truth values stored in parquet files. They are not called at training time; the extracted values are pre-computed and stored.

Code Reference

Field	Value
GSM8K Source	`examples/data_preprocess/gsm8k.py`, Lines 27-32
MATH Source	`examples/data_preprocess/math_dataset.py`, Lines 28-29
GSM8K Signature	`def extract_solution(solution_str) -> str`
MATH Signature	`def extract_solution(solution_str) -> str`
MATH Import	`from verl.utils.reward_score.math_reward import last_boxed_only_string, remove_boxed`

I/O Contract

Inputs

Parameter	Type	Description
`solution_str`	`str`	The raw answer/solution string from the dataset. For GSM8K, this is the full chain-of-thought answer ending with `"#### {number}"`. For MATH, this is a LaTeX solution containing `\boxed{answer}`.

Outputs

Return	Type	Description
solution	`str`	The extracted final answer as a clean string. For GSM8K, a numeric string with commas removed. For MATH, the content inside the last `\boxed{}`.
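Two edge cases of this contract are worth seeing concretely for the GSM8K pattern. The sketch below reuses the same regex; `try_extract_solution` is a hypothetical variant, not part of verl, that returns `None` instead of failing an assertion when the marker is absent.

```python
import re

# Same pattern as the GSM8K extractor, written as a raw string.
GSM8K_ANSWER = re.compile(r"#### (\-?[0-9\.\,]+)")


def try_extract_solution(solution_str):
    """Hypothetical non-asserting variant: return None when no marker is found."""
    match = GSM8K_ANSWER.search(solution_str)
    if match is None:
        return None
    # group(1) is the captured number; strip thousands separators.
    return match.group(1).replace(",", "")


print(try_extract_solution("Total: #### 1,250"))  # 1250
print(try_extract_solution("Total: #### 72."))    # 72.  (trailing period is kept)
print(try_extract_solution("No marker here"))     # None
```

Because the character class accepts periods, a sentence-ending period directly after the number becomes part of the extracted string. Inputs that may contain one need an extra cleanup step.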

Usage Examples

GSM8K answer extraction:

import re

def extract_solution(solution_str):
    """Extract the numeric answer after '#### ' from a GSM8K solution string."""
    solution = re.search("#### (\\-?[0-9\\.\\,]+)", solution_str)
    assert solution is not None
    final_solution = solution.group(0)
    final_solution = final_solution.split("#### ")[1].replace(",", "")
    return final_solution

# Example usage
raw_answer = (
    "First, calculate 15 * 4 = 60. "
    "Then add 10 to get 70. "
    "#### 70"
)
solution = extract_solution(raw_answer)
print(solution)  # "70"

# Handles negative numbers and commas
raw_answer_negative = "The net loss is #### -1,250"
solution = extract_solution(raw_answer_negative)
print(solution)  # "-1250"

MATH answer extraction:

from verl.utils.reward_score.math_reward import last_boxed_only_string, remove_boxed

def extract_solution(solution_str):
    """Extract the answer from the last \\boxed{} in a MATH solution string."""
    return remove_boxed(last_boxed_only_string(solution_str))

# Example usage
raw_solution = (
    "We need to find $x$ such that $x^2 = 16$. "
    "Therefore $x = \\pm 4$, but since $x > 0$, "
    "we have $\\boxed{4}$."
)
solution = extract_solution(raw_solution)
print(solution)  # "4"

Integration with data preprocessing:

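# Fragment from a per-row mapping function: `example` is one raw dataset row,
# and `question` is assumed to hold the prompt text prepared earlier in that function.
# The extracted solution becomes the ground_truth in the reward config.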
answer_raw = example.pop("answer")
solution = extract_solution(answer_raw)

data = {
    "data_source": "openai/gsm8k",
    "prompt": [{"role": "user", "content": question}],
    "reward_model": {"style": "rule", "ground_truth": solution},
}
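To illustrate why the ground truth is stored as a normalized string, here is a minimal sketch of a rule-based comparison at reward time. `exact_match_reward` is a hypothetical function with an assumed 1.0/0.0 scoring scheme; verl's actual scoring functions may extract and compare answers differently.

```python
def exact_match_reward(model_answer, ground_truth):
    """Hypothetical rule: 1.0 if the model's final answer equals ground_truth, else 0.0."""
    if model_answer is None:
        # No answer could be extracted from the model output.
        return 0.0
    # Normalize the model's answer the same way the ground truth was normalized.
    return 1.0 if model_answer.replace(",", "") == ground_truth else 0.0


record = {"reward_model": {"style": "rule", "ground_truth": "1250"}}
truth = record["reward_model"]["ground_truth"]

print(exact_match_reward("1,250", truth))  # 1.0
print(exact_match_reward("1205", truth))   # 0.0
```

Since the comparison is plain string equality, both sides must be normalized the same way, which is why commas are removed once during preprocessing.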

Related Pages

Principle:Volcengine_Verl_Answer_Extraction
