Implementation:Volcengine Verl Extract Solution Regex

From Leeroopedia


Field | Value
Knowledge Sources | verl source code, data preprocessing examples
Domains | Answer Extraction, Regex Parsing, Reward Computation
Last Updated | 2026-02-07

Overview

Description

Answer extraction functions parse the ground-truth solution string from a dataset to isolate the final numeric or symbolic answer. These extracted answers serve as the ground_truth value in the reward_model configuration dict, which is later compared against the model's generated output during reward computation.

Two extraction strategies are implemented across the verl codebase:

  • GSM8K extraction -- Uses the regex pattern r"#### (\-?[0-9\.\,]+)" to find the final answer after the "#### " marker. The match is then cleaned by removing commas. This is the standard GSM8K answer format where chain-of-thought reasoning is followed by "#### {answer}".
  • MATH extraction -- Uses last_boxed_only_string(solution_str) to find the last \boxed{...} expression in the solution, then remove_boxed() to extract the content inside the braces. This handles the MATH dataset convention of enclosing final answers in LaTeX boxed notation.
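The MATH strategy relies on brace matching rather than a regex, because answers such as \boxed{\frac{1}{2}} contain nested braces. A minimal sketch of that idea follows. `last_boxed` and `strip_boxed` are hypothetical stand-ins written for illustration; they are not verl's `last_boxed_only_string` and `remove_boxed`, whose actual implementations may handle more cases.

```python
def last_boxed(text):
    """Return the last '\\boxed{...}' substring, or None if absent or unbalanced."""
    start = text.rfind("\\boxed{")
    if start < 0:
        return None
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                # Found the brace that closes \boxed{ itself.
                return text[start : i + 1]
    return None  # braces never closed


def strip_boxed(boxed):
    """Drop the leading '\\boxed{' and the trailing '}'."""
    prefix = "\\boxed{"
    assert boxed.startswith(prefix) and boxed.endswith("}")
    return boxed[len(prefix) : -1]


solution = r"So the ratio is $\boxed{\frac{1}{2}}$."
answer = strip_boxed(last_boxed(solution))
print(answer)  # \frac{1}{2}
```

Counting brace depth is what lets the inner `{1}` and `{2}` pass through intact; a non-greedy regex such as `\\boxed\{(.*?)\}` would stop at the first closing brace.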

Usage

These functions are called during data preprocessing to transform raw dataset answer strings into clean ground-truth values stored in parquet files. They are not called at training time; the extracted values are pre-computed and stored.

Code Reference

Field | Value
GSM8K Source | examples/data_preprocess/gsm8k.py, Lines 27-32
MATH Source | examples/data_preprocess/math_dataset.py, Lines 28-29
GSM8K Signature | def extract_solution(solution_str) -> str
MATH Signature | def extract_solution(solution_str) -> str
MATH Import | from verl.utils.reward_score.math_reward import last_boxed_only_string, remove_boxed

I/O Contract

Inputs

Parameter | Type | Description
solution_str | str | The raw answer/solution string from the dataset. For GSM8K, this is the full chain-of-thought answer ending with "#### {number}". For MATH, this is a LaTeX solution containing \boxed{answer}.

Outputs

Return | Type | Description
solution | str | The extracted final answer as a clean string. For GSM8K, a numeric string with commas removed. For MATH, the content inside the last \boxed{}.

Usage Examples

GSM8K answer extraction:

import re

def extract_solution(solution_str):
    """Extract the numeric answer after '#### ' from a GSM8K solution string."""
    solution = re.search("#### (\\-?[0-9\\.\\,]+)", solution_str)
    assert solution is not None
    final_solution = solution.group(0)
    final_solution = final_solution.split("#### ")[1].replace(",", "")
    return final_solution

# Example usage
raw_answer = (
    "First, calculate 15 * 4 = 60. "
    "Then add 10 to get 70. "
    "#### 70"
)
solution = extract_solution(raw_answer)
print(solution)  # "70"

# Handles negative numbers and commas
raw_answer_negative = "The net loss is #### -1,250"
solution = extract_solution(raw_answer_negative)
print(solution)  # "-1250"
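Two properties of the pattern are worth keeping in mind when applying it to other data. The character class includes `.`, so a sentence-ending period directly after the number is captured too, and a string without the marker trips the assertion. The sketch below repeats the same function so that it runs on its own; the sample strings are made up for illustration and are not taken from GSM8K.

```python
import re


def extract_solution(solution_str):
    # Same logic as the function above, repeated so this snippet is standalone.
    solution = re.search(r"#### (\-?[0-9\.\,]+)", solution_str)
    assert solution is not None
    return solution.group(0).split("#### ")[1].replace(",", "")


# Decimals pass through unchanged.
decimal = extract_solution("Each share costs #### 12.50")
print(decimal)  # 12.50

# A trailing period is part of the character class, so it is kept.
trailing = extract_solution("The total is #### 70.")
print(trailing)  # 70.

# Without the marker, re.search returns None and the assertion fails
# (assuming Python is not run with -O, which disables assert statements).
try:
    extract_solution("The answer is 70")
    missing_marker_raised = False
except AssertionError:
    missing_marker_raised = True
print(missing_marker_raised)  # True
```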

MATH answer extraction:

from verl.utils.reward_score.math_reward import last_boxed_only_string, remove_boxed

def extract_solution(solution_str):
    """Extract the answer from the last \\boxed{} in a MATH solution string."""
    return remove_boxed(last_boxed_only_string(solution_str))

# Example usage
raw_solution = (
    "We need to find $x$ such that $x^2 = 16$. "
    "Therefore $x = \\pm 4$, but since $x > 0$, "
    "we have $\\boxed{4}$."
)
solution = extract_solution(raw_solution)
print(solution)  # "4"

Integration with data preprocessing:

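# The extracted solution becomes the ground_truth in the reward config.
# build_record is a hypothetical wrapper for illustration; `example` is one raw
# GSM8K record with "question" and "answer" fields, and extract_solution is
# the GSM8K function defined above.
def build_record(example):
    question = example.pop("question")
    answer_raw = example.pop("answer")
    solution = extract_solution(answer_raw)

    return {
        "data_source": "openai/gsm8k",
        "prompt": [{"role": "user", "content": question}],
        "reward_model": {"style": "rule", "ground_truth": solution},
    }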
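To make the role of ground_truth concrete, the sketch below shows one way a rule-based reward could compare a model response against the stored value. `score_response` is a hypothetical helper written for this page, not verl's reward function. It assumes the response follows the same "#### " convention and that a plain string comparison is sufficient.

```python
import re


def score_response(response, ground_truth):
    """Return 1.0 if the last '#### <number>' in the response equals ground_truth.

    Hypothetical scoring rule for illustration only.
    """
    matches = re.findall(r"#### (\-?[0-9\.\,]+)", response)
    if not matches:
        return 0.0  # no parsable answer in the response
    predicted = matches[-1].replace(",", "")
    return 1.0 if predicted == ground_truth else 0.0


record = {"reward_model": {"style": "rule", "ground_truth": "1250"}}
truth = record["reward_model"]["ground_truth"]

correct = score_response("5 * 250 = 1250 #### 1,250", truth)
wrong = score_response("5 * 250 = 1200 #### 1200", truth)
unparsable = score_response("I think it is 1250", truth)
print(correct, wrong, unparsable)  # 1.0 0.0 0.0
```

Because the comparison is on strings, the cleaning done at preprocessing time (comma removal) has to be mirrored on the response side, which is why the sketch strips commas from the prediction as well.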
