Implementation:OpenRLHF OpenRLHF Math reward func
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, Reward_Modeling |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A concrete tool, provided by OpenRLHF, for computing rule-based math rewards from model-generated solutions.
Description
The math_reward_func function (or equivalent in the math reward example script) extracts answers from model-generated solutions, compares them to ground truth labels, and returns binary rewards. It supports answer extraction from boxed LaTeX format and various number formats.
This is a Pattern Doc - users implement their own reward functions following this interface.
Usage
Used as the reward function in Math-GRPO training. Users define their own function matching this interface.
Code Reference
Source Location
- Repository: OpenRLHF
- File: examples/scripts/train_ppo_llama_ray_math.sh (reference)
Interface Specification
def math_reward_func(
    queries: list[str],    # Input prompts
    responses: list[str],  # Generated responses
    labels: list[str],     # Ground truth answers
) -> list[float]:
    """
    Compute rewards for math problem solutions.

    Args:
        queries: List of math problem prompts
        responses: List of model-generated solutions
        labels: List of correct answers

    Returns:
        List of float rewards (typically 0.0 or 1.0)
    """
    # extract_answer and normalize are user-supplied helpers; they are not defined here.
    rewards = []
    for response, label in zip(responses, labels):
        answer = extract_answer(response)  # Extract from \boxed{...} etc.
        if answer == normalize(label):
            rewards.append(1.0)
        else:
            rewards.append(0.0)
    return rewards
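The interface above calls extract_answer and normalize without defining them. A minimal sketch of what such helpers might look like follows. Both are hypothetical illustrations, not code taken from OpenRLHF. The sketch assumes that the final answer is the last \boxed{...} in the response, that commas between digit groups are thousands separators, and that extract_answer returns an already-normalized string so it can be compared directly with normalize(label).

```python
import re
from typing import Optional


def normalize(text: str) -> str:
    """Hypothetical helper: canonicalize an answer string for exact comparison."""
    text = text.strip().strip("$")                     # drop surrounding whitespace and $...$
    text = re.sub(r"\s+", "", text)                    # remove internal whitespace
    text = re.sub(r"(?<=\d),(?=\d{3}\b)", "", text)    # 1,000 -> 1000 (assumed thousands separator)
    return text.rstrip(".")                            # drop a trailing period


def extract_answer(response: str) -> Optional[str]:
    """Hypothetical helper: normalized contents of the last \\boxed{...}, or None."""
    marker = r"\boxed{"
    start = response.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    # Walk forward counting braces so nested groups like \frac{1}{2} stay intact.
    for ch in response[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return normalize("".join(chars))
        chars.append(ch)
    return None  # braces never closed
```

Counting braces instead of using a regular expression keeps nested LaTeX groups whole. Taking the last boxed expression is a design choice that favors the final stated answer over intermediate ones.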
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| queries | List[str] | Yes | Math problem prompts |
| responses | List[str] | Yes | Generated solutions |
| labels | List[str] | Yes | Ground truth answers |
Outputs
| Name | Type | Description |
|---|---|---|
| rewards | List[float] | Binary rewards (0.0 or 1.0) |
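Because users supply their own function, it can help to check the I/O contract before training. The wrapper below is a sketch of such a check. The names RewardFn and check_reward_contract are hypothetical and not part of OpenRLHF. It assumes the contract in the tables above: three equal-length lists in, and one binary reward per response out.

```python
from typing import Callable, List

# Hypothetical alias for the interface described in the tables above.
RewardFn = Callable[[List[str], List[str], List[str]], List[float]]


def check_reward_contract(
    reward_fn: RewardFn,
    queries: List[str],
    responses: List[str],
    labels: List[str],
) -> List[float]:
    """Call reward_fn and raise ValueError if it violates the I/O contract."""
    if not (len(queries) == len(responses) == len(labels)):
        raise ValueError("queries, responses, and labels must have equal length")
    rewards = reward_fn(queries, responses, labels)
    if len(rewards) != len(responses):
        raise ValueError("expected one reward per response")
    for reward in rewards:
        if reward not in (0.0, 1.0):
            raise ValueError(f"expected a binary reward, got {reward!r}")
    return [float(reward) for reward in rewards]
```

Running this once on a handful of hand-written samples can catch length mismatches or non-binary values before they reach the training loop.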
Usage Examples
# User-defined math reward function
import re

def my_math_reward(queries, responses, labels):
    rewards = []
    for resp, label in zip(responses, labels):
        # Extract answer from \boxed{...}
        # Note: the non-greedy pattern stops at the first '}', so nested
        # braces such as \boxed{\frac{1}{2}} are truncated.
        match = re.search(r'\\boxed\{(.+?)\}', resp)
        if match and match.group(1).strip() == label.strip():
            rewards.append(1.0)
        else:
            rewards.append(0.0)
    return rewards
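The example above compares strings exactly, so an answer of 0.5 would not match a label of 1/2. One possible way to handle the various number formats mentioned in the Description is to compare numeric values when both sides parse as numbers. The sketch below uses Python's standard fractions.Fraction for this. The helper names to_number and answers_match are hypothetical, and the code assumes that commas are thousands separators and that a dollar sign is only decoration.

```python
from fractions import Fraction
from typing import Optional


def to_number(text: str) -> Optional[Fraction]:
    """Parse an integer, decimal, or a/b string into an exact Fraction, else None."""
    cleaned = text.strip().replace(",", "").replace("$", "")
    try:
        return Fraction(cleaned)
    except (ValueError, ZeroDivisionError):
        return None  # not a plain number (or a zero denominator)


def answers_match(predicted: str, label: str) -> bool:
    """Compare numerically when both sides are numbers, otherwise as stripped strings."""
    p, l = to_number(predicted), to_number(label)
    if p is not None and l is not None:
        return p == l
    return predicted.strip() == label.strip()


print(answers_match("0.5", "1/2"))     # True
print(answers_match("1,000", "1000"))  # True
print(answers_match("x+1", "x+1"))     # True
print(answers_match("3", "4"))         # False
```

Exact rational comparison avoids floating-point tolerance questions, at the cost of not recognizing symbolic or LaTeX-formatted answers, which fall back to string equality.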