Implementation:OpenRLHF OpenRLHF Math reward func

Knowledge Sources	OpenRLHF
Domains	Reinforcement_Learning, Reward_Modeling
Last Updated	2026-02-07 00:00 GMT

Overview

A concrete tool, provided by OpenRLHF, for computing rule-based math rewards from generated solutions.

Description

The math_reward_func function (or its equivalent in the math reward example script) extracts the final answer from each model-generated solution, compares it to the ground-truth label, and returns a binary reward: 1.0 for a match, 0.0 otherwise. It supports answer extraction from LaTeX \boxed{...} expressions and various number formats.
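Handling "various number formats" usually comes down to canonicalizing both the extracted answer and the label before comparing them. The sketch below is one possible approach and is not taken from OpenRLHF: the name normalize_answer and the specific rules (dropping dollar signs and thousands separators, converting simple \frac expressions, reducing numbers to a canonical fraction string) are assumptions made for illustration.

```python
import re
from fractions import Fraction


def normalize_answer(text: str) -> str:
    """Canonicalize a short answer string (hypothetical helper, not an OpenRLHF API)."""
    s = text.strip().strip("$").strip()  # drop surrounding whitespace and math-mode dollars
    s = s.replace(",", "")               # "1,000" -> "1000"
    s = s.rstrip(".")                    # drop a trailing sentence period
    # Rewrite simple \frac{a}{b}, \dfrac{a}{b}, \tfrac{a}{b} with integer parts as "a/b"
    m = re.fullmatch(r"\\[dt]?frac\{(-?\d+)\}\{(-?\d+)\}", s)
    if m:
        s = f"{m.group(1)}/{m.group(2)}"
    try:
        # Fraction parses integers, decimals, and "a/b", and reduces to lowest terms
        return str(Fraction(s))
    except (ValueError, ZeroDivisionError):
        # Not a plain number: fall back to the cleaned string
        return s
```

With this helper, "0.50", "1/2", and "\frac{1}{2}" all map to the same string, so a plain equality check treats them as the same answer. Symbolic answers such as "x+1" pass through unchanged and still require an exact textual match.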

This is a Pattern Doc - users implement their own reward functions following this interface.

Usage

Used as the reward function in Math-GRPO training. Users define their own function matching this interface.

Code Reference

Source Location

Repository: OpenRLHF
File: examples/scripts/train_ppo_llama_ray_math.sh (reference)

Interface Specification

def math_reward_func(
    queries: list[str],       # Input prompts
    responses: list[str],     # Generated responses
    labels: list[str],        # Ground truth answers
) -> list[float]:
    """
    Compute rewards for math problem solutions.

    Args:
        queries: List of math problem prompts
        responses: List of model-generated solutions
        labels: List of correct answers

    Returns:
        List of float rewards (typically 0.0 or 1.0)
    """
    rewards = []
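    # extract_answer and normalize are user-supplied helpers; they are not defined here.
    for response, label in zip(responses, labels):
        answer = extract_answer(response)  # Extract from \boxed{...} etc.
        # Normalize both sides so formatting differences alone do not cause a mismatch
        if answer is not None and normalize(answer) == normalize(label):
            rewards.append(1.0)
        else:
            rewards.append(0.0)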
    return rewards
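The interface above calls extract_answer without defining it. The following is a minimal sketch of how such a helper might look, assuming the final answer is either in the last \boxed{...} of the response or, failing that, is the last number in the text. Both the fallback rule and the extract_boxed helper are assumptions for illustration, not OpenRLHF code.

```python
import re
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, honoring nested braces."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    begin = start + len(marker)
    depth = 1  # we are already inside the opening brace of \boxed{
    for i in range(begin, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[begin:i].strip()
    return None  # braces never closed, e.g. a truncated generation


def extract_answer(response: str) -> Optional[str]:
    """Prefer the last boxed answer; otherwise fall back to the last number (assumed rule)."""
    boxed = extract_boxed(response)
    if boxed is not None:
        return boxed
    numbers = re.findall(r"-?\d+(?:\.\d+)?", response)
    return numbers[-1] if numbers else None
```

Counting brace depth rather than using a regular expression lets the helper return the full contents of answers such as \boxed{\frac{1}{2}}, which a non-greedy pattern would cut off at the first closing brace.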

I/O Contract

Inputs

Name	Type	Required	Description
queries	List[str]	Yes	Math problem prompts
responses	List[str]	Yes	Generated solutions
labels	List[str]	Yes	Ground truth answers

Outputs

Name	Type	Description
rewards	List[float]	Binary rewards (0.0 or 1.0)
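Because users supply their own function, it can help to check the contract on a tiny batch before starting a training run. The sketch below is a hypothetical sanity check, not part of OpenRLHF: check_reward_contract and the stand-in exact_match_reward are illustrative names, and the binary-value assertion follows the Outputs table (drop it if your function returns graded rewards).

```python
from typing import Callable, List

RewardFn = Callable[[List[str], List[str], List[str]], List[float]]


def check_reward_contract(fn: RewardFn) -> None:
    """Raise AssertionError if fn violates the I/O contract on a small sample batch."""
    queries = ["What is 1+1?", "What is 2+3?"]
    responses = ["It is \\boxed{2}", "I think \\boxed{6}"]
    labels = ["2", "5"]
    rewards = fn(queries, responses, labels)
    assert isinstance(rewards, list), "rewards must be a list"
    assert len(rewards) == len(responses), "expected one reward per response"
    assert all(isinstance(r, float) for r in rewards), "rewards must be floats"
    assert all(r in (0.0, 1.0) for r in rewards), "rewards must be 0.0 or 1.0"


def exact_match_reward(queries, responses, labels):
    # Trivial stand-in: reward 1.0 when the label appears verbatim in the response
    return [1.0 if label in resp else 0.0 for resp, label in zip(responses, labels)]


check_reward_contract(exact_match_reward)  # passes silently when the contract holds
```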

Usage Examples

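# User-defined math reward function
import re

# Matches \boxed{...} without nested braces; answers such as
# \boxed{\frac{1}{2}} need a brace-aware parser instead.
BOXED_RE = re.compile(r'\\boxed\{([^{}]*)\}')

def my_math_reward(queries, responses, labels):
    rewards = []
    for resp, label in zip(responses, labels):
        # Use the last \boxed{...}, which normally holds the final answer
        matches = BOXED_RE.findall(resp)
        if matches and matches[-1].strip() == label.strip():
            rewards.append(1.0)
        else:
            rewards.append(0.0)
    return rewards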

Related Pages

Implements Principle

Principle:OpenRLHF_OpenRLHF_Rule_Based_Reward_Functions
