Implementation:Huggingface Open r1 Get Reward Funcs

Metadata

Field	Value
Source	Repo (https://github.com/huggingface/open-r1)
Domains	Reinforcement_Learning, NLP
Last Updated	2026-02-08 00:00 GMT

Overview

Concrete tool for resolving and configuring reward functions from a named registry provided by Open-R1.

Description

The get_reward_funcs function uses the REWARD_FUNCS_REGISTRY dictionary to resolve reward function names (strings) to callables. The registry contains 13 reward functions:

Name	Type	Description
`accuracy`	Direct function	Mathematical correctness verification via symbolic parsing
`format`	Direct function	Checks proper use of think/answer XML tags
`reasoning_steps`	Direct function	Scores presence of step-by-step reasoning structure
`cosine`	Factory-generated	Cosine-scaled length reward; shorter correct answers score higher
`repetition_penalty`	Factory-generated	N-gram diversity penalty; penalizes repetitive outputs
`length`	Direct function	Raw length-based reward
`code`	Partial application	Execution-based code correctness scoring
`binary_code`	Partial application	Binary pass/fail code execution scoring
`ioi_code`	Partial application	IOI-style competitive programming code scoring
`cf_code`	Partial application	Codeforces-style competitive programming code scoring
`code_format`	Factory-generated	Checks code output formatting conventions
`tag_count`	Direct function	Counts and scores proper tag usage
`soft_overlong_punishment`	Factory-generated	Soft penalty for outputs exceeding length threshold (DAPO)

Factory-generated functions (cosine, repetition_penalty, code_format, soft_overlong_punishment) are configured once, at initialization time, from parameters in GRPOScriptArguments. Every reward function accepts completions: list[list[dict]] plus optional keyword arguments drawn from dataset columns, and returns list[float|None] with one entry per completion. A None entry signals that the corresponding sample should be skipped.
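The three registration styles in the table (direct function, factory-generated, partial application) can be illustrated with a toy registry. This is a minimal sketch, not the Open-R1 code: `toy_format_reward`, `make_length_penalty`, `toy_keyword_reward`, `ToyArgs`, and `toy_get_reward_funcs` are hypothetical names, and the scoring rules are invented purely for illustration. It shows that each style ends up as a callable taking completions plus keyword arguments and returning one score (or None) per completion.

```python
from dataclasses import dataclass
from functools import partial
from typing import Callable, Optional

Completion = list[dict]  # one completion = a list of chat messages


def toy_format_reward(completions: list[Completion], **kwargs) -> list[Optional[float]]:
    """Direct function: 1.0 if the text contains both think tags, else 0.0."""
    texts = [c[0]["content"] for c in completions]
    return [1.0 if "<think>" in t and "</think>" in t else 0.0 for t in texts]


def make_length_penalty(max_chars: int) -> Callable[..., list[Optional[float]]]:
    """Factory: captures max_chars once and returns a configured reward function."""

    def length_penalty(completions, **kwargs):
        return [0.0 if len(c[0]["content"]) <= max_chars else -1.0 for c in completions]

    return length_penalty


def toy_keyword_reward(completions, keyword: str, **kwargs):
    """Needs an extra argument, so it is registered through functools.partial."""
    return [1.0 if keyword in c[0]["content"] else None for c in completions]


@dataclass
class ToyArgs:  # hypothetical stand-in for the script arguments
    reward_funcs: list[str]
    max_chars: int = 20
    keyword: str = "42"


def toy_get_reward_funcs(args: ToyArgs) -> list[Callable]:
    # All entries are built eagerly; only the requested names are returned.
    registry = {
        "format": toy_format_reward,
        "length_penalty": make_length_penalty(args.max_chars),
        "keyword": partial(toy_keyword_reward, keyword=args.keyword),
    }
    return [registry[name] for name in args.reward_funcs]


completions = [
    [{"role": "assistant", "content": "<think>a</think>42"}],
    [{"role": "assistant", "content": "no tags here, and a fairly long answer"}],
]
funcs = toy_get_reward_funcs(ToyArgs(reward_funcs=["format", "length_penalty", "keyword"]))
scores = [f(completions=completions) for f in funcs]
```

Here `scores` holds one list per reward function, and the last function returns None for the second completion to mark it as skipped.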

Usage

Import this function when setting up GRPO training to resolve the reward function names in the config into callable objects. The function reads the reward_funcs list from script_args and returns a corresponding list of configured callables ready for the GRPO trainer.

Code Reference

Source

Field	Value
Repository	open-r1
File	src/open_r1/rewards.py
Lines	L646-706

Signature

def get_reward_funcs(script_args) -> list[Callable]:
    REWARD_FUNCS_REGISTRY = {
        "accuracy": accuracy_reward,
        "format": format_reward,
        "reasoning_steps": reasoning_steps_reward,
        "cosine": get_cosine_scaled_reward(...),
        "repetition_penalty": get_repetition_penalty_reward(...),
        "length": len_reward,
        "code": partial(code_reward, ...),
        "binary_code": partial(binary_code_reward, ...),
        "ioi_code": partial(ioi_code_reward, ...),
        "cf_code": partial(cf_code_reward, ...),
        "code_format": get_code_format_reward(...),
        "tag_count": tag_count_reward,
        "soft_overlong_punishment": get_soft_overlong_punishment(...),
    }
    reward_funcs = [REWARD_FUNCS_REGISTRY[func] for func in script_args.reward_funcs]
    return reward_funcs
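
Because the final lookup indexes a plain dictionary, a name in `reward_funcs` that is not a registry key raises `KeyError`. The following is a minimal sketch of a pre-check that reports every unknown name at once. `check_reward_names` and `KNOWN_REWARD_NAMES` are hypothetical and not part of Open-R1; the name set is copied from the registry keys shown above.

```python
# Assumption: this set mirrors the registry keys listed on this page.
KNOWN_REWARD_NAMES = {
    "accuracy", "format", "reasoning_steps", "cosine", "repetition_penalty",
    "length", "code", "binary_code", "ioi_code", "cf_code", "code_format",
    "tag_count", "soft_overlong_punishment",
}


def check_reward_names(names: list[str]) -> None:
    """Raise ValueError listing every name that is not a known registry key."""
    unknown = [n for n in names if n not in KNOWN_REWARD_NAMES]
    if unknown:
        raise ValueError(
            f"Unknown reward function(s): {unknown}. "
            f"Expected one of: {sorted(KNOWN_REWARD_NAMES)}"
        )


check_reward_names(["accuracy", "format"])  # valid names pass silently
```

Running such a check before calling get_reward_funcs turns a bare `KeyError` into a message that names the misspelled entries.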

Import

from open_r1.rewards import get_reward_funcs

I/O Contract

Inputs

Parameter	Type	Required	Description
`script_args`	GRPOScriptArguments	Yes	Training script arguments containing `reward_funcs` (list of strings naming reward functions) plus configuration parameters for factory functions: cosine scaling bounds, repetition penalty n-gram size and max penalty, code execution settings, overlong punishment thresholds

Outputs

Type	Description
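`list[Callable]`	Configured reward function callables, in the order given by `script_args.reward_funcs`. Each accepts completions plus keyword arguments and returns list[float | None].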
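
Since a reward function may return None for a sample, a caller that combines several reward functions has to decide how to treat those entries. The sketch below treats None as "no contribution" when summing per-sample scores. `combine_rewards` and its optional weights are hypothetical; in practice the aggregation is done by the GRPO trainer, whose behavior may differ.

```python
from typing import Optional, Sequence


def combine_rewards(
    per_func_scores: Sequence[Sequence[Optional[float]]],
    weights: Optional[Sequence[float]] = None,
) -> list[float]:
    """Weighted per-sample sum across reward functions, ignoring None entries."""
    if weights is None:
        weights = [1.0] * len(per_func_scores)
    n_samples = len(per_func_scores[0])
    totals = [0.0] * n_samples
    for weight, scores in zip(weights, per_func_scores):
        if len(scores) != n_samples:
            raise ValueError("every reward function must score every sample")
        for i, score in enumerate(scores):
            if score is not None:  # None means this function skips this sample
                totals[i] += weight * score
    return totals


# Two reward functions scoring three samples; the second skips sample 1.
combined = combine_rewards([[1.0, 0.0, 1.0], [0.5, None, 0.25]], weights=[1.0, 2.0])
```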

Usage Examples

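from dataclasses import dataclass, field

from open_r1.rewards import get_reward_funcs

# Simplified stand-in for illustration; the real GRPOScriptArguments is defined
# in the Open-R1 repository and its fields may differ from those shown here.
@dataclass
class GRPOScriptArguments: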
    reward_funcs: list[str] = field(default_factory=lambda: ["accuracy", "format"])
    cosine_min_len_value_wrong: float = 0.0
    cosine_max_len_value_wrong: float = -0.5
    cosine_min_len_value_correct: float = 1.0
    cosine_max_len_value_correct: float = 0.5
    cosine_min_len: int = 50
    cosine_max_len: int = 4000
    repetition_n_grams: int = 3
    repetition_max_penalty: float = -1.0
    code_language: str = "python"
    soft_overlong_max_length: int = 4096
    soft_overlong_penalty_scale: float = 1.0

# Example 1: Basic accuracy + format reward setup
script_args = GRPOScriptArguments(reward_funcs=["accuracy", "format"])
reward_funcs = get_reward_funcs(script_args)
# reward_funcs is now [accuracy_reward, format_reward]

# Example 2: Full multi-reward configuration
script_args = GRPOScriptArguments(
    reward_funcs=["accuracy", "format", "cosine", "repetition_penalty", "soft_overlong_punishment"]
)
reward_funcs = get_reward_funcs(script_args)
# reward_funcs contains 5 configured callables

# Example 3: Using resolved reward functions
completions = [[{"role": "assistant", "content": "<think>Step 1...</think><answer>42</answer>"}]]
for reward_fn in reward_funcs:
    scores = reward_fn(completions=completions, solution=["42"])
    print(scores)  # e.g., [1.0], [1.0], [0.85], [0.0], [0.0]
