Implementation:Huggingface Open r1 Get Reward Funcs
Metadata
| Field | Value |
|---|---|
| Source | Repo (https://github.com/huggingface/open-r1) |
| Domains | Reinforcement_Learning, NLP |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete tool for resolving and configuring reward functions from a named registry provided by Open-R1.
Description
The get_reward_funcs function builds the REWARD_FUNCS_REGISTRY dictionary and uses it to resolve reward function names (strings) to callables. The registry contains 13 reward functions:
| Name | Type | Description |
|---|---|---|
| accuracy | Direct function | Mathematical correctness verification via symbolic parsing |
| format | Direct function | Checks proper use of think/answer XML tags |
| reasoning_steps | Direct function | Scores presence of step-by-step reasoning structure |
| cosine | Factory-generated | Cosine-scaled length reward; shorter correct answers score higher |
| repetition_penalty | Factory-generated | N-gram diversity penalty; penalizes repetitive outputs |
| length | Direct function | Raw length-based reward |
| code | Partial application | Execution-based code correctness scoring |
| binary_code | Partial application | Binary pass/fail code execution scoring |
| ioi_code | Partial application | IOI-style competitive programming code scoring |
| cf_code | Partial application | Codeforces-style competitive programming code scoring |
| code_format | Factory-generated | Checks code output formatting conventions |
| tag_count | Direct function | Counts and scores proper tag usage |
| soft_overlong_punishment | Factory-generated | Soft penalty for outputs exceeding length threshold (DAPO) |
Factory-generated functions (cosine, repetition_penalty, code_format, soft_overlong_punishment) use parameters from GRPOScriptArguments to configure their behavior at initialization time. Reward functions accept completions: list[list[dict]] and optional kwargs from dataset columns, returning list[float|None]. A return value of None signals that the sample should be skipped.
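The calling convention described above can be sketched as follows. This is a minimal illustration, not code from Open-R1: contains_answer_reward and get_max_length_reward are hypothetical names, and the scoring rules are deliberately trivial. The first function shows the direct style, including a None return for a sample that cannot be scored; the second shows the factory style, where a parameter is fixed once when the callable is built.

```python
from typing import Callable, Optional


def contains_answer_reward(
    completions: list[list[dict]], solution: list[str], **kwargs
) -> list[Optional[float]]:
    """Direct-style reward (hypothetical): 1.0 if the reference appears in the completion."""
    rewards: list[Optional[float]] = []
    for completion, sol in zip(completions, solution):
        if not sol:
            # No reference to compare against: None marks the sample as skipped.
            rewards.append(None)
            continue
        content = completion[0]["content"]
        rewards.append(1.0 if sol in content else 0.0)
    return rewards


def get_max_length_reward(max_len: int) -> Callable[..., list[Optional[float]]]:
    """Factory-style reward (hypothetical): max_len is bound when the callable is created."""

    def max_length_reward(completions: list[list[dict]], **kwargs) -> list[Optional[float]]:
        return [1.0 if len(c[0]["content"]) <= max_len else 0.0 for c in completions]

    return max_length_reward


completions = [
    [{"role": "assistant", "content": "<answer>42</answer>"}],
    [{"role": "assistant", "content": "<answer>7</answer>"}],
]
# `solution` stands in for a dataset column forwarded as a keyword argument.
direct_scores = contains_answer_reward(completions, solution=["42", ""])
length_scores = get_max_length_reward(max_len=18)(completions)
```

Both callables take the same completions argument and ignore keyword arguments they do not need, which is what lets a trainer call a mixed list of reward functions uniformly.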
Usage
Import when setting up GRPO training to resolve reward function names from config to callable objects. The function reads the reward_funcs list from script_args and returns a corresponding list of configured callables ready for the GRPO trainer.
Code Reference
Source
| Field | Value |
|---|---|
| Repository | open-r1 |
| File | src/open_r1/rewards.py |
| Lines | L646-706 |
Signature
def get_reward_funcs(script_args) -> list[Callable]:
    REWARD_FUNCS_REGISTRY = {
        "accuracy": accuracy_reward,
        "format": format_reward,
        "reasoning_steps": reasoning_steps_reward,
        "cosine": get_cosine_scaled_reward(...),
        "repetition_penalty": get_repetition_penalty_reward(...),
        "length": len_reward,
        "code": partial(code_reward, ...),
        "binary_code": partial(binary_code_reward, ...),
        "ioi_code": partial(ioi_code_reward, ...),
        "cf_code": partial(cf_code_reward, ...),
        "code_format": get_code_format_reward(...),
        "tag_count": tag_count_reward,
        "soft_overlong_punishment": get_soft_overlong_punishment(...),
    }
    reward_funcs = [REWARD_FUNCS_REGISTRY[func] for func in script_args.reward_funcs]
    return reward_funcs
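The registry pattern in the signature can be reproduced in a self-contained toy, shown here as a sketch under stated assumptions. All names (exact_match_reward, get_length_cap_reward, keyword_reward, build_registry, resolve) are hypothetical and exist only for this example. It registers one entry of each kind (direct function, factory result, partial application) and resolves names in the requested order. Unlike the plain dictionary lookup in the signature, which would raise KeyError for an unknown name, this sketch raises a ValueError that lists the available names.

```python
from functools import partial
from typing import Callable


def exact_match_reward(completions, solution, **kwargs):
    """Direct function: registered as-is."""
    return [1.0 if c[0]["content"] == s else 0.0 for c, s in zip(completions, solution)]


def get_length_cap_reward(max_len: int) -> Callable:
    """Factory: returns a closure configured with max_len."""

    def length_cap_reward(completions, **kwargs):
        return [1.0 if len(c[0]["content"]) <= max_len else 0.0 for c in completions]

    return length_cap_reward


def keyword_reward(completions, keyword: str, **kwargs):
    """Takes an extra parameter, which the registry binds via functools.partial."""
    return [1.0 if keyword in c[0]["content"] else 0.0 for c in completions]


def build_registry(max_len: int, keyword: str) -> dict[str, Callable]:
    # Every entry is constructed eagerly, whether or not it is later selected.
    return {
        "exact_match": exact_match_reward,
        "length_cap": get_length_cap_reward(max_len=max_len),
        "keyword": partial(keyword_reward, keyword=keyword),
    }


def resolve(names: list[str], registry: dict[str, Callable]) -> list[Callable]:
    unknown = [n for n in names if n not in registry]
    if unknown:
        raise ValueError(f"Unknown reward functions {unknown}; available: {sorted(registry)}")
    # Preserve the order given in `names`.
    return [registry[n] for n in names]


registry = build_registry(max_len=10, keyword="because")
funcs = resolve(["keyword", "exact_match"], registry)
completions = [[{"role": "assistant", "content": "42 because 6 * 7"}]]
scores = [f(completions=completions, solution=["42"]) for f in funcs]
```

Because the whole registry is built before any lookup, every configuration parameter is read even when only a subset of rewards is requested.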
Import
from open_r1.rewards import get_reward_funcs
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| script_args | GRPOScriptArguments | Yes | Training script arguments containing reward_funcs (list of strings naming reward functions) plus configuration parameters for factory functions: cosine scaling bounds, repetition penalty n-gram size and max penalty, code execution settings, overlong punishment thresholds |
Outputs
| Type | Description |
|---|---|
| list[Callable] | Configured reward callables, in the order given by reward_funcs. Each returns list[float or None]. |
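One way a consumer might combine the returned score lists, treating None as "skip this reward for this sample", is sketched below. The helper combine_rewards and the weights are hypothetical, and this is not a description of how the GRPO trainer aggregates rewards internally.

```python
from typing import Optional


def combine_rewards(
    per_func_scores: list[list[Optional[float]]], weights: list[float]
) -> list[float]:
    """Weighted sum per sample, ignoring None entries (hypothetical helper)."""
    n_samples = len(per_func_scores[0])
    totals = []
    for i in range(n_samples):
        total = 0.0
        for scores, weight in zip(per_func_scores, weights):
            if scores[i] is not None:
                total += weight * scores[i]
        totals.append(total)
    return totals


# Outer list: one entry per reward function; inner list: one score per sample.
per_func_scores = [[1.0, None], [0.5, 1.0]]
totals = combine_rewards(per_func_scores, weights=[1.0, 0.5])
```

Here the second sample receives no contribution from the first reward function, so only the second function's weighted score counts toward its total.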
Usage Examples
from dataclasses import dataclass, field

from open_r1.rewards import get_reward_funcs


# Illustrative stand-in for GRPOScriptArguments. The registry is built eagerly,
# so the real class must supply every parameter the factories read; check the
# field names against the installed open_r1 version.
@dataclass
class GRPOScriptArguments:
    reward_funcs: list[str] = field(default_factory=lambda: ["accuracy", "format"])
    cosine_min_len_value_wrong: float = 0.0
    cosine_max_len_value_wrong: float = -0.5
    cosine_min_len_value_correct: float = 1.0
    cosine_max_len_value_correct: float = 0.5
    cosine_min_len: int = 50
    cosine_max_len: int = 4000
    repetition_n_grams: int = 3
    repetition_max_penalty: float = -1.0
    code_language: str = "python"
    soft_overlong_max_length: int = 4096
    soft_overlong_penalty_scale: float = 1.0


# Example 1: Basic accuracy + format reward setup
script_args = GRPOScriptArguments(reward_funcs=["accuracy", "format"])
reward_funcs = get_reward_funcs(script_args)
# reward_funcs is now [accuracy_reward, format_reward]

# Example 2: Full multi-reward configuration
script_args = GRPOScriptArguments(
    reward_funcs=["accuracy", "format", "cosine", "repetition_penalty", "soft_overlong_punishment"]
)
reward_funcs = get_reward_funcs(script_args)
# reward_funcs contains 5 configured callables

# Example 3: Using resolved reward functions
completions = [[{"role": "assistant", "content": "<think>Step 1...</think><answer>42</answer>"}]]
for reward_fn in reward_funcs:
    # Each call returns one score per completion.
    scores = reward_fn(completions=completions, solution=["42"])
    print(scores)  # one list per reward function, e.g., [1.0], [1.0], [0.85], [0.0], [0.0]