Implementation:Hpcaitech ColossalAI RLVRRewardModel
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, NLP |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Concrete tool for computing verifiable rewards from multiple reward functions, provided by ColossalChat.
Description
RLVRRewardModel wraps a list of callable reward functions, applying each to generated responses and aggregating their scores. The VerifiableReward class in the distributed RL pipeline provides a similar interface with support for gt_answer-based and test_cases-based reward functions.
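The wrapping-and-aggregation pattern described above can be sketched in plain Python. Everything below (`SimpleRLVRReward`, `exact_match_reward`, `length_penalty_reward`) is illustrative and not the ColossalChat API; the real classes operate on token tensors rather than strings.

```python
from typing import Callable, List

def exact_match_reward(response: str, gt_answer: str) -> float:
    # 1.0 if the response matches the ground-truth answer exactly, else 0.0
    return 1.0 if response.strip() == gt_answer.strip() else 0.0

def length_penalty_reward(response: str, gt_answer: str) -> float:
    # Small penalty for overly long responses (illustrative heuristic)
    return -0.001 * max(0, len(response) - 256)

class SimpleRLVRReward:
    """Sketch of an RLVR-style wrapper: apply every reward function and sum."""

    def __init__(self, reward_fn_list: List[Callable[[str, str], float]]):
        self.reward_fn_list = reward_fn_list

    def __call__(self, responses: List[str], gt_answers: List[str]) -> List[float]:
        # Aggregate by summing each reward function's score per sample
        return [
            sum(fn(resp, gt) for fn in self.reward_fn_list)
            for resp, gt in zip(responses, gt_answers)
        ]

reward_model = SimpleRLVRReward([exact_match_reward, length_penalty_reward])
scores = reward_model(["42", "43"], ["42", "42"])
```

The sum-of-functions aggregation lets orthogonal signals (correctness, formatting, length) be combined without changing the training loop.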
Usage
Instantiate with a list of reward functions (e.g., math_reward, code_reward), then call the instance with tokenized responses and ground-truth answers.
Code Reference
Source Location
- Repository: ColossalAI
- File (RLVRRewardModel): applications/ColossalChat/coati/models/rlvr_reward_model.py
- Lines: 10-50
- File (VerifiableReward): applications/ColossalChat/coati/distributed/reward/verifiable_reward.py
- Lines: 11-71
Signature
class RLVRRewardModel:
    def __init__(self, reward_fn_list: List[Callable], **kwargs) -> None:
        """
        Args:
            reward_fn_list: List of reward functions
            **kwargs: Additional keyword args for reward functions
        """

    def __call__(
        self,
        input_ids: torch.LongTensor,
        attention_mask: Optional[torch.Tensor] = None,
        response_start: List = None,
        response_end: List = None,
        gt_answer: List = None,
    ) -> torch.Tensor:
        """Compute rewards for each sample using all reward functions."""

class VerifiableReward:
    def __init__(self, reward_fns: List[callable], **kwargs):
        """Distributed version with support for gt_answer and test_cases."""

    def __call__(
        self,
        input_ids: torch.LongTensor,
        gt_answer: List[str] = None,
        test_cases: List[str] = None,
        response_idx: List[torch.Tensor] = None,
    ) -> torch.Tensor:
        """Returns tensor of shape (batch_size, 3) with reward scores."""
Import
from coati.models.rlvr_reward_model import RLVRRewardModel
from coati.distributed.reward.verifiable_reward import VerifiableReward
from coati.distributed.reward.reward_fn import math_reward, code_reward
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| reward_fn_list | List[Callable] | Yes | List of reward functions (math_reward, code_reward, etc.) |
| input_ids | torch.LongTensor | Yes | Tokenized responses to evaluate |
| gt_answer | List[str] | No | Ground-truth answers for verification |
| test_cases | List[str] | No | Code test cases for execution-based verification |
Outputs
| Name | Type | Description |
|---|---|---|
| rewards | torch.Tensor | Reward scores per sample (shape: [batch_size] or [batch_size, num_fns]) |
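A gt_answer-based reward function of the kind listed in the inputs table typically extracts a final answer from the generated text and compares it to the ground truth. The sketch below is illustrative only; `sketch_math_reward` is a hypothetical name and the real `math_reward` in `coati.distributed.reward.reward_fn` may use different extraction and matching rules.

```python
import re

def sketch_math_reward(response: str, gt_answer: str) -> float:
    # Extract the last \boxed{...} expression, a common convention for
    # final answers in math RLVR setups (assumption, not the coati API).
    matches = re.findall(r"\\boxed\{([^{}]*)\}", response)
    if not matches:
        return 0.0  # no parseable answer: no reward
    return 1.0 if matches[-1].strip() == gt_answer.strip() else 0.0
```

Because the reward is computed by string verification rather than a learned model, it cannot be exploited by reward hacking in the way a preference model can.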
Usage Examples
from coati.distributed.reward.verifiable_reward import VerifiableReward
from coati.distributed.reward.reward_fn import math_reward

# Create verifiable reward with math checking
reward_model = VerifiableReward(
    reward_fns=[math_reward],
)

# Score generated responses
rewards = reward_model(
    input_ids=generated_ids,
    gt_answer=ground_truth_answers,
    response_idx=response_indices,
)
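For the test_cases-based path mentioned in the I/O contract, a code reward runs the generated program against the supplied test cases and grants reward only if all of them pass. The sketch below (`sketch_code_reward`, a hypothetical name) executes code directly with `exec`; the real `code_reward` in ColossalChat is expected to sandbox execution, which this sketch deliberately omits for brevity.

```python
from typing import List

def sketch_code_reward(code: str, test_cases: List[str]) -> float:
    """Return 1.0 if the generated code passes every test case, else 0.0."""
    namespace: dict = {}
    try:
        exec(code, namespace)            # define the candidate solution
        for case in test_cases:
            exec(case, namespace)        # each case is an assert statement
    except Exception:
        return 0.0                       # syntax error or failed assertion
    return 1.0
```

Binary pass/fail rewards like this keep the signal verifiable, at the cost of giving no partial credit for nearly-correct code.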
Related Pages
Implements Principle
Environment and Heuristic Links