Implementation: OpenRLHF RewardModelTrainer
Knowledge Sources
| Field | Value |
|---|---|
| Domains | NLP, Reward_Modeling, Training |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A concrete tool provided by OpenRLHF for training reward models from preference data.
Description
The RewardModelTrainer class implements the reward-model training loop, supporting two pairwise losses: PairWiseLoss ("sigmoid") and LogExpLoss ("logexp"). It concatenates the chosen and rejected sequences of each preference pair so both can be scored in a single forward pass, tracks reward accuracy as the percentage of pairs where chosen_reward > rejected_reward, and records reward distribution statistics (mean, std) on the model config for inference-time normalization.
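For orientation, a minimal sketch of what the two loss variants compute. This mirrors the description above but is not a line-for-line copy of OpenRLHF's implementation; note the two forms are mathematically equivalent, differing only in how they are written.

```python
# Hedged sketch of the two pairwise loss variants described above.
import torch
import torch.nn.functional as F

def pairwise_loss(chosen_reward: torch.Tensor, rejected_reward: torch.Tensor) -> torch.Tensor:
    # "sigmoid" variant: -log(sigmoid(r_chosen - r_rejected)),
    # the Bradley-Terry negative log-likelihood of preferring the chosen response.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()

def logexp_loss(chosen_reward: torch.Tensor, rejected_reward: torch.Tensor) -> torch.Tensor:
    # "logexp" variant: log(1 + exp(r_rejected - r_chosen)),
    # equal to the sigmoid form since -logsigmoid(x) == log(1 + exp(-x)).
    return torch.log(1 + torch.exp(rejected_reward - chosen_reward)).mean()
```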
Usage
Instantiate it with a reward model (from get_llm_for_sequence_regression, as sketched below), an optimizer, and preference dataloaders, then call fit() to run training.
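A hedged sketch of the construction step referenced above; the base model path is a placeholder and the available keyword arguments vary across OpenRLHF versions, so treat the repository's train_rm.py script as the authoritative reference:

```python
# Hedged sketch of building the reward model passed to RewardModelTrainer.
from openrlhf.models import get_llm_for_sequence_regression

reward_model = get_llm_for_sequence_regression(
    "meta-llama/Meta-Llama-3-8B",  # placeholder base model path
    "reward",                      # attach a scalar value head for reward modeling
)
```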
Code Reference
Source Location
- Repository: OpenRLHF
- File: openrlhf/trainer/rm_trainer.py
- Lines: L12-350 (class), L29-103 (__init__), L105-200 (fit)
Signature
```python
class RewardModelTrainer(ABC):
    def __init__(
        self,
        model,                  # nn.Module: reward model with value head
        strategy,               # DeepspeedStrategy
        optim: Optimizer,       # optimizer
        train_dataloader,       # training DataLoader (RewardDataset)
        eval_dataloader,        # evaluation DataLoader
        scheduler,              # learning rate scheduler
        tokenizer,              # tokenizer for padding
        max_norm=0.5,           # gradient clipping norm
        max_epochs: int = 2,    # training epochs
        loss="sigmoid",         # "sigmoid" (PairWiseLoss) or "logexp" (LogExpLoss)
        disable_ds_ckpt=False,  # disable DeepSpeed checkpoints
        save_hf_ckpt=False,     # save HF-format checkpoints
    ) -> None: ...

    def fit(self, args, consumed_samples=0, num_update_steps_per_epoch=None):
        """Run the full training loop."""
```
Import
```python
from openrlhf.trainer import RewardModelTrainer
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | nn.Module | Yes | Reward model from get_llm_for_sequence_regression |
| train_dataloader | DataLoader | Yes | Preference pairs (from RewardDataset with is_dpo=False) |
| loss | str | No | Loss type: "sigmoid" or "logexp" (default "sigmoid") |
Outputs
| Name | Type | Description |
|---|---|---|
| (side effect) | None | Model weights updated in-place |
| model.config.mean | float | Mean reward for normalization |
| model.config.std | float | Std reward for normalization |
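The mean/std recorded on the config exist so that reward scores can be standardized later (e.g., during PPO). A minimal sketch of that normalization, assuming the statistics were stored as described above; in OpenRLHF the model can apply this internally when built with reward normalization enabled:

```python
# Hedged sketch: applying the recorded statistics at inference time.
def normalize_reward(raw_reward: float, mean: float, std: float, eps: float = 1e-8) -> float:
    # Standardize the raw scalar reward with the training-set statistics
    # stored on model.config by RewardModelTrainer.
    return (raw_reward - mean) / (std + eps)

# normalized = normalize_reward(r, model.config.mean, model.config.std)
```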
Usage Examples
```python
from openrlhf.trainer import RewardModelTrainer

trainer = RewardModelTrainer(
    model=reward_model,
    strategy=strategy,
    optim=optimizer,
    train_dataloader=train_dataloader,
    eval_dataloader=eval_dataloader,
    scheduler=scheduler,
    tokenizer=tokenizer,
    max_norm=args.max_norm,
    max_epochs=args.max_epochs,
    loss=args.loss,
)
trainer.fit(args, num_update_steps_per_epoch=num_update_steps_per_epoch)
```
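A note on fit(): it runs up to max_epochs over train_dataloader, clipping gradients at max_norm, and periodically evaluates on eval_dataloader; the reward mean/std listed in the I/O contract are recorded on model.config during this loop so downstream inference can normalize scores.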