Implementation: OpenRLHF RewardModelTrainer
Knowledge Sources
| Field | Value |
|---|---|
| Domains | NLP, Reward_Modeling, Training |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A concrete tool provided by OpenRLHF for training reward models from preference data.
Description
The RewardModelTrainer class implements the reward-model training loop, supporting two pairwise losses: PairWiseLoss ("sigmoid") and LogExpLoss ("logexp"). It concatenates the chosen and rejected sequences of each preference pair so both can be scored in a single forward pass, tracks reward accuracy as the percentage of pairs where chosen_reward > rejected_reward, and records reward distribution statistics (mean, std) on the model config for inference-time normalization.
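For orientation, a minimal sketch of what the two loss variants compute. This mirrors the description above but is not a line-for-line copy of OpenRLHF's implementation; note the two forms are mathematically equivalent, differing only in how they are written.

```python
# Hedged sketch of the two pairwise loss variants described above.
import torch
import torch.nn.functional as F

def pairwise_loss(chosen_reward: torch.Tensor, rejected_reward: torch.Tensor) -> torch.Tensor:
    # "sigmoid" variant: -log(sigmoid(r_chosen - r_rejected)),
    # the Bradley-Terry negative log-likelihood of preferring the chosen response.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()

def logexp_loss(chosen_reward: torch.Tensor, rejected_reward: torch.Tensor) -> torch.Tensor:
    # "logexp" variant: log(1 + exp(r_rejected - r_chosen)),
    # equal to the sigmoid form since -logsigmoid(x) == log(1 + exp(-x)).
    return torch.log(1 + torch.exp(rejected_reward - chosen_reward)).mean()
```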
Usage
Instantiate it with a reward model (from get_llm_for_sequence_regression, as sketched below), an optimizer, and preference dataloaders, then call fit() to run training.
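A hedged sketch of the construction step referenced above; the base model path is a placeholder and the available keyword arguments vary across OpenRLHF versions, so treat the repository's train_rm.py script as the authoritative reference:

```python
# Hedged sketch of building the reward model passed to RewardModelTrainer.
from openrlhf.models import get_llm_for_sequence_regression

reward_model = get_llm_for_sequence_regression(
    "meta-llama/Meta-Llama-3-8B",  # placeholder base model path
    "reward",                      # attach a scalar value head for reward modeling
)
```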
Code Reference
Source Location
- Repository: OpenRLHF
- File: openrlhf/trainer/rm_trainer.py
- Lines: L12-350 (class), L29-103 (__init__), L105-200 (fit)
Signature
```python
class RewardModelTrainer(ABC):
    def __init__(
        self,
        model,                  # nn.Module: reward model with value head
        strategy,               # DeepspeedStrategy
        optim: Optimizer,       # optimizer
        train_dataloader,       # training DataLoader (RewardDataset)
        eval_dataloader,        # evaluation DataLoader
        scheduler,              # learning rate scheduler
        tokenizer,              # tokenizer for padding
        max_norm=0.5,           # gradient clipping norm
        max_epochs: int = 2,    # training epochs
        loss="sigmoid",         # "sigmoid" (PairWiseLoss) or "logexp" (LogExpLoss)
        disable_ds_ckpt=False,  # disable DeepSpeed checkpoints
        save_hf_ckpt=False,     # save HF-format checkpoints
    ) -> None: ...

    def fit(self, args, consumed_samples=0, num_update_steps_per_epoch=None):
        """Run the full training loop."""
```
Import
```python
from openrlhf.trainer import RewardModelTrainer
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | nn.Module | Yes | Reward model from get_llm_for_sequence_regression |
| train_dataloader | DataLoader | Yes | Preference pairs (from RewardDataset with is_dpo=False) |
| loss | str | No | Loss type: "sigmoid" or "logexp" (default "sigmoid") |
Outputs
| Name | Type | Description |
|---|---|---|
| (side effect) | None | Model weights updated in-place |
| model.config.mean | float | Mean reward for normalization |
| model.config.std | float | Std reward for normalization |
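The mean/std recorded on the config exist so that reward scores can be standardized later (e.g., during PPO). A minimal sketch of that normalization, assuming the statistics were stored as described above; in OpenRLHF the model can apply this internally when built with reward normalization enabled:

```python
# Hedged sketch: applying the recorded statistics at inference time.
def normalize_reward(raw_reward: float, mean: float, std: float, eps: float = 1e-8) -> float:
    # Standardize the raw scalar reward with the training-set statistics
    # stored on model.config by RewardModelTrainer.
    return (raw_reward - mean) / (std + eps)

# normalized = normalize_reward(r, model.config.mean, model.config.std)
```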
Usage Examples
```python
from openrlhf.trainer import RewardModelTrainer

trainer = RewardModelTrainer(
    model=reward_model,
    strategy=strategy,
    optim=optimizer,
    train_dataloader=train_dataloader,
    eval_dataloader=eval_dataloader,
    scheduler=scheduler,
    tokenizer=tokenizer,
    max_norm=args.max_norm,
    max_epochs=args.max_epochs,
    loss=args.loss,
)
trainer.fit(args, num_update_steps_per_epoch=num_update_steps_per_epoch)
```
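A note on fit(): it runs up to max_epochs over train_dataloader, clipping gradients at max_norm, and periodically evaluates on eval_dataloader; the reward mean/std listed in the I/O contract are recorded on model.config during this loop so downstream inference can normalize scores.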