Principle:OpenRLHF OpenRLHF Pairwise Ranking Loss
| Knowledge Sources | |
|---|---|
| Domains | Reward_Modeling, Loss_Functions |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A loss function based on the Bradley-Terry model that trains reward models to assign higher scores to human-preferred responses.
Description
Pairwise Ranking Loss implements the Bradley-Terry preference model for reward model training. Given a pair of responses where one is preferred (chosen) and one is not (rejected), the loss pushes the reward model to produce a higher scalar score for the chosen response. OpenRLHF provides two variants: PairWiseLoss (log-sigmoid) and LogExpLoss (log-exponential).
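As a minimal sketch of the preference model described above: under Bradley-Terry, the probability that the chosen response beats the rejected one is the logistic sigmoid of the difference between their scalar rewards. The function name `preference_probability` and the example reward values are illustrative assumptions, not part of OpenRLHF.

```python
import math


def preference_probability(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry probability that the chosen response is preferred.

    Hypothetical helper for illustration; not an OpenRLHF function.
    """
    diff = r_chosen - r_rejected
    # Numerically stable sigmoid: never exponentiate a large positive number.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


# Equal rewards mean the model has no preference between the two responses.
p_equal = preference_probability(1.0, 1.0)

# A higher reward for the chosen response raises the preference probability.
p_better = preference_probability(2.0, 0.0)
```

Training the reward model amounts to maximizing this probability over the preference pairs, which is the same as minimizing its negative logarithm.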
Usage
Used internally by RewardModelTrainer. Select "sigmoid" for standard PairWiseLoss or "logexp" for the LogExpLoss variant.
Theoretical Basis
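PairWiseLoss (sigmoid): $\mathcal{L} = -\log \sigma(r_c - r_r - m)$, where $r_c$ and $r_r$ are the scalar rewards for the chosen and rejected responses, $\sigma$ is the logistic sigmoid, and $m$ is an optional margin.
LogExpLoss: $\mathcal{L} = \log\left(1 + \exp(r_r - r_c)\right)$
Both losses decrease toward zero as $r_c - r_r$ grows, so they are minimized when $r_c \gg r_r$.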
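The two variants can be sketched in PyTorch as follows. This is an illustrative reconstruction from the description in this section, not a copy of the OpenRLHF source: the function names `pairwise_sigmoid_loss` and `log_exp_loss`, the batch-mean reduction, and the example tensors are assumptions. Here `chosen` and `rejected` hold the scalar rewards for the preferred and non-preferred responses.

```python
import torch
import torch.nn.functional as F


def pairwise_sigmoid_loss(chosen, rejected, margin=None):
    """Log-sigmoid variant: -log(sigmoid(chosen - rejected - margin))."""
    diff = chosen - rejected
    if margin is not None:
        # The optional margin demands a larger gap before the loss flattens.
        diff = diff - margin
    return -F.logsigmoid(diff).mean()


def log_exp_loss(chosen, rejected):
    """Log-exponential variant: log(1 + exp(rejected - chosen))."""
    # softplus(x) = log(1 + exp(x)), evaluated in a numerically stable way.
    return F.softplus(rejected - chosen).mean()


# Hypothetical rewards for a batch of three preference pairs.
chosen = torch.tensor([2.0, 0.5, -1.0])
rejected = torch.tensor([1.0, 1.5, -3.0])

loss_sigmoid = pairwise_sigmoid_loss(chosen, rejected)
loss_logexp = log_exp_loss(chosen, rejected)
loss_margin = pairwise_sigmoid_loss(chosen, rejected, margin=0.5)
```

Because $-\log \sigma(x) = \log(1 + e^{-x})$, the two forms agree mathematically when no margin is applied; in this sketch they differ only in how the expression is evaluated and in whether a margin is supported.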