
Principle:OpenRLHF OpenRLHF Pairwise Ranking Loss

From Leeroopedia


Knowledge Sources
Domains: Reward_Modeling, Loss_Functions
Last Updated: 2026-02-07 00:00 GMT

Overview

A loss function based on the Bradley-Terry model that trains reward models to assign higher scores to human-preferred responses.

Description

Pairwise Ranking Loss implements the Bradley-Terry preference model for reward model training. Given a pair of responses where one is preferred (chosen) and one is not (rejected), the loss pushes the reward model to produce a higher scalar score for the chosen response. OpenRLHF provides two variants: PairWiseLoss (log-sigmoid) and LogExpLoss (log-exponential).
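To make the Bradley-Terry connection concrete: the model treats the probability that the chosen response is preferred as the logistic sigmoid of the reward difference, and the ranking loss is the negative log of that probability. The sketch below is a minimal pure-Python illustration on scalar rewards. The function names are hypothetical and not part of OpenRLHF, which operates on PyTorch tensors.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, branching on the sign of x so exp() never overflows."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def preference_prob(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry probability that the chosen response is preferred."""
    return sigmoid(r_chosen - r_rejected)


def preference_nll(r_chosen: float, r_rejected: float) -> float:
    """Negative log-likelihood of the observed preference (margin-free loss).

    Fine for moderate reward gaps; for extremely negative gaps the
    probability underflows to 0 and log() fails.
    """
    return -math.log(preference_prob(r_chosen, r_rejected))


print(round(preference_prob(2.0, 0.0), 4))  # 0.8808
print(round(preference_nll(0.0, 0.0), 4))   # 0.6931, i.e. log 2 when the rewards tie
```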

Usage

Used internally by RewardModelTrainer, which selects the variant from its configured loss type: "sigmoid" for the standard PairWiseLoss or "logexp" for LogExpLoss.
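Choosing between the two variants amounts to mapping a string option to a batch loss. The following is a hypothetical plain-Python sketch of that dispatch, assuming per-pair losses are averaged over the batch. `select_loss`, `pairwise_loss`, and `logexp_loss` are illustrative names and do not reproduce RewardModelTrainer's actual code.

```python
import math
from typing import Callable, Dict, Sequence

# A batch loss maps (chosen rewards, rejected rewards) to a scalar.
BatchLoss = Callable[[Sequence[float], Sequence[float]], float]


def pairwise_loss(chosen: Sequence[float], rejected: Sequence[float]) -> float:
    """Variant named "sigmoid": mean of -log(sigmoid(r_c - r_r)), computed stably."""
    total = 0.0
    for c, r in zip(chosen, rejected):
        x = r - c  # -log(sigmoid(c - r)) == log(1 + exp(r - c))
        total += max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return total / len(chosen)


def logexp_loss(chosen: Sequence[float], rejected: Sequence[float]) -> float:
    """Variant named "logexp": mean of log(1 + exp(r_r - r_c)), evaluated as written."""
    return sum(math.log(1.0 + math.exp(r - c)) for c, r in zip(chosen, rejected)) / len(chosen)


# Hypothetical registry of accepted names; unknown names raise instead of falling back.
LOSSES: Dict[str, BatchLoss] = {"sigmoid": pairwise_loss, "logexp": logexp_loss}


def select_loss(name: str) -> BatchLoss:
    """Return the batch loss registered under `name`."""
    try:
        return LOSSES[name]
    except KeyError:
        raise ValueError(f"unknown loss {name!r}; expected one of {sorted(LOSSES)}") from None


loss_fn = select_loss("logexp")
print(loss_fn([1.0, 2.0], [0.0, 2.0]))
```

Keeping the accepted names in one dictionary makes the valid options explicit; whether an unrecognized name should raise or fall back to a default is a design choice this sketch makes on its own.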

Theoretical Basis

PairWiseLoss (sigmoid): $L = -\log \sigma(r_{\text{chosen}} - r_{\text{rejected}} - m)$, where $m$ is an optional margin.
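A minimal sketch of the log-sigmoid form with an optional margin, assuming per-pair losses are averaged over the batch and that the margin is given per pair (both are illustrative assumptions). `neg_log_sigmoid` and `pairwise_ranking_loss` are hypothetical names; the helper evaluates $-\log\sigma(x)$ as $\log(1 + e^{-x})$ in an overflow-safe form.

```python
import math
from typing import Optional, Sequence


def neg_log_sigmoid(x: float) -> float:
    """-log(sigmoid(x)) == log(1 + exp(-x)), computed without overflow."""
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_ranking_loss(
    chosen: Sequence[float],
    rejected: Sequence[float],
    margin: Optional[Sequence[float]] = None,
) -> float:
    """Batch mean of -log(sigmoid(r_chosen - r_rejected - m))."""
    if margin is None:
        margin = [0.0] * len(chosen)  # no margin: plain Bradley-Terry loss
    losses = [neg_log_sigmoid(c - r - m) for c, r, m in zip(chosen, rejected, margin)]
    return sum(losses) / len(losses)


chosen, rejected = [2.0, 0.5], [0.0, 1.0]
print(pairwise_ranking_loss(chosen, rejected))
# A margin of 1.0 raises the loss, since each pair must now clear a wider gap.
print(pairwise_ranking_loss(chosen, rejected, margin=[1.0, 1.0]))
```

Because $-\log\sigma$ is decreasing, subtracting a positive margin can only raise the loss for a given pair, which is what pushes the model to separate the two rewards by more than $m$.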

LogExpLoss: $L = \log\left(1 + \exp(r_{\text{rejected}} - r_{\text{chosen}})\right)$
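Read literally, the LogExpLoss formula exponentiates the reward gap, which overflows in double precision once $r_{\text{rejected}}$ exceeds $r_{\text{chosen}}$ by roughly 710 or more. The hypothetical sketch below contrasts the literal evaluation with an overflow-safe rewrite of the same quantity, $\log(1 + e^{x}) = \max(x, 0) + \log(1 + e^{-|x|})$. It is a plain-Python illustration, not OpenRLHF's PyTorch code.

```python
import math


def logexp_literal(r_chosen: float, r_rejected: float) -> float:
    """log(1 + exp(r_rejected - r_chosen)), exactly as written."""
    return math.log(1.0 + math.exp(r_rejected - r_chosen))


def logexp_stable(r_chosen: float, r_rejected: float) -> float:
    """Same value, rewritten so exp() only ever sees a non-positive argument."""
    x = r_rejected - r_chosen
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


print(logexp_literal(1.0, 0.0), logexp_stable(1.0, 0.0))  # both about 0.3133
print(logexp_stable(0.0, 1000.0))                          # 1000.0
try:
    logexp_literal(0.0, 1000.0)
except OverflowError:
    print("literal form overflows")  # math.exp(1000.0) exceeds the float range
```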

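Both losses are strictly decreasing in the reward gap $r_{\text{chosen}} - r_{\text{rejected}}$ and approach zero only as that gap grows without bound, so minimizing either one pushes $r_{\text{chosen}}$ above $r_{\text{rejected}}$ (above $r_{\text{rejected}} + m$ when a margin is used). Without a margin the two variants are mathematically identical, since $-\log\sigma(x) = \log(1 + e^{-x})$.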
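For reference, a short derivation with $x = r_{\text{chosen}} - r_{\text{rejected}}$ and no margin (a margin $m$ simply replaces $x$ with $x - m$) shows that the two losses coincide when $m = 0$ and that both decrease monotonically in the reward gap:

```latex
\begin{aligned}
-\log \sigma(x) &= -\log \frac{1}{1 + e^{-x}} = \log\left(1 + e^{-x}\right)
  && \text{(PairWiseLoss with } m = 0 \text{ equals LogExpLoss)} \\
\frac{d}{dx} \log\left(1 + e^{-x}\right) &= \frac{-e^{-x}}{1 + e^{-x}} = -\sigma(-x) < 0
  && \text{(strictly decreasing in the reward gap)} \\
\lim_{x \to +\infty} \log\left(1 + e^{-x}\right) &= 0, \qquad
  \log\left(1 + e^{-x}\right)\Big|_{x = 0} = \log 2
\end{aligned}
```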

Related Pages

Implemented By
