Implementation:OpenRLHF OpenRLHF PairWiseLoss

From Leeroopedia


Knowledge Sources
Domains Reward_Modeling, Loss_Functions
Last Updated 2026-02-07 00:00 GMT

Overview

A concrete tool, provided by OpenRLHF, for computing pairwise ranking losses when training reward models.

Description

The PairWiseLoss class implements the Bradley-Terry log-sigmoid loss with an optional margin. It computes $-\log\sigma(r_{\text{chosen}} - r_{\text{rejected}} - \text{margin})$ for each pair and returns the batch mean. The companion LogExpLoss class provides the log-exponential variant.
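To make the arithmetic concrete, here is a minimal pure-Python sketch of the two losses on a small batch. The function names (log_sigmoid, pairwise_loss, log_exp_loss) are hypothetical and are not part of OpenRLHF. The log-exponential form is assumed here to be log(1 + exp(r_rejected - r_chosen)), which equals the log-sigmoid loss when no margin is applied.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    # log sigma(x) = min(x, 0) - log(1 + exp(-|x|))
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def pairwise_loss(chosen, rejected, margin=None):
    """Batch mean of -log sigma(r_chosen - r_rejected - margin)."""
    if margin is None:
        margin = [0.0] * len(chosen)
    terms = [-log_sigmoid(c - r - m) for c, r, m in zip(chosen, rejected, margin)]
    return sum(terms) / len(terms)


def log_exp_loss(chosen, rejected):
    """Batch mean of log(1 + exp(r_rejected - r_chosen)) (assumed form)."""
    terms = [math.log1p(math.exp(r - c)) for c, r in zip(chosen, rejected)]
    return sum(terms) / len(terms)


# Illustrative reward scores, one entry per preference pair.
chosen = [1.2, 0.3, 2.0]
rejected = [0.4, 0.9, 1.5]

base = pairwise_loss(chosen, rejected)
with_margin = pairwise_loss(chosen, rejected, margin=[0.5, 0.5, 0.5])
```

A positive margin shifts every pair toward the "wrong ordering" side of the sigmoid, so with_margin is larger than base: the model is asked to separate chosen from rejected rewards by at least the margin before the loss becomes small.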

Usage

RewardModelTrainer instantiates the loss class selected by its loss parameter, so these classes are not typically used directly.
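The selection step can be pictured as a simple dispatch on a string. The sketch below is an assumption for illustration only: select_loss is a hypothetical helper, the two classes are local stand-ins rather than the OpenRLHF classes, and the key names "sigmoid" and "logexp" are assumed rather than taken from this page.

```python
class PairWiseLoss:
    """Local stand-in for openrlhf.models.PairWiseLoss (illustration only)."""


class LogExpLoss:
    """Local stand-in for openrlhf.models.LogExpLoss (illustration only)."""


def select_loss(loss: str):
    """Hypothetical helper: map a loss name to a loss instance."""
    if loss == "sigmoid":      # assumed key for the log-sigmoid loss
        return PairWiseLoss()
    if loss == "logexp":       # assumed key for the log-exponential loss
        return LogExpLoss()
    raise ValueError(f"unknown loss: {loss!r}")


loss_fn = select_loss("sigmoid")
```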

Code Reference

Source Location

  • Repository: OpenRLHF
  • File: openrlhf/models/loss.py
  • Lines: L218-243 (PairWiseLoss L218-231, LogExpLoss L233-243)

Signature

class PairWiseLoss(nn.Module):
    def forward(
        self,
        chosen_reward: torch.Tensor,     # Scalar rewards for chosen responses
        reject_reward: torch.Tensor,     # Scalar rewards for rejected responses
        margin: torch.Tensor = None,     # Optional margin per pair
    ) -> torch.Tensor:
        ...  # body omitted; see source file

class LogExpLoss(nn.Module):
    def forward(
        self,
        chosen_reward: torch.Tensor,
        reject_reward: torch.Tensor,
        margin: torch.Tensor = None,
    ) -> torch.Tensor:
        ...  # body omitted; see source file

Import

from openrlhf.models import PairWiseLoss, LogExpLoss

I/O Contract

Inputs

Name | Type | Required | Description
chosen_reward | Tensor | Yes | Reward scores for preferred responses, shape (batch_size,)
reject_reward | Tensor | Yes | Reward scores for rejected responses, shape (batch_size,)
margin | Tensor | No | Per-pair margin values

Outputs

Name | Type | Description
loss | Tensor | Scalar mean loss value
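The contract above (two reward vectors of shape (batch_size,), an optional per-pair margin, and a scalar result) can be exercised with a small PyTorch stand-in. PairWiseLossSketch is a hypothetical class written for this example under the formula given in the Description. It is not the library implementation, and the Optional annotation is a choice made here.

```python
from typing import Optional

import torch
import torch.nn as nn
import torch.nn.functional as F


class PairWiseLossSketch(nn.Module):
    """Stand-in with the same call shape as PairWiseLoss (not the library class)."""

    def forward(
        self,
        chosen_reward: torch.Tensor,
        reject_reward: torch.Tensor,
        margin: Optional[torch.Tensor] = None,
    ) -> torch.Tensor:
        diff = chosen_reward - reject_reward
        if margin is not None:
            diff = diff - margin          # require a larger gap between the pair
        return -F.logsigmoid(diff).mean() # reduce to a scalar batch mean


chosen = torch.tensor([1.2, 0.3, 2.0])    # shape (batch_size,)
rejected = torch.tensor([0.4, 0.9, 1.5])  # shape (batch_size,)
margin = torch.tensor([0.5, 0.0, 1.0])    # one margin value per pair

loss_fn = PairWiseLossSketch()
loss = loss_fn(chosen, rejected)
loss_margin = loss_fn(chosen, rejected, margin=margin)
```

Both results are zero-dimensional tensors, so they can be passed straight to backward() in a training loop or converted with .item() for logging.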

Usage Examples

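import torch
from openrlhf.models import PairWiseLoss

# Illustrative reward scores, one per preference pair: shape (batch_size,)
chosen_rewards = torch.tensor([1.2, 0.3, 2.0])
rejected_rewards = torch.tensor([0.4, 0.9, 1.5])
loss_fn = PairWiseLoss()
loss = loss_fn(chosen_rewards, rejected_rewards, margin=None)  # scalar mean loss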

Related Pages

Implements Principle
