Implementation: Allenai Open Instruct DPO Loss Function
| Component Type | Function |
|---|---|
| Source | open_instruct/dpo_utils.py (Lines 608-649) |
| Repository | Open Instruct |
| Dependencies | torch, torch.nn.functional |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A concrete utility provided by the Open Instruct library for computing the standard Direct Preference Optimization (DPO) loss from policy and reference model log-probabilities.
Description
dpo_loss() implements the core DPO loss function as described in "Direct Preference Optimization: Your Language Model is Secretly a Reward Model" (Rafailov et al., 2023). Given log-probabilities from both the policy model and reference model for chosen and rejected responses, it computes:
- Log-ratios: The difference between policy chosen and rejected log-probabilities (`pi_logratios`), and similarly for the reference model (`ref_logratios`).
- Logits: The difference `pi_logratios - ref_logratios`, representing the relative preference of the policy over the reference.
- Loss: The sigmoid loss with optional label smoothing, `-logsigmoid(beta * logits) * (1 - label_smoothing) - logsigmoid(-beta * logits) * label_smoothing`.
- Implicit rewards: Detached reward metrics for monitoring, `beta * (policy_logps - reference_logps)`, computed for both chosen and rejected responses.
The reference_free option sets the reference log-ratios to zero, effectively using a uniform reference policy.
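Taken together, the quantities above amount to only a few lines of tensor code. The following is a minimal sketch of the computation described in this section; the library's own implementation lives at the source location below and may differ in details.

```python
import torch
import torch.nn.functional as F

def dpo_loss_sketch(policy_chosen_logps, policy_rejected_logps,
                    reference_chosen_logps, reference_rejected_logps,
                    beta, reference_free=False, label_smoothing=0.0):
    # Log-ratios for the policy and the reference model.
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = reference_chosen_logps - reference_rejected_logps
    if reference_free:
        # Uniform reference policy: the reference log-ratios drop out.
        ref_logratios = torch.zeros_like(pi_logratios)
    # Relative preference of the policy over the reference.
    logits = pi_logratios - ref_logratios
    # Sigmoid loss with optional label smoothing.
    losses = (-F.logsigmoid(beta * logits) * (1 - label_smoothing)
              - F.logsigmoid(-beta * logits) * label_smoothing)
    # Detached implicit rewards, used only for monitoring.
    chosen_rewards = beta * (policy_chosen_logps - reference_chosen_logps).detach()
    rejected_rewards = beta * (policy_rejected_logps - reference_rejected_logps).detach()
    return losses, chosen_rewards, rejected_rewards
```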
Usage
Import and call dpo_loss() when computing the standard DPO or DPO-norm loss within a training loop. For SimPO or WPO, use the dedicated simpo_loss() or wpo_loss() functions instead, or use the higher-level compute_loss() dispatcher.
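For orientation, here is a hedged sketch of how dpo_loss() might sit inside a training step. The sequence_logps() helper and the batch keys are hypothetical (they are not part of Open Instruct), and the models are assumed to be Hugging Face-style causal LMs whose forward pass returns .logits; only the dpo_loss() call itself comes from the library.

```python
import torch
from open_instruct.dpo_utils import dpo_loss

def sequence_logps(model, input_ids, labels):
    # Hypothetical helper: summed log-probability of the labeled (response) tokens.
    logits = model(input_ids).logits[:, :-1, :]   # predict token t+1 from the prefix
    targets = labels[:, 1:]
    logps = torch.log_softmax(logits, dim=-1)
    token_logps = torch.gather(logps, 2, targets.clamp(min=0).unsqueeze(-1)).squeeze(-1)
    mask = (targets != -100)                      # ignore prompt/padding positions
    return (token_logps * mask).sum(-1)

def dpo_step(policy, reference, batch, beta=0.1):
    # Policy log-probs keep gradients; reference log-probs do not.
    policy_chosen = sequence_logps(policy, batch["chosen_ids"], batch["chosen_labels"])
    policy_rejected = sequence_logps(policy, batch["rejected_ids"], batch["rejected_labels"])
    with torch.no_grad():
        ref_chosen = sequence_logps(reference, batch["chosen_ids"], batch["chosen_labels"])
        ref_rejected = sequence_logps(reference, batch["rejected_ids"], batch["rejected_labels"])
    losses, chosen_rewards, rejected_rewards = dpo_loss(
        policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=beta
    )
    return losses.mean(), chosen_rewards, rejected_rewards
```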
Code Reference
Source Location
- Repository: Open Instruct
- File: open_instruct/dpo_utils.py (Lines 608-649)
Signature
```python
def dpo_loss(
    policy_chosen_logps: torch.Tensor,
    policy_rejected_logps: torch.Tensor,
    reference_chosen_logps: torch.Tensor,
    reference_rejected_logps: torch.Tensor,
    beta: float,
    reference_free: bool = False,
    label_smoothing: float = 0.0,
) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
```
Import
```python
from open_instruct.dpo_utils import dpo_loss
```
I/O Contract
Inputs
| Parameter | Type | Description |
|---|---|---|
| `policy_chosen_logps` | `torch.Tensor` | Log-probabilities of the policy model for chosen responses. Shape: `(batch_size,)`. |
| `policy_rejected_logps` | `torch.Tensor` | Log-probabilities of the policy model for rejected responses. Shape: `(batch_size,)`. |
| `reference_chosen_logps` | `torch.Tensor` | Log-probabilities of the reference model for chosen responses. Shape: `(batch_size,)`. |
| `reference_rejected_logps` | `torch.Tensor` | Log-probabilities of the reference model for rejected responses. Shape: `(batch_size,)`. |
| `beta` | `float` | Temperature parameter, typically in the range 0.1 to 0.5. Higher values make the loss more sensitive to preference differences. |
| `reference_free` | `bool` | If True, ignores the reference model and uses a uniform reference (sets reference log-ratios to 0). Default: `False`. |
| `label_smoothing` | `float` | Label smoothing parameter in [0, 1). Default: 0.0 (no smoothing). |
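As a quick sanity check on the beta and label_smoothing semantics, the hedged snippet below evaluates the smoothed sigmoid loss for a single fixed preference margin at a few settings; the numbers are illustrative only.

```python
import torch
import torch.nn.functional as F

# One pair where the policy prefers the chosen response more than the reference does:
# logits = (policy_chosen - policy_rejected) - (ref_chosen - ref_rejected) = 1.0
logits = torch.tensor([1.0])

for beta in (0.1, 0.5):
    for eps in (0.0, 0.1):
        loss = -F.logsigmoid(beta * logits) * (1 - eps) - F.logsigmoid(-beta * logits) * eps
        print(f"beta={beta}, label_smoothing={eps}: loss={loss.item():.4f}")
# Larger beta makes the loss react more strongly to the same margin; nonzero label
# smoothing mixes in the loss of the flipped preference, softening hard labels.
```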
Outputs
| Output | Type | Description |
|---|---|---|
| `losses` | `torch.Tensor` | Per-example DPO losses. Shape: `(batch_size,)`. |
| `chosen_rewards` | `torch.Tensor` | Implicit rewards for chosen responses (detached). Shape: `(batch_size,)`. |
| `rejected_rewards` | `torch.Tensor` | Implicit rewards for rejected responses (detached). Shape: `(batch_size,)`. |
Usage Examples
```python
import torch
from open_instruct.dpo_utils import dpo_loss

# Example with batch_size=4. In real training these log-probabilities come from the
# policy model and carry gradients; requires_grad=True stands in for that here.
policy_chosen = torch.tensor([-1.2, -0.8, -1.5, -0.9], requires_grad=True)
policy_rejected = torch.tensor([-2.1, -1.5, -1.8, -2.0], requires_grad=True)
ref_chosen = torch.tensor([-1.4, -1.0, -1.6, -1.1])
ref_rejected = torch.tensor([-1.9, -1.3, -1.7, -1.8])

losses, chosen_rewards, rejected_rewards = dpo_loss(
    policy_chosen_logps=policy_chosen,
    policy_rejected_logps=policy_rejected,
    reference_chosen_logps=ref_chosen,
    reference_rejected_logps=ref_rejected,
    beta=0.1,
    label_smoothing=0.0,
)

# losses: per-example DPO losses, shape (batch_size,)
# chosen_rewards: implicit rewards for chosen responses, shape (batch_size,)
# rejected_rewards: implicit rewards for rejected responses, shape (batch_size,)
mean_loss = losses.mean()
mean_loss.backward()
```
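Continuing the example above, the detached reward outputs are typically logged rather than optimized; the following is a common monitoring pattern (an illustrative sketch, not Open Instruct code).

```python
# Reward margin: how far apart the implicit rewards are on average.
reward_margin = (chosen_rewards - rejected_rewards).mean()
# Reward accuracy: fraction of pairs where the chosen response gets the higher reward.
reward_accuracy = (chosen_rewards > rejected_rewards).float().mean()
print(f"margin={reward_margin.item():.3f}, accuracy={reward_accuracy.item():.3f}")
```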