Implementation:OpenRLHF OpenRLHF ValueLoss
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, Loss_Functions |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
ValueLoss is the OpenRLHF module that computes the value-function loss, with optional clipping, for PPO critic training.
Description
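The ValueLoss class computes the squared error between the critic's predicted values and the returns. When clip_eps is set, it first clips the new values to within clip_eps of the old values, computes the squared error for both the clipped and the unclipped predictions, and takes the element-wise maximum of the two. The loss is then reduced to a scalar (restricted to action tokens when action_mask is provided) and multiplied by 0.5.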
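The per-token arithmetic described above can be traced by hand. The following is a minimal pure-Python sketch for a single token; the helper name `value_loss_term` is hypothetical and is not part of OpenRLHF, and the 0.5 factor is applied per token here only for illustration (the result is the same as scaling after averaging).

```python
from typing import Optional


def value_loss_term(value: float, old_value: float, ret: float,
                    clip_eps: Optional[float] = None) -> float:
    """Per-token value loss as described above (illustrative, not the OpenRLHF source)."""
    unclipped = (value - ret) ** 2
    if clip_eps is None:
        return 0.5 * unclipped
    # Keep the new prediction within clip_eps of the old one.
    delta = max(-clip_eps, min(clip_eps, value - old_value))
    clipped = (old_value + delta - ret) ** 2
    # Pessimistic choice: the larger of the two squared errors.
    return 0.5 * max(clipped, unclipped)


print(value_loss_term(2.0, 1.0, 3.0))                # 0.5
print(value_loss_term(2.0, 1.0, 3.0, clip_eps=0.5))  # 1.125
```

In the second call the prediction moved by 1.0 from the old value, so it is clipped back to 1.5; the clipped error (2.25) exceeds the unclipped error (1.0), and the larger one is kept. Taking the maximum means the critic gets no credit for progress toward the return made by moving more than clip_eps away from its old prediction.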
Usage
The PPO critic trainer instantiates ValueLoss and calls it on every training step with the current value predictions, the old value predictions, and the computed returns.
Code Reference
Source Location
- Repository: OpenRLHF
- File: openrlhf/models/loss.py
- Lines: L185-215
Signature
class ValueLoss(nn.Module):
    def __init__(
        self,
        clip_eps: float = None,  # Value clip range (None = no clipping)
        token_level_loss: bool = True,  # Token vs sequence level
    ) -> None:
        ...

    def forward(
        self,
        values: torch.Tensor,  # Current value predictions
        old_values: torch.Tensor,  # Previous value predictions
        returns: torch.Tensor,  # Computed returns (from GAE)
        action_mask: Optional[torch.Tensor] = None,
    ) -> torch.Tensor:
        """Returns scalar value loss (0.5 * MSE)."""
Import
from openrlhf.models import ValueLoss
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| values | Tensor | Yes | Current critic value predictions |
| old_values | Tensor | Yes | Previous value predictions (from rollout) |
| returns | Tensor | Yes | Computed returns from GAE |
| action_mask | Tensor | No | Binary mask for action tokens |
Outputs
| Name | Type | Description |
|---|---|---|
| loss | Tensor | Scalar value loss (0.5 * clipped MSE) |
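To make the contract concrete, here is a rough PyTorch sketch of a function with the same inputs and output. It is a stand-in written for this page, not the OpenRLHF source: the name `value_loss_sketch` is hypothetical, the reduction (a mean over unmasked tokens) is an assumption, and the `token_level_loss` option is not modeled.

```python
from typing import Optional

import torch


def value_loss_sketch(
    values: torch.Tensor,
    old_values: torch.Tensor,
    returns: torch.Tensor,
    action_mask: Optional[torch.Tensor] = None,
    clip_eps: Optional[float] = None,
) -> torch.Tensor:
    """Illustrative stand-in for the contract above; not the OpenRLHF source."""
    loss = (values - returns) ** 2
    if clip_eps is not None:
        # Limit how far the new prediction may move from the old one.
        clipped = old_values + (values - old_values).clamp(-clip_eps, clip_eps)
        loss = torch.max((clipped - returns) ** 2, loss)
    if action_mask is None:
        return 0.5 * loss.mean()
    mask = action_mask.to(loss.dtype)
    # Assumed reduction: average over unmasked (action) tokens only.
    return 0.5 * (loss * mask).sum() / mask.sum()


values = torch.tensor([[2.0, 0.0]])
old_values = torch.tensor([[1.0, 0.0]])
returns = torch.tensor([[3.0, 4.0]])
action_mask = torch.tensor([[1, 0]])  # second token is ignored

loss = value_loss_sketch(values, old_values, returns, action_mask, clip_eps=0.5)
print(loss.shape, loss.item())
```

With the mask applied, only the first token contributes, so the large error on the second token does not affect the result. Passing `action_mask=None` averages over both tokens instead.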
Usage Examples
from openrlhf.models import ValueLoss

value_loss_fn = ValueLoss(clip_eps=0.2)
# values: current critic predictions; old_values and returns: saved from the rollout (GAE)
v_loss = value_loss_fn(values, old_values, returns, action_mask)
Related Pages
Implements Principle