Implementation:OpenRLHF OpenRLHF ValueLoss

From Leeroopedia


Knowledge Sources
Domains: Reinforcement_Learning, Loss_Functions
Last Updated: 2026-02-07 00:00 GMT

Overview

A concrete tool, provided by OpenRLHF, for computing clipped value-function losses during PPO critic training.

Description

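The ValueLoss class computes the squared error between predicted values and returns, with optional clipping. When clip_eps is set, the new values are clipped to lie within clip_eps of the old values, and the loss is the element-wise maximum of the clipped and unclipped squared errors. When clip_eps is None, only the unclipped squared error is used. The result is reduced to a scalar and multiplied by 0.5.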
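The description above can be written out as a formula. This is a sketch under stated assumptions: the symbols ($V_t$ for values, $V^{\text{old}}_t$ for old_values, $R_t$ for returns, $m_t$ for action_mask, $\epsilon$ for clip_eps) are notation introduced here, and the plain masked mean in the last line is an assumed reduction. How token_level_loss changes the reduction is not described on this page.

```latex
\begin{aligned}
V^{\text{clip}}_t &= V^{\text{old}}_t + \operatorname{clip}\!\left(V_t - V^{\text{old}}_t,\; -\epsilon,\; \epsilon\right) \\
\ell_t &= \max\!\left( \left(V^{\text{clip}}_t - R_t\right)^2,\; \left(V_t - R_t\right)^2 \right) \\
L &= \frac{1}{2} \cdot \frac{\sum_t m_t \, \ell_t}{\sum_t m_t}
\end{aligned}
```

Taking the maximum makes the loss pessimistic: a value prediction that moved more than $\epsilon$ away from the old prediction cannot lower the loss by doing so.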

Usage

ValueLoss is instantiated by the PPO critic trainer and called at each training step with the current value predictions, the old value predictions, and the computed returns.

Code Reference

Source Location

  • Repository: OpenRLHF
  • File: openrlhf/models/loss.py
  • Lines: L185-215

Signature

class ValueLoss(nn.Module):
    def __init__(
        self,
        clip_eps: float = None,        # Value clip range (None = no clipping)
        token_level_loss: bool = True,  # Token vs sequence level
    ) -> None:
        ...

    def forward(
        self,
        values: torch.Tensor,          # Current value predictions
        old_values: torch.Tensor,      # Previous value predictions
        returns: torch.Tensor,         # Computed returns (from GAE)
        action_mask: Optional[torch.Tensor] = None,
    ) -> torch.Tensor:
        """Returns scalar value loss (0.5 * MSE)."""

Import

from openrlhf.models import ValueLoss

I/O Contract

Inputs

Name        | Type   | Required | Description
values      | Tensor | Yes      | Current critic value predictions
old_values  | Tensor | Yes      | Previous value predictions (from rollout)
returns     | Tensor | Yes      | Computed returns from GAE
action_mask | Tensor | No       | Binary mask for action tokens

Outputs

Name | Type   | Description
loss | Tensor | Scalar value loss (0.5 * clipped MSE)
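To make the I/O contract concrete, the following dependency-free sketch reproduces the arithmetic on plain Python lists for a single sequence. It is an illustration, not the OpenRLHF implementation: the function name clipped_value_loss is hypothetical, the inputs are lists rather than tensors, and the reduction (a simple masked mean over tokens) is an assumption.

```python
def clipped_value_loss(values, old_values, returns, mask=None, clip_eps=None):
    """Plain-Python sketch: 0.5 * masked mean of the clipped squared error.

    Hypothetical helper for illustration only. All arguments are flat lists
    of floats for one sequence; the masked-mean reduction is an assumption.
    """
    if mask is None:
        mask = [1.0] * len(values)
    total, count = 0.0, 0.0
    for v, v_old, ret, m in zip(values, old_values, returns, mask):
        loss = (v - ret) ** 2  # unclipped squared error
        if clip_eps is not None:
            # Keep the new value within clip_eps of the old value.
            v_clipped = v_old + max(-clip_eps, min(clip_eps, v - v_old))
            # Pessimistic choice: the larger of the two squared errors.
            loss = max((v_clipped - ret) ** 2, loss)
        total += m * loss
        count += m
    return 0.5 * total / count


values = [1.0, 0.5, 2.0]
old_values = [0.5, 0.5, 0.0]
returns = [1.0, 0.0, 0.0]
mask = [1, 1, 0]  # the last token is not an action token, so it is ignored

loss = clipped_value_loss(values, old_values, returns, mask, clip_eps=0.2)
print(loss)  # approximately 0.085
```

The first token shows why the maximum matters: the new value 1.0 matches the return exactly, but it moved 0.5 away from the old value, so the clipped value 0.7 is used and contributes a squared error of about 0.09. Without clipping the same inputs give 0.0625. The sketch assumes at least one unmasked token; an all-zero mask would divide by zero.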

Usage Examples

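import torch
from openrlhf.models import ValueLoss

# Placeholder (batch, num_actions) tensors; in training these come from the critic and GAE.
values, old_values, returns = torch.randn(3, 2, 8)
action_mask = torch.ones(2, 8)
value_loss_fn = ValueLoss(clip_eps=0.2)
v_loss = value_loss_fn(values, old_values, returns, action_mask)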

Related Pages

Implements Principle
