Principle:OpenRLHF OpenRLHF PPO Value Loss
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, Loss_Functions |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A clipped value function loss that trains the critic model to predict expected returns while constraining updates relative to previous value estimates.
Description
PPO Value Loss trains the critic (value function) model by minimizing the squared error between its predicted values and the computed returns. Value clipping is optional: when enabled, it limits how far a new value prediction can move from the previous estimate within a single update. Keeping the critic's updates small in this way helps stabilize the advantage estimates used for policy optimization.
Usage
Used as the critic loss in PPO training. Paired with PolicyLoss for the actor.
Theoretical Basis
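Clipped value loss:

$$L^{V}(\theta) = \frac{1}{2}\,\mathbb{E}_t\left[\max\left(\left(V_\theta(s_t) - R_t\right)^2,\ \left(V^{\text{clip}}_t - R_t\right)^2\right)\right]$$

$$V^{\text{clip}}_t = V_{\text{old}}(s_t) + \operatorname{clip}\left(V_\theta(s_t) - V_{\text{old}}(s_t),\ -\epsilon,\ \epsilon\right)$$

where $R_t$ is the return (computed via GAE) and $\epsilon$ is the clip range. $V_\theta$ is the current critic prediction and $V_{\text{old}}$ is the value estimate recorded before the update. Without clipping, the loss reduces to the first term, $\frac{1}{2}\,\mathbb{E}_t\left[(V_\theta(s_t) - R_t)^2\right]$.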
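The clipped value loss can be illustrated with a small framework-free sketch. This is not OpenRLHF's implementation: the function name `clipped_value_loss`, its arguments, and the plain mean over all tokens (with no masking) are assumptions made for illustration. In a tensor library the same steps would be elementwise clamp, square, and maximum operations.

```python
from typing import Optional, Sequence


def clipped_value_loss(
    values: Sequence[float],
    old_values: Sequence[float],
    returns: Sequence[float],
    clip_eps: Optional[float] = None,
) -> float:
    """Mean of 0.5 * squared error, with optional PPO-style value clipping.

    Hypothetical helper for illustration; averages over every token.
    """
    per_token = []
    for v, v_old, r in zip(values, old_values, returns):
        unclipped = (v - r) ** 2
        if clip_eps is None:
            # No clipping: plain squared error against the return.
            per_token.append(unclipped)
            continue
        # Keep the new prediction within clip_eps of the previous estimate.
        delta = max(-clip_eps, min(clip_eps, v - v_old))
        v_clipped = v_old + delta
        clipped = (v_clipped - r) ** 2
        # Take the larger (more pessimistic) of the two squared errors.
        per_token.append(max(unclipped, clipped))
    return 0.5 * sum(per_token) / len(per_token)


values = [1.0, 0.5]      # current critic predictions
old_values = [0.0, 0.4]  # estimates recorded before the update
returns = [1.0, 1.0]     # targets, e.g. computed via GAE

loss_clipped = clipped_value_loss(values, old_values, returns, clip_eps=0.2)
loss_plain = clipped_value_loss(values, old_values, returns)
print(loss_clipped, loss_plain)
```

In this example the first prediction matches its return exactly, so its unclipped error is zero. It has moved 1.0 away from the old estimate, though, which is well beyond the clip range of 0.2, so the clipped term still contributes a nonzero loss. Taking the maximum removes any incentive for the critic to jump far from its previous estimate in a single update.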