
Heuristic:Allenai Open instruct Disable Dropout In RL

From Leeroopedia



Knowledge Sources
Domains Reinforcement_Learning, Optimization
Last Updated 2026-02-07 00:00 GMT

Overview

Disable all dropout layers (set `p = 0`) during reinforcement learning training so that stochastic forward passes do not add noise on top of an already noisy reward signal.

Description

In standard supervised learning, dropout acts as a regularizer. In on-policy RL such as GRPO, however, it makes every training-mode forward pass stochastic: log-probabilities computed for the same tokens differ from pass to pass, and that noise carries into the advantage-weighted policy-gradient estimate. Because the policy gradient already has high variance from the reward signal, the extra dropout noise makes training less stable. The codebase therefore sets every `torch.nn.Dropout` module to `p = 0` on both the policy and the reference model.
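To make the noise concrete, here is a minimal sketch on a toy network (not open-instruct code). It runs the same input through the model twice in train mode and measures how far the log-probabilities drift apart, first with dropout active and then with `p = 0`. `TinyPolicy`, `max_logprob_gap`, and all sizes are hypothetical names and values chosen for illustration.

```python
import torch
import torch.nn as nn


class TinyPolicy(nn.Module):
    """Hypothetical stand-in for a policy network (illustration only)."""

    def __init__(self, vocab_size: int = 16, hidden: int = 32, p: float = 0.1):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.dropout = nn.Dropout(p)
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # Returns per-token log-probabilities over the vocabulary.
        hidden = self.dropout(self.embed(tokens))
        return torch.log_softmax(self.head(hidden), dim=-1)


def max_logprob_gap(model: nn.Module, tokens: torch.Tensor) -> float:
    """Largest absolute difference between two forward passes on the same input."""
    model.train()  # dropout is only active in train mode
    with torch.no_grad():
        first = model(tokens)
        second = model(tokens)
    return (first - second).abs().max().item()


torch.manual_seed(0)
tokens = torch.randint(0, 16, (4, 8))
policy = TinyPolicy(p=0.1)

gap_with_dropout = max_logprob_gap(policy, tokens)

# For this toy model, equivalent to what disable_dropout_in_model does.
policy.dropout.p = 0
gap_without_dropout = max_logprob_gap(policy, tokens)

print(f"gap with dropout:    {gap_with_dropout:.4f}")
print(f"gap without dropout: {gap_without_dropout:.4f}")
```

With dropout active the two passes disagree; once `p` is zero the forward pass is deterministic and the gap vanishes.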

Usage

Apply this heuristic to all on-policy RL training (GRPO, PPO). It is not typically applied to SFT or DPO, where dropout regularization can still be beneficial.

The Insight (Rule of Thumb)

  • Action: Call `disable_dropout_in_model(model)` on both policy and reference models before training.
  • Value: Set `module.p = 0` for all `torch.nn.Dropout` instances.
  • Trade-off: Loss of dropout regularization; mitigated by the KL penalty serving as implicit regularization in GRPO.
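The rule of thumb above might be applied as in the following sketch. `build_toy_model` is a hypothetical placeholder for loading a real checkpoint, and the reference model is simply a deep copy of the policy, which is an assumption made for illustration. The `disable_dropout_in_model` body is copied from the utility quoted under Code Evidence.

```python
import copy

import torch


def disable_dropout_in_model(model: torch.nn.Module) -> None:
    # Copied from the utility quoted under "Code Evidence".
    for module in model.modules():
        if isinstance(module, torch.nn.Dropout):
            module.p = 0


def build_toy_model() -> torch.nn.Module:
    # Hypothetical placeholder for loading a real policy checkpoint.
    return torch.nn.Sequential(
        torch.nn.Linear(8, 8),
        torch.nn.Dropout(0.1),
        torch.nn.Linear(8, 4),
    )


policy = build_toy_model()
ref_policy = copy.deepcopy(policy)  # assumption: reference starts as a copy

# Disable dropout on both models before any training step runs.
for model in (policy, ref_policy):
    disable_dropout_in_model(model)

# Sanity check: collect the dropout probability of every nn.Dropout module.
remaining = [
    module.p
    for model in (policy, ref_policy)
    for module in model.modules()
    if isinstance(module, torch.nn.Dropout)
]
print(remaining)  # [0, 0]
```

Checking the remaining probabilities right after the call is a cheap way to confirm the utility reached both models.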

Reasoning

Policy gradient methods estimate the gradient from sampled reward signals, which already have high variance. Dropout adds further noise to each forward pass, making the gradient estimates noisier still. In GRPO specifically, the KL penalty against the reference policy already constrains how far the policy can move, a regularizing role similar to dropout's, so dropout becomes redundant as well as harmful.
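The variance argument can be illustrated with a small experiment. The sketch below holds one batch fixed (states, sampled actions, and made-up advantages) and recomputes a REINFORCE-style gradient several times, so any spread between the resulting gradients comes from dropout alone. The model, sizes, and the helper name `grad_std_across_passes` are all hypothetical; this is not the GRPO loss used in the codebase.

```python
import torch
import torch.nn as nn


def grad_std_across_passes(p: float, repeats: int = 20) -> float:
    """Spread of a REINFORCE-style gradient over repeated passes on one fixed batch."""
    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(8, 16), nn.Dropout(p), nn.Linear(16, 4))
    model.train()

    # Fixed batch: states, sampled actions, and made-up advantages.
    states = torch.randn(32, 8)
    actions = torch.randint(0, 4, (32,))
    advantages = torch.randn(32)

    grads = []
    for _ in range(repeats):
        model.zero_grad()
        log_probs = torch.log_softmax(model(states), dim=-1)
        chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
        loss = -(advantages * chosen).mean()
        loss.backward()
        grads.append(model[0].weight.grad.flatten().clone())

    # Standard deviation of each gradient coordinate across passes, averaged.
    return torch.stack(grads).std(dim=0).mean().item()


std_with_dropout = grad_std_across_passes(p=0.1)
std_without_dropout = grad_std_across_passes(p=0.0)

print(f"gradient spread with dropout:    {std_with_dropout:.6f}")
print(f"gradient spread without dropout: {std_without_dropout:.6f}")
```

Because the batch and weights are identical in every pass, the spread with `p = 0.0` is essentially zero, while with `p = 0.1` the same data yields a different gradient each time. In real training this dropout noise stacks on top of the sampling noise from rollouts.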

Code Evidence

Dropout disabling utility from `open_instruct/model_utils.py:181-184`:

def disable_dropout_in_model(model: torch.nn.Module) -> None:
    for module in model.modules():
        if isinstance(module, torch.nn.Dropout):
            module.p = 0
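One detail worth checking when reusing this utility: the `isinstance` test matches `torch.nn.Dropout` and its subclasses only. In PyTorch, variants such as `torch.nn.Dropout1d` are separate classes, and dropout applied through `torch.nn.functional.dropout` inside a `forward` method is not a module at all, so neither is touched. The sketch below shows the first case with a hypothetical audit helper, `active_dropout_modules`, which is not part of open-instruct.

```python
import torch.nn as nn


def disable_dropout_in_model(model: nn.Module) -> None:
    # Same logic as the utility quoted above.
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = 0


def active_dropout_modules(model: nn.Module) -> list[str]:
    """Names of dropout-like modules that still have p > 0 (hypothetical helper)."""
    dropout_types = (
        nn.Dropout,
        nn.Dropout1d,
        nn.Dropout2d,
        nn.Dropout3d,
        nn.AlphaDropout,
    )
    return [
        name
        for name, module in model.named_modules()
        if isinstance(module, dropout_types) and module.p > 0
    ]


toy = nn.Sequential(
    nn.Linear(8, 8),
    nn.Dropout(0.1),    # matched by the isinstance check
    nn.Dropout1d(0.1),  # separate class, not a subclass of nn.Dropout
)

disable_dropout_in_model(toy)
still_active = active_dropout_modules(toy)
print(still_active)  # ['2']
```

Whether this matters depends on the model architecture; a module scan like this cannot detect functional dropout, which would have to be checked in the model's configuration or source.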
