Principle:Eric mitchell Direct preference optimization Dropout Disabling

From Leeroopedia


Knowledge Sources
Domains: Regularization, Training_Stability, Deep_Learning
Last Updated: 2026-02-08 02:00 GMT

Overview

A training configuration technique that disables all dropout layers in a model so that forward passes are deterministic and preference optimization is more stable.

Description

Dropout disabling sets all dropout probabilities to zero in a pre-trained model before Direct Preference Optimization (DPO) or supervised fine-tuning (SFT). Dropout is a useful regularizer during pre-training, but the stochasticity it introduces can hurt preference optimization:

  • In DPO, the loss compares log probabilities from the policy and reference models. Dropout noise could cause inconsistent probability estimates between the two models, destabilizing training.
  • For SFT on small preference datasets, the regularization benefit of dropout is minimal compared to the noise it introduces.
  • Disabling dropout ensures reproducible forward passes, which is important for the comparison between policy and reference model outputs.

Usage

Apply this technique immediately after loading any model that will be used in DPO or SFT training. Both the policy and reference models should have dropout disabled.

Theoretical Basis

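Dropout randomly zeros elements of the input tensor with probability p during training, and scales the remaining elements by 1/(1 - p) so that the expected value of each element is unchanged. Setting p = 0 means no elements are zeroed and the scale factor is 1, so the layer becomes the identity and the forward pass is deterministic.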
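The behavior described above can be illustrated with a minimal pure-Python sketch. The `dropout` helper and its `rng` argument are hypothetical names introduced for this example; it operates on a plain list of floats rather than a tensor and is not the implementation used by any particular library.

```python
import random

def dropout(values, p, rng):
    """Inverted dropout on a list of floats (training-mode behavior)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    scale = 1.0 / (1.0 - p)
    # Each element is zeroed with probability p; survivors are rescaled.
    return [0.0 if rng.random() < p else v * scale for v in values]

x = [1.0, 2.0, 3.0, 4.0]

# With p = 0 nothing is dropped and the scale is 1, so the output equals x
# regardless of the random seed.
out_a = dropout(x, 0.0, random.Random(1))
out_b = dropout(x, 0.0, random.Random(2))

# With p = 0.5 each output element is either 0 or twice the input element,
# and which one it is depends on the random draw.
noisy = dropout(x, 0.5, random.Random(0))
```

Here `out_a` and `out_b` are identical even though the seeds differ, while `noisy` changes from seed to seed. That seed dependence is the stochasticity that dropout disabling removes.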

For DPO specifically, the loss depends on the difference of log-ratios between the policy and reference models. Dropout noise on either side would add variance to the gradient estimates without providing a useful regularization signal, since the models are already pre-trained and the fine-tuning dataset is relatively small.
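To make the dependence on log-ratios concrete, the sketch below computes a per-example DPO loss from four sequence log probabilities and then runs a toy simulation. Several things here are assumptions made for illustration: the function names, the log-probability values, the value of `beta`, and above all the modeling of dropout noise as additive Gaussian noise on the policy log probabilities, which is a simplification and not how dropout actually perturbs a network.

```python
import math
import random
import statistics

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * difference of log-ratios)."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(z)) equals softplus(-z); this form avoids overflow.
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))

def loss_spread(noise_std, trials=1000, seed=0):
    """Standard deviation of the loss when policy log probs are perturbed.

    The Gaussian perturbation is a stand-in for dropout noise (assumption).
    """
    rng = random.Random(seed)
    losses = [
        dpo_loss(-12.0 + rng.gauss(0.0, noise_std),
                 -15.0 + rng.gauss(0.0, noise_std),
                 -13.0, -14.0)
        for _ in range(trials)
    ]
    return statistics.pstdev(losses)

spread_without_noise = loss_spread(0.0)  # same loss on every trial
spread_with_noise = loss_spread(1.0)     # loss varies from trial to trial
```

With no perturbation the loss is identical across trials, so its spread is zero. Any nonzero perturbation makes the loss, and therefore its gradient, fluctuate for the same input pair.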

Pseudo-code:

# Abstract algorithm (NOT actual implementation)
for each module in model:
    if module is Dropout:
        module.probability = 0
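
The abstract algorithm above might look as follows in PyTorch. This is a minimal sketch, assuming the model is a `torch.nn.Module` whose dropout is implemented with `torch.nn.Dropout` layers. The function name `disable_dropout` and the toy network are chosen for illustration and are not taken from this page.

```python
import torch
from torch import nn

def disable_dropout(model: nn.Module) -> None:
    """Set the drop probability of every nn.Dropout submodule to zero, in place."""
    for module in model.modules():  # modules() walks the model recursively
        if isinstance(module, nn.Dropout):
            module.p = 0.0

# Hypothetical toy network standing in for a pre-trained policy or reference model.
toy_model = nn.Sequential(
    nn.Linear(8, 16),
    nn.Dropout(p=0.1),
    nn.ReLU(),
    nn.Linear(16, 4),
    nn.Dropout(p=0.1),
)

disable_dropout(toy_model)
toy_model.train()  # even in training mode, the dropout layers now pass inputs through

x = torch.randn(2, 8)
# Two forward passes on the same input give the same output once p is zero.
same = torch.equal(toy_model(x), toy_model(x))
```

In a DPO setup the same call would be applied to both the policy and the reference model right after loading. Note that this sketch only touches `nn.Dropout` instances. A model that calls `torch.nn.functional.dropout` directly with a rate stored elsewhere, or that uses other dropout variants, would need additional handling.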
