Principle: Dropout Disabling for Direct Preference Optimization (DPO)
| Metadata | Value |
|---|---|
| Domains | Regularization, Training_Stability, Deep_Learning |
| Last Updated | 2026-02-08 02:00 GMT |
Overview
A training configuration technique that disables all dropout layers in a model to ensure deterministic and stable preference optimization.
Description
Dropout disabling sets all dropout probabilities to zero in a pre-trained model before DPO or SFT training. While dropout is useful during pre-training as a regularization technique, it introduces stochasticity that can be harmful during preference optimization:
- In DPO, the loss compares log probabilities from the policy and reference models. Dropout noise could cause inconsistent probability estimates between the two models, destabilizing training.
- During SFT on small preference datasets, the regularization benefit of dropout is minimal compared to the noise it introduces.
- Disabling dropout ensures reproducible forward passes, which is important for the comparison between policy and reference model outputs.
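The inconsistency described above can be seen in a minimal, framework-free sketch (pure Python, with inverted dropout implemented by hand for illustration): with p > 0 two forward passes over the same input disagree, while p = 0 makes every pass identical.

```python
import random

def dropout_forward(x, p, rng):
    """Inverted dropout on a list of floats: each element is zeroed with
    probability p; survivors are scaled by 1/(1-p) to preserve the mean."""
    if p == 0.0:
        return list(x)          # p = 0: the pass is fully deterministic
    scale = 1.0 / (1.0 - p)
    return [xi * scale if rng.random() >= p else 0.0 for xi in x]

x = [1.0, 2.0, 3.0, 4.0]
rng = random.Random(0)

# With p > 0, two passes over the *same* input disagree, so log probabilities
# computed by the policy and reference models are not directly comparable.
noisy_a = dropout_forward(x, 0.5, rng)
noisy_b = dropout_forward(x, 0.5, rng)

# With p = 0, every pass returns identical activations.
clean_a = dropout_forward(x, 0.0, rng)
clean_b = dropout_forward(x, 0.0, rng)
```

This is why the policy/reference log-ratio in DPO only behaves as intended when both models run deterministic forward passes.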
Usage
Apply this technique immediately after loading any model that will be used in DPO or SFT training. Both the policy and reference models should have dropout disabled.
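As a sketch of that ordering (load, then disable, then train), here is a toy version using hypothetical stand-in layer classes; a real framework such as PyTorch provides the equivalent Dropout/Linear modules and a module-tree iterator:

```python
# Hypothetical stand-in layer classes for illustration only.
class Dropout:
    def __init__(self, p):
        self.p = p

class Linear:
    pass

def zero_dropout_layers(model):
    # `model` is a flat list of layers here; in practice you would walk
    # the module tree of the loaded network.
    for layer in model:
        if isinstance(layer, Dropout):
            layer.p = 0.0

# Disable dropout on BOTH models right after loading, before any training step.
policy = [Linear(), Dropout(0.1), Linear(), Dropout(0.1)]
reference = [Linear(), Dropout(0.1), Linear(), Dropout(0.1)]
zero_dropout_layers(policy)
zero_dropout_layers(reference)
```

Disabling dropout on the reference model matters as much as on the policy: both sides of the DPO log-ratio must be deterministic.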
Theoretical Basis
Dropout randomly zeros elements of the input tensor with probability p during training, and scales the remaining elements by 1/(1 - p). Setting p = 0 makes the forward pass deterministic.
For DPO specifically, the loss depends on the difference of log-ratios between policy and reference models. Dropout noise on either side would add variance to the gradient estimates without providing useful regularization signal, since the models are already pre-trained and the fine-tuning dataset is relatively small.
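Concretely, the DPO loss takes the standard form (Rafailov et al., 2023), where σ is the logistic sigmoid and β scales the implicit reward:

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}\left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)\right]
```

Each of the four log probabilities is produced by a forward pass; with active dropout, each pass would carry independent noise, adding variance to the gradient of this loss.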
Pseudo-code (shown here as a concrete PyTorch version of the abstract algorithm):
```python
import torch.nn as nn

def disable_dropout(model: nn.Module) -> None:
    # Zero the drop probability of every Dropout submodule in place.
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = 0.0
```