Heuristic:Princeton nlp SimPO Dropout Disabling
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Deep_Learning, Preference_Optimization |
| Last Updated | 2026-02-08 05:00 GMT |
Overview
Disable all dropout layers during SimPO preference optimization training to ensure consistent log probability computation between chosen and rejected responses.
Description
SimPO, like other DPO-family algorithms, compares the log probabilities of chosen and rejected responses. Dropout makes the forward pass stochastic, so the same input can produce different outputs on different passes. SimPO concatenates chosen and rejected inputs into a single forward pass, but this does not remove the problem: each example in the concatenated batch draws its own dropout mask. The chosen and rejected log probabilities are therefore computed under different random perturbations, which adds noise to their comparison and to the resulting gradient. For this reason `SimPOConfig` sets `disable_dropout = True` by default, and the trainer calls `disable_dropout_in_model(model)` from the TRL utilities, which recursively sets every dropout module in the model to `p=0`.
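To make the batch-dimension point concrete, here is a minimal sketch that uses a toy PyTorch model in place of a real language model. The `policy` network, its random feature inputs, and the `margins` and `spread` helpers are all hypothetical. Chosen and rejected rows are concatenated into one forward pass, as in SimPO. The chosen-minus-rejected margin is measured repeatedly with dropout active, then measured again after every `nn.Dropout` module has been set to `p=0.0`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy "policy": maps a pooled sequence representation to one
# scalar score per sequence (a stand-in for a sequence log-probability).
policy = nn.Sequential(
    nn.Linear(32, 64),
    nn.Tanh(),
    nn.Dropout(p=0.1),
    nn.Linear(64, 1),
)

chosen = torch.randn(4, 32)    # 4 chosen responses (toy features)
rejected = torch.randn(4, 32)  # the 4 matching rejected responses


def margins(model: nn.Module) -> torch.Tensor:
    # Concatenate chosen and rejected along the batch dimension, run a single
    # forward pass, then split the scores back into the two halves.
    batch = torch.cat([chosen, rejected], dim=0)
    scores = model(batch).squeeze(-1)
    n = chosen.shape[0]
    return scores[:n] - scores[n:]  # chosen minus rejected, per pair


def spread(t: torch.Tensor) -> torch.Tensor:
    # Per-pair range (max minus min) across repeated forward passes.
    return t.max(dim=0).values - t.min(dim=0).values


policy.train()  # dropout is active in training mode
with torch.no_grad():
    noisy = torch.stack([margins(policy) for _ in range(20)])

# Zero every dropout module, then repeat the same measurement.
for module in policy.modules():
    if isinstance(module, nn.Dropout):
        module.p = 0.0
with torch.no_grad():
    clean = torch.stack([margins(policy) for _ in range(20)])

print("spread with dropout:   ", spread(noisy))
print("spread without dropout:", spread(clean))
```

Each row draws its own mask, so while dropout is active the margin for the same pair changes from call to call. After zeroing, repeated calls return the same margins.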
Usage
Use this heuristic as a default setting for SimPO training. Only consider re-enabling dropout if you have specific regularization needs and understand the impact on preference loss stability.
The Insight (Rule of Thumb)
- Action: Set `disable_dropout: true` in SimPOConfig (this is the default).
- Value: All dropout modules in the model are set to `p=0.0`.
- Trade-off: Reduced regularization from dropout, but this is acceptable for fine-tuning pre-trained models on preference data where only 1 epoch is typically used.
Reasoning
Preference optimization algorithms compute a relative reward signal: the model should assign higher probability to chosen responses than to rejected ones. If dropout adds noise to these probability estimates, the gradient signal becomes noisy, which can slow convergence or destabilize training. Disabling dropout removes random masking from the forward pass. Repeated passes over the same input then agree up to numerical nondeterminism, such as non-deterministic GPU kernels or reduction order in data-parallel communication, and the preference signal is no longer perturbed by dropout. Since SimPO training typically runs for only 1 epoch on curated preference data, the regularization benefit of dropout is small compared to the cost of noisy preference signals.
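The gradient argument can also be illustrated without a model. This sketch assumes a logistic preference loss of the form `-log(sigmoid(margin))`, the shape used across the DPO family, and models dropout's effect on each log-probability estimate as independent Gaussian noise. Both the noise model and the numbers (`true_margin`, `noise_std`) are illustrative assumptions, not measurements from SimPO training. The helper names `loss_grad` and `grad_samples` are hypothetical.

```python
import math
import random
import statistics


def loss_grad(margin: float) -> float:
    # Derivative of -log(sigmoid(m)) with respect to m is sigmoid(m) - 1.
    return 1.0 / (1.0 + math.exp(-margin)) - 1.0


def grad_samples(true_margin: float, noise_std: float, n: int = 1000, seed: int = 0) -> list[float]:
    # Assumption: dropout perturbs the chosen and rejected scores with
    # independent zero-mean Gaussian noise of the same scale.
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        chosen_noise = rng.gauss(0.0, noise_std)
        rejected_noise = rng.gauss(0.0, noise_std)
        samples.append(loss_grad(true_margin + chosen_noise - rejected_noise))
    return samples


true_margin = 0.5  # illustrative value
clean = grad_samples(true_margin, noise_std=0.0)
noisy = grad_samples(true_margin, noise_std=0.5)

print("grad std without noise:", statistics.pstdev(clean))
print("grad std with noise:   ", statistics.pstdev(noisy))
```

Without noise, the same training pair always produces the same gradient. Once the margin is perturbed, that pair pushes the parameters by different amounts on different steps.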
Code evidence from `scripts/simpo_config.py:60`:
```python
disable_dropout: bool = True
```
Code evidence from `scripts/simpo_trainer.py:241-242`:
```python
if args.disable_dropout:
    disable_dropout_in_model(model)
```
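The effect of the `disable_dropout_in_model(model)` call can be sketched as follows. `zero_dropout` and `ToyBlock` are hypothetical names. The helper mirrors the behavior described for the TRL utility (walk every submodule and set the dropout probability to zero) and is not a copy of its source. The local `disable_dropout` variable stands in for `args.disable_dropout`.

```python
import torch
import torch.nn as nn


def zero_dropout(model: nn.Module) -> int:
    """Set p=0.0 on every nn.Dropout submodule and return how many were found.

    Hypothetical helper mirroring the described behavior of TRL's
    disable_dropout_in_model; it is not the library source.
    """
    changed = 0
    for module in model.modules():  # yields the model and all nested submodules
        if isinstance(module, nn.Dropout):
            module.p = 0.0
            changed += 1
    return changed


class ToyBlock(nn.Module):
    """Hypothetical nested block so the recursive walk has something to find."""

    def __init__(self) -> None:
        super().__init__()
        self.proj = nn.Linear(16, 16)
        self.drop = nn.Dropout(p=0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.drop(self.proj(x))


torch.manual_seed(0)
model = nn.Sequential(ToyBlock(), ToyBlock(), nn.Dropout(p=0.2), nn.Linear(16, 2))

disable_dropout = True  # stand-in for args.disable_dropout
changed = zero_dropout(model) if disable_dropout else 0

model.train()  # still in training mode, but every dropout module now has p=0.0
x = torch.randn(3, 16)
print(changed, torch.equal(model(x), model(x)))
```

Setting `p` to zero, rather than calling `model.eval()`, leaves the model in training mode for any other layer that checks `self.training`. Note that `isinstance(module, nn.Dropout)` does not match sibling classes such as `nn.Dropout2d`. Dropout applied functionally with a probability stored elsewhere, such as a `dropout_p` argument, is also unaffected by changing module attributes.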