Heuristic:Princeton nlp SimPO Dropout Disabling

From Leeroopedia



Knowledge Sources
Domains: Optimization, Deep_Learning, Preference_Optimization
Last Updated: 2026-02-08 05:00 GMT

Overview

Disable all dropout layers during SimPO preference optimization training to ensure consistent log probability computation between chosen and rejected responses.

Description

SimPO, like other DPO-family algorithms, compares the log probabilities of chosen and rejected responses. Dropout makes the forward pass stochastic: the same input can produce different outputs from one pass to the next. Although SimPO concatenates chosen and rejected inputs into a single forward pass, dropout masks are sampled independently for each element, so the two halves of the batch are still perturbed differently and the gradient signal can be inconsistent across the batch dimension. `SimPOConfig` sets `disable_dropout = True` by default, and the trainer calls `disable_dropout_in_model(model)` from the TRL utilities to recursively set all dropout modules to `p=0`.
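To make the effect of zeroing dropout concrete, here is a minimal sketch of a helper that behaves the way the paragraph above describes. It assumes PyTorch is installed; `zero_out_dropout` and the toy model are hypothetical names used for illustration and are not taken from TRL or the SimPO repository.

```python
import torch
from torch import nn


def zero_out_dropout(model: nn.Module) -> None:
    """Set p=0 on every nn.Dropout found anywhere in the module tree."""
    # nn.Module.modules() walks the model recursively, nested submodules included.
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = 0.0


# Hypothetical toy model, used only to exercise the helper.
model = nn.Sequential(
    nn.Linear(8, 8),
    nn.Dropout(p=0.1),
    nn.Sequential(nn.ReLU(), nn.Dropout(p=0.2)),
    nn.Linear(8, 2),
)
model.train()  # dropout layers are only active in training mode
zero_out_dropout(model)

# Two passes over the same input while still in training mode.
x = torch.randn(4, 8)
out_a = model(x)
out_b = model(x)
print("outputs match across passes:", torch.equal(out_a, out_b))
```

The model stays in training mode throughout, so the dropout layers are still executed; they simply drop nothing once `p` is zero.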

Usage

Use this heuristic as a default setting for SimPO training. Only consider re-enabling dropout if you have specific regularization needs and understand the impact on preference loss stability.

The Insight (Rule of Thumb)

  • Action: Set `disable_dropout: true` in SimPOConfig (this is the default).
  • Value: All dropout modules in the model are set to `p=0.0`.
  • Trade-off: Reduced regularization from dropout, but this is acceptable for fine-tuning pre-trained models on preference data where only 1 epoch is typically used.

Reasoning

Preference optimization algorithms compute a relative reward signal: the model should assign higher probability to chosen responses than to rejected ones. If dropout injects noise into these probability estimates, the gradient signal becomes noisy as well, which can slow convergence or destabilize training. Disabling dropout removes this source of randomness from the forward pass (other sources, such as data-parallel communication, may remain), which keeps the preference signal clean. Since SimPO training typically runs for only one epoch on curated preference data, the regularization benefit of dropout is small compared to the cost of noisy preference signals.
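The noise argument above can be illustrated with a small standard-library simulation. This is a toy sketch under stated assumptions: the "score" is a linear readout over a made-up hidden vector rather than a real sequence log probability, and every name and number in it is hypothetical. It only shows that a chosen-minus-rejected margin fluctuates across passes when dropout is active and stays fixed when `p=0`.

```python
import math
import random


def dropout(vec, p, rng):
    """Inverted dropout: zero each element with probability p, rescale the rest."""
    if p == 0.0:
        return list(vec)
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in vec]


def score(hidden, weights, p, rng):
    """Toy stand-in for a response score: linear readout after dropout."""
    dropped = dropout(hidden, p, rng)
    return sum(h * w for h, w in zip(dropped, weights))


def margin_spread(p, trials=200, seed=0):
    """Std. deviation of the (chosen - rejected) margin over repeated passes."""
    rng = random.Random(seed)
    # Made-up values; they carry no meaning beyond this illustration.
    weights = [0.5, -0.3, 0.8, 0.1, -0.6, 0.4]
    chosen = [1.0, 0.2, 0.9, -0.4, 0.3, 0.7]
    rejected = [0.6, 0.5, 0.1, 0.2, 0.8, -0.1]
    margins = [
        score(chosen, weights, p, rng) - score(rejected, weights, p, rng)
        for _ in range(trials)
    ]
    mean = sum(margins) / trials
    return math.sqrt(sum((m - mean) ** 2 for m in margins) / trials)


spread_off = margin_spread(p=0.0)
spread_on = margin_spread(p=0.1)
print(f"margin std with dropout disabled: {spread_off:.4f}")
print(f"margin std with p=0.1:            {spread_on:.4f}")
```

With dropout disabled the margin is the same on every pass, so its spread is zero up to floating-point rounding; with `p=0.1` the same inputs yield a different margin each time, which is the noise the heuristic is meant to remove.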

Code evidence from `scripts/simpo_config.py:60`:

disable_dropout: bool = True

Code evidence from `scripts/simpo_trainer.py:241-242`:

if args.disable_dropout:
    disable_dropout_in_model(model)
