
Implementation:Axolotl ai cloud Axolotl Setup Reference Model

From Leeroopedia


Knowledge Sources
Domains Alignment, Model_Loading
Last Updated 2026-02-06 23:00 GMT

Overview

A concrete tool, provided by the Axolotl framework, for setting up the reference model used in DPO alignment training.

Description

The setup_reference_model function determines whether a separate reference model is needed for DPO training. For ORPO training, no reference is needed. For LoRA/adapter training, TRL auto-unwraps the base model unless cfg.rl_adapter_ref_model is explicitly set to force a separate copy. When a separate model is needed, it loads a full copy using ModelLoader with reference_model=True.

Usage

Called within the train function when cfg.rl is set. Returns the reference model or None.
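The decision logic described above can be sketched as a small standalone predicate. This is an illustrative reconstruction based on the Description section, not Axolotl's actual source; the function name and the set of adapter strings checked are assumptions.

```python
# Hypothetical sketch of the decision inside setup_reference_model.
# Names and adapter values are illustrative, not Axolotl's actual code.
def needs_separate_reference_model(rl, adapter, rl_adapter_ref_model=False):
    """Return True if a separate reference model copy must be loaded."""
    if rl == "orpo":
        # ORPO computes its odds-ratio loss without a reference model.
        return False
    if adapter in ("lora", "qlora") and not rl_adapter_ref_model:
        # TRL auto-unwraps the base model beneath the adapter,
        # so no separate copy is needed unless explicitly forced.
        return False
    # Full fine-tuning, or rl_adapter_ref_model forced: load a full copy.
    return True

print(needs_separate_reference_model("dpo", "lora"))        # False
print(needs_separate_reference_model("dpo", None))          # True
print(needs_separate_reference_model("orpo", None))         # False
print(needs_separate_reference_model("dpo", "lora", True))  # True
```

When this predicate returns True, the real function loads a full copy via ModelLoader with reference_model=True; when it returns False, the function returns None and TRL handles the reference implicitly.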

Code Reference

Source Location

  • Repository: axolotl
  • File: src/axolotl/train.py
  • Lines: L109-135

Signature

def setup_reference_model(
    cfg: DictDefault,
    tokenizer: PreTrainedTokenizer,
) -> PreTrainedModel | None:
    """Set up the reference model for DPO/alignment training.

    Args:
        cfg: Configuration with rl type, adapter settings, rl_adapter_ref_model flag.
        tokenizer: Tokenizer for model loading.

    Returns:
        PreTrainedModel if a separate reference model is needed, None otherwise.
    """

Import

from axolotl.train import setup_reference_model

I/O Contract

Inputs

  • cfg (DictDefault, required): Config with rl (RL type), adapter (adapter type), and rl_adapter_ref_model (force a separate reference model)
  • tokenizer (PreTrainedTokenizer, required): Tokenizer used when loading the reference model

Outputs

  • return (PreTrainedModel or None): Separate reference model, or None if TRL auto-unwrap is used

Usage Examples

DPO with Auto-Unwrap (LoRA)

# cfg.rl = "dpo"
# cfg.adapter = "lora"
# cfg.rl_adapter_ref_model = False (default)
ref_model = setup_reference_model(cfg, tokenizer)
print(ref_model)  # None - TRL will auto-unwrap base model

DPO with Separate Reference

# cfg.rl = "dpo"
# cfg.adapter = None  # Full fine-tuning DPO
ref_model = setup_reference_model(cfg, tokenizer)
print(type(ref_model))  # PreTrainedModel - full separate copy loaded
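
ORPO (No Reference Model)

Since the Description states that ORPO needs no reference model, a third case follows the same pattern as the examples above; this sketch assumes the same cfg and tokenizer objects.

# cfg.rl = "orpo"
ref_model = setup_reference_model(cfg, tokenizer)
print(ref_model)  # None - ORPO computes its loss without a reference model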

Related Pages

Implements Principle

Requires Environment
