Implementation: Axolotl Setup Reference Model
| Field | Value |
|---|---|
| Knowledge Sources | |
| Domains | Alignment, Model_Loading |
| Last Updated | 2026-02-06 23:00 GMT |
Overview
A concrete tool provided by the Axolotl framework for setting up the reference model used in DPO alignment training.
Description
The setup_reference_model function determines whether a separate reference model is needed for DPO training. For ORPO training, no reference model is needed. For LoRA/adapter training, TRL auto-unwraps the base model unless cfg.rl_adapter_ref_model is explicitly set to force a separate copy. When a separate model is needed, the function loads a full copy using ModelLoader with reference_model=True.
Usage
Called within the train function when cfg.rl is set. Returns the reference model or None.
Code Reference
Source Location
- Repository: axolotl
- File: src/axolotl/train.py
- Lines: L109-135
Signature
```python
def setup_reference_model(
    cfg: DictDefault,
    tokenizer: PreTrainedTokenizer,
) -> PreTrainedModel | None:
    """Set up the reference model for DPO/alignment training.

    Args:
        cfg: Configuration with rl type, adapter settings, rl_adapter_ref_model flag.
        tokenizer: Tokenizer for model loading.

    Returns:
        PreTrainedModel if a separate reference model is needed, None otherwise.
    """
```
Import
```python
from axolotl.train import setup_reference_model
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| cfg | DictDefault | Yes | Config with rl (RL type), adapter (adapter type), rl_adapter_ref_model (force separate ref model) |
| tokenizer | PreTrainedTokenizer | Yes | Tokenizer for loading the reference model |
Outputs
| Name | Type | Description |
|---|---|---|
| return | PreTrainedModel or None | Separate reference model, or None if TRL auto-unwrap is used |
Usage Examples
DPO with Auto-Unwrap (LoRA)
```python
# cfg.rl = "dpo"
# cfg.adapter = "lora"
# cfg.rl_adapter_ref_model = False (default)
ref_model = setup_reference_model(cfg, tokenizer)
print(ref_model)  # None - TRL will auto-unwrap the base model
```
DPO with Separate Reference
```python
# cfg.rl = "dpo"
# cfg.adapter = None  # Full fine-tuning DPO
ref_model = setup_reference_model(cfg, tokenizer)
print(type(ref_model))  # PreTrainedModel - full separate copy loaded
```