Implementation: Axolotl Setup Reference Model
| Field | Value |
|---|---|
| Knowledge Sources | |
| Domains | Alignment, Model_Loading |
| Last Updated | 2026-02-06 23:00 GMT |
Overview
A concrete tool provided by the Axolotl framework for setting up the reference model used in DPO alignment training.
Description
The setup_reference_model function determines whether a separate reference model is needed for DPO training. For ORPO training, no reference model is needed. For LoRA/adapter training, TRL auto-unwraps the base model unless cfg.rl_adapter_ref_model is explicitly set to force a separate copy. When a separate model is needed, the function loads a full copy using ModelLoader with reference_model=True.
Usage
Called within the train function when cfg.rl is set. Returns the reference model or None.
Code Reference
Source Location
- Repository: axolotl
- File: src/axolotl/train.py
- Lines: L109-135
Signature
```python
def setup_reference_model(
    cfg: DictDefault,
    tokenizer: PreTrainedTokenizer,
) -> PreTrainedModel | None:
    """Set up the reference model for DPO/alignment training.

    Args:
        cfg: Configuration with rl type, adapter settings, rl_adapter_ref_model flag.
        tokenizer: Tokenizer for model loading.

    Returns:
        PreTrainedModel if a separate reference model is needed, None otherwise.
    """
```
Import
```python
from axolotl.train import setup_reference_model
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| cfg | DictDefault | Yes | Config with rl (RL type), adapter (adapter type), rl_adapter_ref_model (force separate ref model) |
| tokenizer | PreTrainedTokenizer | Yes | Tokenizer for loading the reference model |
Outputs
| Name | Type | Description |
|---|---|---|
| return | PreTrainedModel or None | Separate reference model, or None if TRL auto-unwrap is used |
Usage Examples
DPO with Auto-Unwrap (LoRA)
```python
# cfg.rl = "dpo"
# cfg.adapter = "lora"
# cfg.rl_adapter_ref_model = False (default)
ref_model = setup_reference_model(cfg, tokenizer)
print(ref_model)  # None - TRL will auto-unwrap the base model
```
DPO with Separate Reference
```python
# cfg.rl = "dpo"
# cfg.adapter = None  # Full fine-tuning DPO
ref_model = setup_reference_model(cfg, tokenizer)
print(type(ref_model))  # PreTrainedModel - full separate copy loaded
```