Implementation:LLMBook zh LLMBook zh github io AutoModelForCausalLM From Pretrained DPO

Knowledge Sources	LLMBook-zh HuggingFace AutoModelForCausalLM
Domains	NLP, Alignment
Last Updated	2026-02-08 00:00 GMT

Overview

Concrete tool, provided by HuggingFace Transformers, for loading the trainable policy model and the frozen reference model used in DPO training.

Description

For DPO, AutoModelForCausalLM.from_pretrained is called twice with the same checkpoint: once for the trainable policy model and once for the frozen reference model. The reference model is explicitly put into eval mode, and requires_grad is set to False on all of its parameters so that it receives no gradient updates.
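To illustrate why two copies of the model are needed, the following minimal sketch computes the standard DPO loss for a single preference pair from scalar log-probabilities. It is not taken from the LLMBook code; the function name dpo_loss, its argument names, and the default beta=0.1 are assumptions chosen for illustration. The reference log-probabilities enter only as fixed baselines, which is why the reference model can stay frozen.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair, given sequence log-probs.

    Hypothetical helper for illustration; beta=0.1 is an assumed default.
    """
    # How much more (in log space) the policy likes each response
    # than the frozen reference does.
    chosen_log_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_log_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_log_ratio - rejected_log_ratio)
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# When the policy equals the reference, the margin is 0 and the loss is log(2).
loss_at_init = dpo_loss(-1.0, -2.0, -1.0, -2.0)
```

Because both models start from the same checkpoint, the loss at the first step is log(2) for every pair; it decreases as the policy raises the chosen response's log-probability relative to the reference.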

This is a Wrapper Doc documenting how the LLMBook repository uses AutoModelForCausalLM in the DPO context.

Usage

Load both models before creating the DPOTrainer.

Code Reference

Source Location

Repository: LLMBook-zh
File: code/8.2 DPO实践.py
Lines: 58-64

Signature

# Policy model (trainable); model_name_or_path is a str
model = AutoModelForCausalLM.from_pretrained(model_name_or_path)

# Reference model (frozen): same checkpoint, eval mode, no gradients
model_ref = AutoModelForCausalLM.from_pretrained(model_name_or_path)
model_ref.eval()
for param in model_ref.parameters():
    param.requires_grad = False

Import

from transformers import AutoModelForCausalLM

I/O Contract

Inputs

Name	Type	Required	Description
model_name_or_path	str	Yes	HuggingFace model ID (e.g., "yulan-team/YuLan-Chat-12B-v3")

Outputs

Name	Type	Description
model	PreTrainedModel	Trainable policy model
model_ref	PreTrainedModel	Frozen reference model (eval mode, no grad)

Usage Examples

from transformers import AutoModelForCausalLM

model_name = "yulan-team/YuLan-Chat-12B-v3"

# Load policy model
model = AutoModelForCausalLM.from_pretrained(model_name)

# Load frozen reference model
model_ref = AutoModelForCausalLM.from_pretrained(model_name)
model_ref.eval()
for param in model_ref.parameters():
    param.requires_grad = False
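The freeze steps above can be wrapped in a small helper, together with a check that no trainable parameters remain. This is a sketch rather than code from the LLMBook repository: the names freeze_reference_model and count_trainable_parameters are hypothetical, and the helpers only assume an object that exposes eval() and parameters() the way a torch.nn.Module does.

```python
def freeze_reference_model(model):
    """Put `model` in eval mode and disable gradients on every parameter.

    Hypothetical helper; expects a torch.nn.Module-like object.
    """
    model.eval()  # switches off train-only behavior such as dropout
    for param in model.parameters():
        param.requires_grad = False
    return model


def count_trainable_parameters(model):
    """Return how many parameter elements still require gradients."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```

Applied to the example above, freeze_reference_model(model_ref) replaces the three freeze lines, and count_trainable_parameters(model_ref) should then return 0 while the same call on model stays positive.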

Related Pages

Implements Principle

Principle:LLMBook_zh_LLMBook_zh_github_io_DPO_Model_Loading

Requires Environment

Uses Heuristic

Heuristic:LLMBook_zh_LLMBook_zh_github_io_DPO_Beta_Hyperparameter
