
Implementation:LLMBook zh LLMBook zh github io AutoModelForCausalLM From Pretrained DPO

From Leeroopedia


Knowledge Sources
Domains: NLP, Alignment
Last Updated: 2026-02-08 00:00 GMT

Overview

Concrete tool, provided by HuggingFace Transformers, for loading the trainable policy model and the frozen reference model used in DPO training.

Description

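For DPO, AutoModelForCausalLM.from_pretrained is called twice on the same checkpoint: once for the trainable policy model and once for the frozen reference model. The reference model is explicitly switched to eval mode, which disables dropout, and requires_grad is set to False on all of its parameters, so it is never updated during training. Because the weights are loaded twice, two full copies of the model are held in memory.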
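The effect of the two freezing steps can be checked offline. The sketch below is a stand-in under stated assumptions: instead of downloading a checkpoint, it builds tiny randomly initialised GPT-2 models with `AutoModelForCausalLM.from_config` (the config sizes are arbitrary), and `freeze_reference_model` and `count_trainable` are hypothetical helpers, not part of the LLMBook code. It shows that the policy still produces a differentiable loss while the frozen reference model's outputs carry no autograd graph.

```python
import torch
from transformers import AutoModelForCausalLM, GPT2Config


def freeze_reference_model(model):
    """Hypothetical helper: switch to eval mode and stop gradient tracking on every parameter."""
    model.eval()
    for param in model.parameters():
        param.requires_grad = False
    return model


def count_trainable(model):
    """Hypothetical helper: number of scalar parameters that would receive gradient updates."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


# Tiny, randomly initialised config used as an offline stand-in for a real checkpoint (assumption).
config = GPT2Config(vocab_size=100, n_positions=32, n_embd=16, n_layer=1, n_head=2)

model = AutoModelForCausalLM.from_config(config)  # policy: trainable
model_ref = freeze_reference_model(AutoModelForCausalLM.from_config(config))  # reference: frozen

input_ids = torch.randint(0, config.vocab_size, (2, 8))

# The policy's language-modelling loss is differentiable, so backward() fills .grad on its parameters.
policy_loss = model(input_ids, labels=input_ids).loss
policy_loss.backward()

# The frozen reference model still runs a forward pass, but its outputs are not tracked by autograd.
ref_logits = model_ref(input_ids).logits

print(count_trainable(model) > 0)  # True
print(count_trainable(model_ref))  # 0
print(model_ref.training)          # False
print(ref_logits.requires_grad)    # False
```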

This is a Wrapper Doc documenting how the LLMBook repository uses AutoModelForCausalLM in the DPO context.

Usage

Load both models before creating the DPOTrainer.

Code Reference

Source Location

  • Repository: LLMBook-zh
  • File: code/8.2 DPO实践.py ("DPO practice")
  • Lines: 58-64

Signature

# Policy model (trainable); model_name_or_path: str
model = AutoModelForCausalLM.from_pretrained(model_name_or_path)

# Reference model (frozen)
model_ref = AutoModelForCausalLM.from_pretrained(model_name_or_path)
model_ref.eval()  # disable dropout and other training-only behaviour
for param in model_ref.parameters():
    param.requires_grad = False  # exclude from gradient computation

Import

from transformers import AutoModelForCausalLM

I/O Contract

Inputs

| Name | Type | Required | Description |
| model_name_or_path | str | Yes | HuggingFace model ID or local checkpoint directory (e.g., "yulan-team/YuLan-Chat-12B-v3") |

Outputs

| Name | Type | Description |
| model | PreTrainedModel | Trainable policy model |
| model_ref | PreTrainedModel | Frozen reference model (eval mode, no grad) |
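The output contract above can be expressed as a small check. This is a sketch, not part of the LLMBook code: `check_dpo_model_pair` is a hypothetical helper, and the tiny random GPT-2 config again stands in for the real checkpoint so the example runs offline. Besides the eval-mode and requires_grad conditions, it checks that the two models do not share parameter storage, which catches the mistake of reusing the policy object as the reference (freezing it would then freeze the policy too).

```python
from transformers import AutoModelForCausalLM, GPT2Config


def check_dpo_model_pair(model, model_ref):
    """Hypothetical helper: check the I/O contract for a (policy, reference) pair.

    Returns a list of problems; an empty list means the pair matches the contract.
    """
    problems = []
    if not any(p.requires_grad for p in model.parameters()):
        problems.append("policy model has no trainable parameters")
    if model_ref.training:
        problems.append("reference model is not in eval mode")
    if any(p.requires_grad for p in model_ref.parameters()):
        problems.append("reference model has parameters with requires_grad=True")
    # Separately constructed models allocate separate tensors, so no storage should be shared.
    policy_ptrs = {p.data_ptr() for p in model.parameters()}
    if any(p.data_ptr() in policy_ptrs for p in model_ref.parameters()):
        problems.append("policy and reference models share parameter storage")
    return problems


# Tiny random config as an offline stand-in for the real checkpoint (assumption).
config = GPT2Config(vocab_size=100, n_positions=32, n_embd=16, n_layer=1, n_head=2)

# Correct pattern: two independent models, only the reference is frozen.
model = AutoModelForCausalLM.from_config(config)
model_ref = AutoModelForCausalLM.from_config(config)
model_ref.eval()
for param in model_ref.parameters():
    param.requires_grad = False
print(check_dpo_model_pair(model, model_ref))  # []

# Mistake: reusing the policy object as the reference freezes the policy as well.
bad_policy = AutoModelForCausalLM.from_config(config)
bad_ref = bad_policy
bad_ref.eval()
for param in bad_ref.parameters():
    param.requires_grad = False
print(check_dpo_model_pair(bad_policy, bad_ref))
```

For the aliased pair, the check reports both that the policy has no trainable parameters and that the two names point at shared storage.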

Usage Examples

from transformers import AutoModelForCausalLM

model_name = "yulan-team/YuLan-Chat-12B-v3"

# Load policy model
model = AutoModelForCausalLM.from_pretrained(model_name)

# Load frozen reference model
model_ref = AutoModelForCausalLM.from_pretrained(model_name)
model_ref.eval()
for param in model_ref.parameters():
    param.requires_grad = False
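The parameter loop can also be written with PyTorch's `nn.Module.requires_grad_`, which sets the flag on every parameter in one call. Because both `eval()` and `requires_grad_()` return the module itself, the two steps chain. This is a stylistic alternative, not what the LLMBook source uses; the sketch substitutes a tiny randomly initialised GPT-2 config (an assumption) for the 12B checkpoint so it runs offline.

```python
from transformers import AutoModelForCausalLM, GPT2Config

# Offline stand-in for AutoModelForCausalLM.from_pretrained(model_name) (assumption: tiny random config).
config = GPT2Config(vocab_size=100, n_positions=32, n_embd=16, n_layer=1, n_head=2)

# eval() and requires_grad_(False) both return the module, so they can be chained after construction.
model_ref = AutoModelForCausalLM.from_config(config).eval().requires_grad_(False)

print(model_ref.training)                                    # False
print(any(p.requires_grad for p in model_ref.parameters()))  # False
```

With a real checkpoint, the same chain would be applied to the result of `AutoModelForCausalLM.from_pretrained(model_name)`.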

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
