Implementation:Intel Ipex llm AutoModelForCausalLM From Pretrained DPO

Knowledge Sources	IPEX-LLM
Domains	NLP, RLHF, Model_Loading
Last Updated	2026-02-09 00:00 GMT

Overview

A concrete tool, provided by IPEX-LLM, for loading the policy and reference models used in DPO training on Intel XPU.

Description

For DPO training, AutoModelForCausalLM.from_pretrained is called twice: once for the policy model (with a BitsAndBytesConfig that requests 4-bit NF4 quantization) and once for the reference model (with load_in_low_bit="nf4"). Both models are then moved to XPU. LoRA is not part of the from_pretrained call: after loading, the policy model is wrapped with prepare_model_for_kbit_training and get_peft_model, both imported from ipex_llm.transformers.qlora. The LoRA settings come from the upstream peft.LoraConfig rather than ipex_llm's own LoraConfig, since DPO uses peft directly.

Usage

Use when setting up DPO training to load both the trainable policy model and frozen reference model.

Code Reference

Source Location

Repository: IPEX-LLM
File: python/llm/example/GPU/LLM-Finetuning/DPO/dpo_finetuning.py
Lines: 104-143

Signature

# Policy model (trainable)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
model = AutoModelForCausalLM.from_pretrained(
    model_path: str,
    quantization_config: BitsAndBytesConfig = bnb_config,
) -> PreTrainedModel

# Reference model (frozen)
ref_model = AutoModelForCausalLM.from_pretrained(
    model_path: str,
    load_in_low_bit: str = "nf4",
    optimize_model: bool = False,
    torch_dtype = torch.bfloat16,
    modules_to_not_convert: List[str] = ["lm_head"],
) -> PreTrainedModel

Import

from ipex_llm.transformers import AutoModelForCausalLM
from ipex_llm.transformers.qlora import get_peft_model, prepare_model_for_kbit_training
from transformers import BitsAndBytesConfig
from peft import LoraConfig
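
The imports above span four packages (torch is also needed by the usage example further down). A minimal preflight sketch, assuming only the standard library, can report which of them cannot be located before any model is loaded. The helper name missing_packages and the REQUIRED_PACKAGES list are hypothetical and are not part of IPEX-LLM.

```python
from importlib.util import find_spec

# Top-level packages used by the imports on this page (illustrative list).
REQUIRED_PACKAGES = ["torch", "ipex_llm", "transformers", "peft"]


def missing_packages(names):
    """Return the top-level package names that cannot be located."""
    missing = []
    for name in names:
        try:
            found = find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for malformed or half-initialised modules.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

This only checks that the packages can be found; it does not verify versions or that an XPU device is present.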

I/O Contract

Inputs

Name	Type	Required	Description
model_path	str	Yes	HuggingFace model ID or local path
quantization_config	BitsAndBytesConfig	Yes (policy)	4-bit NF4 config for policy model
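load_in_low_bit	str	Yes (ref)	Low-bit format for the reference model; "nf4" in this example
optimize_model	bool	No (ref)	Set to False in this example
torch_dtype	torch.dtype	No (ref)	torch.bfloat16 in this example
modules_to_not_convert	List[str]	No (ref)	Modules left unconverted to low-bit; ["lm_head"] in this example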
peft_config	LoraConfig	Yes (policy)	LoRA configuration for policy model (from upstream peft)

Outputs

Name	Type	Description
model	PeftModel	Policy model with LoRA adapters on XPU
ref_model	PreTrainedModel	Frozen reference model on XPU
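
The output contract above can be checked after loading. The sketch below is one possible way to do so, not something the source example does: summarize_parameters counts trainable and total parameters, and freeze explicitly disables gradients on a model. Both helper names are hypothetical. They rely only on the parameters(), numel(), requires_grad, and eval() interface of a torch.nn.Module, so they should apply to model and ref_model alike. For quantized weights, the reported counts may reflect packed storage rather than the logical number of weights.

```python
def summarize_parameters(model):
    """Return (trainable, total) parameter counts for a torch-style module."""
    trainable = 0
    total = 0
    for param in model.parameters():
        count = param.numel()
        total += count
        if param.requires_grad:
            trainable += count
    return trainable, total


def freeze(model):
    """Disable gradients on every parameter and switch the module to eval mode."""
    for param in model.parameters():
        param.requires_grad = False
    model.eval()
    return model


# Hypothetical usage once both models are loaded:
#   ref_model = freeze(ref_model)
#   assert summarize_parameters(ref_model)[0] == 0   # nothing trainable
#   print(summarize_parameters(model))               # only LoRA weights trainable
```

Freezing the reference model explicitly is an optional safeguard under these assumptions; whether the trainer already handles it depends on the DPO trainer implementation in use.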

Usage Examples

import torch
from ipex_llm.transformers import AutoModelForCausalLM
from ipex_llm.transformers.qlora import get_peft_model, prepare_model_for_kbit_training
from transformers import BitsAndBytesConfig
from peft import LoraConfig

model_path = "teknium/OpenHermes-2.5-Mistral-7B"

# 1. Load policy model with 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_use_double_quant=False,
    bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.bfloat16
)
model = AutoModelForCausalLM.from_pretrained(model_path, quantization_config=bnb_config)
model = model.to('xpu')

# 2. Add LoRA adapters (using upstream peft LoraConfig)
peft_config = LoraConfig(
    r=16, lora_alpha=16, lora_dropout=0.05, bias="none",
    task_type="CAUSAL_LM",
    target_modules=['k_proj','gate_proj','v_proj','up_proj','q_proj','o_proj','down_proj']
)
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, peft_config)

# 3. Load reference model (frozen)
ref_model = AutoModelForCausalLM.from_pretrained(
    model_path, load_in_low_bit="nf4", optimize_model=False,
    torch_dtype=torch.bfloat16, modules_to_not_convert=["lm_head"]
)
ref_model = ref_model.to('xpu')
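
To get a feel for how small the trainable part of the policy model is, the LoRA configuration above can be turned into a parameter count. With bias="none", each targeted linear layer with in_features i and out_features o gains r * (i + o) trainable weights (an r x i matrix and an o x r matrix). The sketch below applies that formula. The layer shapes and layer count are assumptions taken from the public Mistral-7B configuration (hidden size 4096, intermediate size 14336, 8 key/value heads of dimension 128, 32 decoder layers), not from the source example, and lora_param_count is a hypothetical helper.

```python
# (in_features, out_features) of each targeted projection in one decoder layer.
# Assumed from the public Mistral-7B configuration; verify against the
# config.json of the checkpoint actually being loaded.
MISTRAL_7B_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}
NUM_LAYERS = 32  # assumed number of decoder layers


def lora_param_count(shapes, target_modules, r, num_layers):
    """Count LoRA weights: r * (in + out) per targeted layer, per decoder layer."""
    per_layer = sum(r * (shapes[name][0] + shapes[name][1]) for name in target_modules)
    return per_layer * num_layers


targets = ["k_proj", "gate_proj", "v_proj", "up_proj", "q_proj", "o_proj", "down_proj"]
total = lora_param_count(MISTRAL_7B_SHAPES, targets, r=16, num_layers=NUM_LAYERS)
print(f"{total:,}")  # 41,943,040 under the assumed shapes
```

Under these assumptions the adapters add roughly 42 million trainable weights, which can be compared with the trainable-parameter count that peft reports for the wrapped model.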
