Implementation:Intel Ipex llm AutoModelForCausalLM From Pretrained DPO
| Knowledge Sources | |
|---|---|
| Domains | NLP, RLHF, Model_Loading |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Concrete tool for loading policy and reference models for DPO training on Intel XPU, provided by IPEX-LLM.
Description
For DPO training, AutoModelForCausalLM.from_pretrained is called twice: once for the policy model (with a BitsAndBytesConfig passed as quantization_config) and once for the reference model (with load_in_low_bit="nf4"). Both models are then moved to XPU. LoRA is not applied at load time: after loading, the policy model is wrapped with prepare_model_for_kbit_training and get_peft_model (both imported from ipex_llm.transformers.qlora), configured with the upstream peft.LoraConfig rather than ipex_llm's LoraConfig, since DPO uses peft directly.
Usage
Use this when setting up DPO training on Intel XPU and you need to load both the trainable policy model and the frozen reference model.
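Two models are needed because the DPO objective scores the policy's preference for the chosen response over the rejected one relative to the reference model's preference on the same pair. The following is a minimal sketch of the per-example loss in plain Python, not the training code from the example script. The function name, argument names, and the default beta of 0.1 are illustrative assumptions.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss from summed sequence log-probabilities."""
    # How strongly the trainable policy prefers "chosen" over "rejected".
    policy_margin = policy_chosen_logp - policy_rejected_logp
    # The same preference measured on the frozen reference model.
    ref_margin = ref_chosen_logp - ref_rejected_logp
    logits = beta * (policy_margin - ref_margin)
    # -log(sigmoid(logits)), split by sign to avoid overflow in exp().
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

When the policy and the reference assign identical log-probabilities, the loss equals log 2 (about 0.693). It falls below that value as the policy prefers the chosen response more strongly than the reference does.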
Code Reference
Source Location
- Repository: IPEX-LLM
- File: python/llm/example/GPU/LLM-Finetuning/DPO/dpo_finetuning.py
- Lines: 104-143
Signature
# Schematic signatures: annotations show parameter types and the values used in the script.
# Policy model (trainable)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
model = AutoModelForCausalLM.from_pretrained(
    model_path: str,
    quantization_config: BitsAndBytesConfig = bnb_config,
) -> PreTrainedModel

# Reference model (frozen)
ref_model = AutoModelForCausalLM.from_pretrained(
    model_path: str,
    load_in_low_bit: str = "nf4",
    optimize_model: bool = False,
    torch_dtype: torch.dtype = torch.bfloat16,
    modules_to_not_convert: List[str] = ["lm_head"],
) -> PreTrainedModel
Import
from ipex_llm.transformers import AutoModelForCausalLM
from ipex_llm.transformers.qlora import get_peft_model, prepare_model_for_kbit_training
from transformers import BitsAndBytesConfig
from peft import LoraConfig
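These imports span several separately installed packages (ipex_llm, transformers, peft). As a hedged convenience, the sketch below uses only the standard library to report which top-level packages are missing before any model loading starts. The helper name missing_packages is hypothetical and not part of IPEX-LLM.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Top-level packages behind the imports above, plus torch for the dtype.
    missing = missing_packages(["torch", "ipex_llm", "transformers", "peft"])
    if missing:
        print("Missing packages:", ", ".join(missing))
```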
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_path | str | Yes | HuggingFace model ID or local path |
| quantization_config | BitsAndBytesConfig | Yes (policy) | 4-bit NF4 config for policy model |
| load_in_low_bit | str | Yes (ref) | "nf4" for reference model |
| peft_config | LoraConfig | Yes (policy) | LoRA configuration for policy model (from upstream peft) |
Outputs
| Name | Type | Description |
|---|---|---|
| model | PeftModel | Policy model with LoRA adapters on XPU |
| ref_model | PreTrainedModel | Frozen reference model on XPU |
Usage Examples
import torch
from ipex_llm.transformers import AutoModelForCausalLM
from ipex_llm.transformers.qlora import get_peft_model, prepare_model_for_kbit_training
from transformers import BitsAndBytesConfig
from peft import LoraConfig

model_path = "teknium/OpenHermes-2.5-Mistral-7B"

# 1. Load policy model with 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(model_path, quantization_config=bnb_config)
model = model.to('xpu')

# 2. Add LoRA adapters (using upstream peft LoraConfig)
peft_config = LoraConfig(
    r=16, lora_alpha=16, lora_dropout=0.05, bias="none",
    task_type="CAUSAL_LM",
    target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj'],
)
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, peft_config)

# 3. Load reference model (frozen)
ref_model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_low_bit="nf4",
    optimize_model=False,
    torch_dtype=torch.bfloat16,
    modules_to_not_convert=["lm_head"],
)
ref_model = ref_model.to('xpu')
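After loading, it can be useful to confirm that only the LoRA adapters on the policy model are trainable and that the reference model receives no gradients. The helpers below are a hedged sketch, not part of the example script. The names freeze and count_trainable are hypothetical, and the code relies only on the standard torch.nn.Module methods parameters() and eval() and the tensor methods requires_grad_() and numel().

```python
def freeze(model):
    """Disable gradients on every parameter and switch the model to eval mode."""
    for param in model.parameters():
        param.requires_grad_(False)
    # torch.nn.Module.eval() returns the module itself.
    return model.eval()


def count_trainable(model):
    """Return (trainable, total) parameter counts for a torch.nn.Module-like object."""
    trainable = total = 0
    for param in model.parameters():
        total += param.numel()
        if param.requires_grad:
            trainable += param.numel()
    return trainable, total
```

Assuming the variables from the example above, calling ref_model = freeze(ref_model) and then count_trainable(ref_model) should report zero trainable parameters, while count_trainable(model) should report a trainable count far smaller than the total, corresponding to the LoRA adapter weights.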