Implementation:Intel Ipex llm AutoModelForCausalLM From Pretrained bf16

Knowledge Sources	IPEX-LLM, HuggingFace Transformers
Domains	NLP, Model_Loading
Last Updated	2026-02-09 00:00 GMT

Overview

Concrete tool for loading language models in bfloat16 precision for LoRA fine-tuning on Intel XPU, provided by IPEX-LLM.

Description

AutoModelForCausalLM.from_pretrained, imported from ipex_llm.transformers and called with load_in_low_bit="bf16", loads the model weights in bfloat16 precision without quantizing them. Setting optimize_model=False disables the inference-only XPU optimizations that would otherwise interfere with training. Passing modules_to_not_convert=["lm_head"] excludes the language model head from any low-bit conversion.
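The three settings above can be gathered in one place so that a training script cannot accidentally enable the inference-only optimizations. The following is a minimal sketch and not part of IPEX-LLM: bf16_lora_load_kwargs is a hypothetical helper name, and it only assembles the keyword arguments described on this page. torch_dtype=torch.bfloat16 is left to the caller so that the sketch runs without PyTorch installed.

```python
from typing import Any, Dict, List, Optional


def bf16_lora_load_kwargs(
    modules_to_not_convert: Optional[List[str]] = None,
    trust_remote_code: bool = True,
    optimize_model: bool = False,
) -> Dict[str, Any]:
    """Assemble from_pretrained keyword arguments for a bf16 LoRA load.

    Hypothetical helper for illustration; it does not call IPEX-LLM.
    """
    if optimize_model:
        # This page states that optimize_model must be False for training.
        raise ValueError("optimize_model must be False for LoRA fine-tuning")
    if modules_to_not_convert is None:
        # Keep the language model head out of any low-bit conversion.
        modules_to_not_convert = ["lm_head"]
    return {
        "load_in_low_bit": "bf16",
        "optimize_model": False,
        "modules_to_not_convert": list(modules_to_not_convert),
        "trust_remote_code": trust_remote_code,
    }


if __name__ == "__main__":
    print(bf16_lora_load_kwargs())
```

Under these assumptions, the result would be passed as AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, **kwargs).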

Usage

Use this when loading a base model for standard LoRA fine-tuning (not QLoRA) on Intel GPUs where sufficient memory is available for bf16 precision.
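Whether "sufficient memory" is available can be estimated roughly from the parameter count, since bfloat16 stores each weight in 2 bytes. The sketch below is an illustration only: the function names are hypothetical, the 7-billion-parameter figure is a nominal example, and the 50% headroom reserved for activations, gradients, and optimizer state is an assumed placeholder, not a measured value.

```python
BYTES_PER_BF16 = 2  # bfloat16 is a 16-bit format


def bf16_weight_bytes(num_parameters: int) -> int:
    """Bytes needed to hold the weights alone in bfloat16."""
    if num_parameters < 0:
        raise ValueError("num_parameters must be non-negative")
    return num_parameters * BYTES_PER_BF16


def fits_in_memory(num_parameters: int, device_bytes: int, headroom: float = 0.5) -> bool:
    """Rough check that the bf16 weights fit after reserving headroom.

    headroom is an assumed fraction of device memory kept free for
    activations, gradients, and optimizer state during training.
    """
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    budget = device_bytes * (1.0 - headroom)
    return bf16_weight_bytes(num_parameters) <= budget


if __name__ == "__main__":
    weights = bf16_weight_bytes(7_000_000_000)
    print(f"{weights / 2**30:.2f} GiB")  # prints "13.04 GiB"
```

The estimate covers the weights only, so it is a lower bound; actual training memory depends on batch size, sequence length, and the LoRA configuration.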

Code Reference

Source Location

Repository: IPEX-LLM
File: python/llm/example/GPU/LLM-Finetuning/LoRA/alpaca_lora_finetuning.py
Lines: 161-178

Signature

# Values shown are the settings this recipe passes for bf16 LoRA loading.
model = AutoModelForCausalLM.from_pretrained(
    model_id,                             # str
    load_in_low_bit="bf16",               # str
    optimize_model=False,                 # bool
    torch_dtype=torch.bfloat16,           # torch.dtype
    modules_to_not_convert=["lm_head"],   # List[str]
    trust_remote_code=True,               # bool
)  # returns a PreTrainedModel

Import

from ipex_llm.transformers import AutoModelForCausalLM

I/O Contract

Inputs

Name	Type	Required	Description
model_id	str	Yes	HuggingFace model ID or local path
load_in_low_bit	str	Yes	Set to "bf16" for bfloat16 precision
optimize_model	bool	Yes	Must be False for training
torch_dtype	torch.dtype	No	Compute dtype (torch.bfloat16)
modules_to_not_convert	List[str]	No	Layers to exclude from conversion (["lm_head"] in this recipe)
trust_remote_code	bool	No	Allow custom model code

Outputs

Name	Type	Description
model	PreTrainedModel	bf16-precision model ready for LoRA adapter injection

Usage Examples

import os

import torch
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

# Load model in bf16 precision
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    load_in_low_bit="bf16",
    optimize_model=False,
    torch_dtype=torch.bfloat16,
    modules_to_not_convert=["lm_head"],
    trust_remote_code=True,
)

# Move to this process's XPU device (LOCAL_RANK selects the device; defaults to 0)
model = model.to(f'xpu:{os.environ.get("LOCAL_RANK", 0)}')

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
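The device string in the example is derived from LOCAL_RANK, which distributed launchers such as torchrun set for each process. A minimal sketch of that derivation follows, assuming one process per device; xpu_device is a hypothetical helper name, and it only builds the string without touching any device.

```python
import os
from typing import Mapping, Optional


def xpu_device(env: Optional[Mapping[str, str]] = None) -> str:
    """Return 'xpu:<LOCAL_RANK>', falling back to rank 0 when unset."""
    if env is None:
        env = os.environ
    # Environment values are strings; convert so bad input fails early.
    rank = int(env.get("LOCAL_RANK", "0"))
    if rank < 0:
        raise ValueError("LOCAL_RANK must be non-negative")
    return f"xpu:{rank}"


if __name__ == "__main__":
    print(xpu_device({}))                   # prints "xpu:0"
    print(xpu_device({"LOCAL_RANK": "3"}))  # prints "xpu:3"
```

In a single-process run LOCAL_RANK is normally unset, so the model lands on xpu:0.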

Related Pages

Implements Principle

Principle:Intel_Ipex_llm_LoRA_Model_Loading_bf16

Requires Environment

Environment:Intel_Ipex_llm_XPU_Finetuning_Environment
