Implementation:Intel Ipex llm AutoModelForCausalLM From Pretrained bf16

From Leeroopedia


Knowledge Sources
Domains NLP, Model_Loading
Last Updated 2026-02-09 00:00 GMT

Overview

Concrete tool for loading language models in bfloat16 precision for LoRA fine-tuning on Intel XPU, provided by IPEX-LLM.

Description

AutoModelForCausalLM.from_pretrained from ipex_llm.transformers, called with load_in_low_bit="bf16", loads the model in bfloat16 precision without quantizing it. Passing optimize_model=False disables IPEX-LLM's inference-only XPU optimizations, which would otherwise interfere with training. Setting modules_to_not_convert=["lm_head"] excludes the language-model head from low-bit conversion.
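To make the effect of modules_to_not_convert concrete, here is a minimal sketch of how a conversion pass might decide which modules to skip. The helper modules_to_convert and its substring-matching rule are assumptions modeled on how Hugging Face quantization integrations treat modules_to_not_convert. They are not IPEX-LLM's actual code, and the module names are illustrative Llama-style names.

```python
from typing import Iterable, List


def modules_to_convert(module_names: Iterable[str], modules_to_not_convert: List[str]) -> List[str]:
    # Hypothetical helper, not part of IPEX-LLM: a module is skipped when any
    # entry of modules_to_not_convert occurs in its dotted name (assumed rule).
    return [
        name
        for name in module_names
        if not any(skip in name for skip in modules_to_not_convert)
    ]


names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.down_proj",
    "lm_head",
]
# The attention and MLP projections are kept for conversion; lm_head is filtered out.
print(modules_to_convert(names, ["lm_head"]))
```

With an empty skip list every module would be converted, which is why the recipe names lm_head explicitly.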

Usage

Use this when loading a base model for standard LoRA fine-tuning (not QLoRA) on Intel GPUs with enough memory to hold the full model in bf16, which needs 2 bytes per parameter for the weights alone.
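Because bf16 stores every weight in 2 bytes, a rough weights-only estimate helps decide between this path and 4-bit QLoRA loading. The sketch below is plain arithmetic, not an IPEX-LLM utility. The nf4 figure ignores quantization-scale overhead, and 7e9 is a round parameter count for a 7B model.

```python
# Bytes per parameter for the weight formats compared here; "nf4" ignores the
# small overhead of quantization scales (an approximation).
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "nf4": 0.5}


def weight_memory_gib(num_params: float, precision: str) -> float:
    """Weights-only footprint in GiB; excludes activations, gradients, LoRA and optimizer state."""
    return num_params * BYTES_PER_PARAM[precision] / 2**30


for precision in ("fp32", "bf16", "nf4"):
    print(f"{precision}: {weight_memory_gib(7e9, precision):.1f} GiB")
```

For a 7-billion-parameter model this gives about 13 GiB of weights in bf16 versus roughly 3.3 GiB at 4 bits, before any training-time memory is counted.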

Code Reference

Source Location

  • Repository: IPEX-LLM
  • File: python/llm/example/GPU/LLM-Finetuning/LoRA/alpaca_lora_finetuning.py
  • Lines: 161-178

Signature

# Values after "=" are the ones this recipe passes, not necessarily library defaults.
model = AutoModelForCausalLM.from_pretrained(
    model_id: str,
    load_in_low_bit: str = "bf16",
    optimize_model: bool = False,
    torch_dtype: torch.dtype = torch.bfloat16,
    modules_to_not_convert: List[str] = ["lm_head"],
    trust_remote_code: bool = True
) -> PreTrainedModel

Import

from ipex_llm.transformers import AutoModelForCausalLM

I/O Contract

Inputs

Name                    Type         Required  Description
model_id                str          Yes       Hugging Face model ID or local path
load_in_low_bit         str          Yes       Set to "bf16" for bfloat16 precision
optimize_model          bool         Yes       Must be False for training
torch_dtype             torch.dtype  No        Compute dtype (torch.bfloat16)
modules_to_not_convert  List[str]    No        Modules excluded from low-bit conversion (["lm_head"] in this recipe)
trust_remote_code       bool         No        Allow custom modeling code from the model repository
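The I/O contract above can be encoded as a pre-flight check on the keyword arguments before they are passed to from_pretrained (for example as AutoModelForCausalLM.from_pretrained(model_id, **load_kwargs)). This is a minimal sketch: check_bf16_lora_kwargs is a hypothetical helper, not part of IPEX-LLM, and it does not import the library. torch_dtype is left unchecked so the sketch runs without PyTorch.

```python
from typing import Any, Dict, List


def check_bf16_lora_kwargs(model_id: str, kwargs: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the contract is met.

    Hypothetical helper that encodes this page's I/O contract only.
    """
    problems: List[str] = []
    if not isinstance(model_id, str) or not model_id:
        problems.append("model_id must be a non-empty Hugging Face ID or local path")
    if kwargs.get("load_in_low_bit") != "bf16":
        problems.append('load_in_low_bit must be "bf16"')
    # A missing optimize_model is flagged too, since the contract marks it as required.
    if kwargs.get("optimize_model") is not False:
        problems.append("optimize_model must be False for training")
    if "lm_head" not in (kwargs.get("modules_to_not_convert") or []):
        problems.append('"lm_head" is not in modules_to_not_convert')
    return problems


load_kwargs = {
    "load_in_low_bit": "bf16",
    "optimize_model": False,
    "modules_to_not_convert": ["lm_head"],
    "trust_remote_code": True,
}
print(check_bf16_lora_kwargs("meta-llama/Llama-2-7b-hf", load_kwargs))
```

Returning a list rather than raising on the first failure lets a training script report every misconfiguration at once.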

Outputs

Name   Type             Description
model  PreTrainedModel  bf16-precision model ready for LoRA adapter injection

Usage Examples

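import os

import torch
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

base_model = "meta-llama/Llama-2-7b-hf"

# Load the weights in bf16 without quantization, keep lm_head out of the
# low-bit conversion, and disable inference-only optimizations for training
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    load_in_low_bit="bf16",
    optimize_model=False,
    torch_dtype=torch.bfloat16,
    modules_to_not_convert=["lm_head"],
    trust_remote_code=True,
)

# Move the model to this process's XPU (LOCAL_RANK comes from the distributed launcher; defaults to 0)
model = model.to(f'xpu:{os.environ.get("LOCAL_RANK", 0)}')

# Load the matching tokenizer; fall back to the EOS token for padding if none is defined
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token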
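The device expression in the model.to(...) call gives each distributed process its own XPU, based on the LOCAL_RANK environment variable that launchers such as torchrun export. Below is a stdlib-only sketch of the same logic as a hypothetical helper, xpu_device_for_rank (not part of IPEX-LLM), which makes the rank-to-device mapping testable without a GPU.

```python
import os
from typing import Mapping, Optional


def xpu_device_for_rank(env: Optional[Mapping[str, str]] = None) -> str:
    """Return 'xpu:<LOCAL_RANK>' for this process, falling back to 'xpu:0'."""
    env = os.environ if env is None else env
    # int() rejects malformed values early instead of producing an invalid device string.
    return f"xpu:{int(env.get('LOCAL_RANK', 0))}"


print(xpu_device_for_rank({"LOCAL_RANK": "1"}))
print(xpu_device_for_rank({}))
```

In a training script the result would be passed to model.to(...) in place of the inline f-string.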

Related Pages

Implements Principle

Requires Environment
