Implementation:Intel Ipex llm AutoModelForCausalLM From Pretrained QLoRA


Knowledge Sources
Domains NLP, Model_Quantization
Last Updated 2026-02-09 00:00 GMT

Overview

Concrete tool for loading language models with 4-bit NF4 quantization for QLoRA fine-tuning on Intel XPU, provided by IPEX-LLM.

Description

AutoModelForCausalLM.from_pretrained in ipex_llm.transformers is a drop-in replacement for the HuggingFace method of the same name, with quantization support optimized for Intel XPU. For QLoRA, it accepts a BitsAndBytesConfig with NF4 settings. The model is loaded in 4-bit precision with a bfloat16 compute dtype, ready for LoRA adapter injection.
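To make "4-bit NF4" concrete, here is a minimal pure-Python sketch of block-wise NF4 quantization. It illustrates the idea only and is not IPEX-LLM's kernel: the code book values are those published with the QLoRA reference implementation, while the helper names (quantize_block, dequantize_block) and the sample block are hypothetical.

```python
# The 16 NF4 levels from the QLoRA reference implementation (bitsandbytes).
NF4_LEVELS = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]


def quantize_block(weights):
    """Return (absmax scale, 4-bit indices) for one block of weights."""
    # Scale by the largest magnitude so every value falls in [-1, 1].
    scale = max(abs(w) for w in weights) or 1.0
    codes = []
    for w in weights:
        x = w / scale
        # Pick the index of the nearest NF4 level.
        codes.append(min(range(16), key=lambda i: abs(NF4_LEVELS[i] - x)))
    return scale, codes


def dequantize_block(scale, codes):
    """Map 4-bit indices back to approximate weights."""
    return [NF4_LEVELS[c] * scale for c in codes]


# Hypothetical sample block; real blocks are typically much longer.
block = [0.02, -0.5, 0.13, 0.25, -0.07, 0.5, 0.0, -0.31]
scale, codes = quantize_block(block)
restored = dequantize_block(scale, codes)
max_error = max(abs(a - b) for a, b in zip(block, restored))
```

Each weight is stored as a 4-bit index plus one shared scale per block, which is where the memory saving comes from. The compute dtype (bfloat16 here) is the precision weights are dequantized to for matrix multiplication.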

Usage

Use this when loading a base model for QLoRA fine-tuning on Intel GPUs. The BitsAndBytesConfig interface is compatible with the standard HuggingFace API, while the underlying implementation is optimized for Intel XPU.
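A rough back-of-the-envelope estimate shows why 4-bit loading matters for fine-tuning on a single GPU. The sketch below counts weight storage only. It assumes a round 7 billion parameters and ignores quantization scales, any layers left unquantized, activations, LoRA adapters, and optimizer state, so treat the numbers as a lower bound rather than a measurement.

```python
def weight_memory_gib(num_params, bits_per_weight):
    """Approximate weight storage in GiB for a given bit width."""
    return num_params * bits_per_weight / 8 / 2**30


# Assumed round figure for a "7B" model, for illustration only.
params_7b = 7_000_000_000

bf16_gib = weight_memory_gib(params_7b, 16)  # roughly 13 GiB
nf4_gib = weight_memory_gib(params_7b, 4)    # roughly 3.3 GiB

print(f"bf16 weights: {bf16_gib:.2f} GiB, 4-bit weights: {nf4_gib:.2f} GiB")
```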

Code Reference

Source Location

  • Repository: IPEX-LLM
  • File: python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/alpaca_qlora_finetuning.py
  • Lines: 177-196

Signature

# BitsAndBytesConfig for NF4 quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

# Load model with quantization (signature notation, not an executable call)
AutoModelForCausalLM.from_pretrained(
    model_id: str,
    torch_dtype: torch.dtype = torch.bfloat16,
    quantization_config: BitsAndBytesConfig = None,
    trust_remote_code: bool = True
) -> PreTrainedModel

Import

import torch
from transformers import BitsAndBytesConfig
from ipex_llm.transformers import AutoModelForCausalLM

I/O Contract

Inputs

Name | Type | Required | Description
model_id | str | Yes | HuggingFace model ID or local path (e.g., "meta-llama/Llama-2-7b-hf")
quantization_config | BitsAndBytesConfig | Yes | 4-bit NF4 quantization configuration
torch_dtype | torch.dtype | No | Compute dtype (default torch.bfloat16)
trust_remote_code | bool | No | Allow custom model code from HuggingFace Hub

Outputs

Name | Type | Description
model | PreTrainedModel | 4-bit NF4 quantized model ready for LoRA adapter injection

Usage Examples

import torch
from transformers import BitsAndBytesConfig, AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

# 1. Configure NF4 quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

# 2. Load model with 4-bit quantization
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.bfloat16,
    quantization_config=bnb_config,
    trust_remote_code=True
)

# 3. Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
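
Once the quantized model is loaded, the next step in a QLoRA workflow is attaching LoRA adapters. The sketch below follows the pattern shown in IPEX-LLM's QLoRA examples, using helpers from ipex_llm.transformers.qlora together with LoraConfig from peft. The attach_lora function, the LORA_SETTINGS dictionary, and its hyperparameter values are hypothetical illustrations, and the exact helper signatures may differ between IPEX-LLM versions, so check them against your installed release.

```python
# Hypothetical LoRA hyperparameters for illustration; tune them for your model.
LORA_SETTINGS = {
    "r": 8,
    "lora_alpha": 32,
    "target_modules": ["q_proj", "k_proj", "v_proj"],
    "lora_dropout": 0.05,
    "bias": "none",
    "task_type": "CAUSAL_LM",
}


def attach_lora(model, settings=LORA_SETTINGS):
    """Move a 4-bit model to the XPU and wrap it with LoRA adapters."""
    # Imports are deferred so this file can be read or imported on a machine
    # without IPEX-LLM and peft installed.
    from ipex_llm.transformers.qlora import (
        get_peft_model,
        prepare_model_for_kbit_training,
    )
    from peft import LoraConfig

    model = model.to("xpu")                         # Intel GPU device
    model = prepare_model_for_kbit_training(model)  # prepare for k-bit training
    return get_peft_model(model, LoraConfig(**settings))
```

Calling attach_lora(model) on the model from step 2 would return a PEFT-wrapped model in which only the adapter weights are trainable, assuming the environment has a working XPU setup.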

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
