Implementation:Intel Ipex llm Lora Finetune CPU

From Leeroopedia


Knowledge Sources
Domains Finetuning, LoRA, CPU
Last Updated 2026-02-09 04:00 GMT

Overview

A concrete tool, provided by the IPEX-LLM Docker examples, for LoRA-based fine-tuning of LLaMA models on CPU.

Description

The train() function implements a complete LoRA fine-tuning pipeline for causal language models on CPU. It loads a pre-trained model, applies LoRA adapters via PEFT, tokenizes an instruction-following dataset (Alpaca format), and trains using the HuggingFace Trainer API. It also supports distributed training via DDP with the CCL backend, optional 8-bit quantization, and Weights & Biases experiment tracking.
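The tokenization step can be illustrated with a small sketch of how cutoff_len and train_on_inputs typically interact in Alpaca-style fine-tuning scripts. This is an assumption about the general technique, not a copy of the script's code: the helper name build_labels and the toy token IDs are hypothetical, and the real script works on tokenizer output rather than plain lists.

```python
# Label value that PyTorch's cross-entropy loss ignores by default.
IGNORE_INDEX = -100


def build_labels(prompt_ids, full_ids, cutoff_len=256, train_on_inputs=True):
    """Truncate to cutoff_len and optionally mask prompt tokens out of the loss.

    prompt_ids: token IDs of the instruction/input part only (hypothetical input).
    full_ids:   token IDs of prompt plus response.
    """
    input_ids = list(full_ids[:cutoff_len])
    if train_on_inputs:
        # Every token contributes to the loss.
        labels = list(input_ids)
    else:
        # Only response tokens contribute; prompt positions are masked.
        n_prompt = min(len(prompt_ids), len(input_ids))
        labels = [IGNORE_INDEX] * n_prompt + input_ids[n_prompt:]
    return {"input_ids": input_ids, "labels": labels}


example = build_labels(
    prompt_ids=[1, 10, 11],
    full_ids=[1, 10, 11, 20, 21, 2],
    cutoff_len=5,
    train_on_inputs=False,
)
print(example)
```

With train_on_inputs left at its default of True, the labels are simply a copy of the truncated input IDs.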

Usage

Use this script when fine-tuning a LLaMA-family model with LoRA adapters on CPU infrastructure, particularly in Docker-based or Kubernetes-deployed environments. It is designed for the Alpaca instruction-following dataset format and supports multi-node distributed training.
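Before starting a long CPU run, it can help to confirm that the dataset really is in Alpaca format. The sketch below assumes the conventional layout, a JSON list of objects with string fields instruction, input, and output (input may be empty). The helper name check_alpaca_records and the sample records are hypothetical.

```python
REQUIRED_KEYS = ("instruction", "input", "output")


def check_alpaca_records(records):
    """Return the indices of records that are not valid Alpaca-style entries."""
    bad = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            bad.append(i)
        elif not all(isinstance(rec.get(k), str) for k in REQUIRED_KEYS):
            # A key is missing or holds a non-string value.
            bad.append(i)
    return bad


# Hypothetical sample; a real run would load the list with json.load().
sample = [
    {"instruction": "Translate to French.", "input": "Good morning", "output": "Bonjour"},
    {"instruction": "Name a prime number.", "input": "", "output": "7"},
    {"instruction": "Missing output field", "input": ""},
]
bad_indices = check_alpaca_records(sample)
print(bad_indices)
```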

Code Reference

Source Location

Signature

def train(
    base_model: str = "",
    data_path: str = "./alpaca_data_cleaned.json",
    output_dir: str = "./lora-alpaca",
    batch_size: int = 128,
    micro_batch_size: int = 4,
    num_epochs: int = 3,
    learning_rate: float = 3e-4,
    cutoff_len: int = 256,
    val_set_size: int = 2000,
    lora_r: int = 8,
    lora_alpha: int = 16,
    lora_dropout: float = 0.05,
    lora_target_modules: List[str] = ["q_proj", "v_proj"],
    train_on_inputs: bool = True,
    group_by_length: bool = False,
    wandb_project: str = "",
    wandb_run_name: str = "",
    wandb_watch: str = "",
    wandb_log_model: str = "",
    resume_from_checkpoint: str = None,
    use_ipex: bool = False,
    bf16: bool = False,
    no_cuda: bool = True,
    xpu_backend: str = "ccl",
):
    """LoRA fine-tuning on CPU with optional distributed training."""

Import

# This is a standalone script; run via:
# python lora_finetune.py --base_model "meta-llama/Llama-2-7b-hf" --data_path ./data.json

I/O Contract

Inputs

Name Type Required Description
base_model str Yes HuggingFace model ID or local path
data_path str No Path to Alpaca-format JSON dataset (default: ./alpaca_data_cleaned.json)
output_dir str No Directory for saved LoRA adapter weights (default: ./lora-alpaca)
lora_r int No LoRA rank (default: 8)
lora_alpha int No LoRA alpha scaling factor (default: 16)
lora_target_modules List[str] No Model layers to apply LoRA (default: q_proj, v_proj)
batch_size int No Total batch size (default: 128)
micro_batch_size int No Per-device batch size (default: 4)
num_epochs int No Training epochs (default: 3)
learning_rate float No Learning rate (default: 3e-4)
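To give a feel for what lora_r, lora_alpha, and lora_target_modules control, the following sketch estimates the number of trainable parameters and the LoRA scaling factor. It assumes standard LoRA, where each adapted linear layer gains two matrices of shape (r x d_in) and (d_out x r) and the update is scaled by lora_alpha / lora_r. The hidden size of 4096 and the 32 layers are assumed values for a 7B LLaMA-style model, not figures from the script.

```python
def lora_trainable_params(d_in, d_out, r, n_modules):
    """Parameters added by LoRA: A is (r x d_in) and B is (d_out x r) per module."""
    return n_modules * r * (d_in + d_out)


# Assumed dimensions for a 7B LLaMA-style model (not taken from the script).
hidden_size = 4096
num_layers = 32
target_modules = ["q_proj", "v_proj"]  # the documented default

lora_r = 8
lora_alpha = 16

total = lora_trainable_params(
    d_in=hidden_size,
    d_out=hidden_size,
    r=lora_r,
    n_modules=num_layers * len(target_modules),
)
scaling = lora_alpha / lora_r

print(f"trainable LoRA parameters: {total:,}")
print(f"scaling factor: {scaling}")
```

Under these assumptions the defaults train roughly 4.2 million parameters, a small fraction of the base model, which is what makes CPU fine-tuning practical.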

Outputs

Name Type Description
LoRA adapter weights Files Saved to output_dir via model.save_pretrained()
Training logs Dict Metrics logged to console and optionally WandB
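A quick way to confirm that a run produced usable output is to look for the files PEFT's save_pretrained() normally writes: adapter_config.json plus a weights file named adapter_model.safetensors (or adapter_model.bin with older PEFT versions). The helper below is a hypothetical sketch that uses only the standard library and demonstrates itself on a temporary directory with placeholder files.

```python
import os
import tempfile

CONFIG_FILE = "adapter_config.json"
WEIGHT_FILES = ("adapter_model.safetensors", "adapter_model.bin")


def find_adapter_files(output_dir):
    """Return (has_config, weight_file_or_None) for a PEFT adapter directory."""
    has_config = os.path.isfile(os.path.join(output_dir, CONFIG_FILE))
    weight_file = next(
        (name for name in WEIGHT_FILES
         if os.path.isfile(os.path.join(output_dir, name))),
        None,
    )
    return has_config, weight_file


# Demonstration on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    empty_result = find_adapter_files(tmp)
    for name in (CONFIG_FILE, "adapter_model.safetensors"):
        open(os.path.join(tmp, name), "w").close()
    filled_result = find_adapter_files(tmp)

print(empty_result, filled_result)
```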

Usage Examples

Basic CPU LoRA Fine-tuning

# Run from the command line (arguments are parsed by the fire CLI):
python lora_finetune.py \
    --base_model "meta-llama/Llama-2-7b-hf" \
    --data_path "./alpaca_data_cleaned.json" \
    --output_dir "./lora-output" \
    --batch_size 128 \
    --micro_batch_size 4 \
    --num_epochs 3 \
    --lora_r 8 \
    --lora_alpha 16
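The relationship between batch_size and micro_batch_size is usually realized through gradient accumulation. The sketch below assumes the convention common to Alpaca-style LoRA scripts: accumulation steps equal batch_size // micro_batch_size, divided further by the number of processes under DDP. The function name and the floor of 1 are illustrative additions, so the script's own arithmetic should be checked before relying on this.

```python
def grad_accum_steps(batch_size, micro_batch_size, world_size=1):
    """Gradient accumulation steps needed to reach the total batch size.

    world_size is the number of DDP processes (1 for a single-process run).
    """
    steps = batch_size // micro_batch_size
    if world_size > 1:
        # Each process handles its own micro-batches, so fewer steps are needed.
        steps = steps // world_size
    # Illustrative safeguard: never return fewer than one step.
    return max(steps, 1)


single = grad_accum_steps(batch_size=128, micro_batch_size=4)
four_procs = grad_accum_steps(batch_size=128, micro_batch_size=4, world_size=4)
print(single, four_procs)
```

With the documented defaults this gives 32 accumulation steps on one process and 8 steps per process when four processes share the work.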

Distributed Training with CCL

# Distributed training across 4 MPI processes (spanning several nodes also requires a host list in the MPI launcher's own syntax):
mpirun -n 4 python lora_finetune.py \
    --base_model "meta-llama/Llama-2-7b-hf" \
    --data_path "./alpaca_data_cleaned.json" \
    --xpu_backend "ccl" \
    --bf16 True
