Implementation:Intel Ipex llm Lora Finetune CPU
| Knowledge Sources | |
|---|---|
| Domains | Finetuning, LoRA, CPU |
| Last Updated | 2026-02-09 04:00 GMT |
Overview
A LoRA fine-tuning script for LLaMA-family models on CPU, shipped with the IPEX-LLM Docker examples.
Description
The train() function implements a complete LoRA fine-tuning pipeline for causal language models on CPU. It loads a pre-trained model, wraps it with LoRA adapters via PEFT, tokenizes an instruction-following dataset in Alpaca format, and trains with the Hugging Face Trainer API. The script also supports distributed data-parallel (DDP) training over the CCL backend, optional 8-bit quantization, and Weights & Biases experiment tracking.
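The prompt-building and tokenization step can be sketched as below. This is a minimal sketch that assumes the Stanford Alpaca prompt templates and the common alpaca-lora convention of masking prompt tokens with -100 when train_on_inputs is False. The exact template strings used by lora_finetune.py may differ, and generate_prompt, tokenize_record and the tokenize callable are hypothetical names.

```python
from typing import Callable, Dict, List

# Assumed Alpaca-style templates; the strings in lora_finetune.py may differ.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n"
    "### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)
IGNORE_INDEX = -100  # label value skipped by PyTorch's CrossEntropyLoss by default


def generate_prompt(record: Dict[str, str], with_output: bool = True) -> str:
    """Render one Alpaca record ({instruction, input, output}) as training text."""
    template = PROMPT_WITH_INPUT if record.get("input") else PROMPT_NO_INPUT
    prompt = template.format(instruction=record["instruction"], input=record.get("input", ""))
    return prompt + record["output"] if with_output else prompt


def tokenize_record(
    record: Dict[str, str],
    tokenize: Callable[[str], List[int]],  # hypothetical: text -> token ids
    eos_id: int,
    cutoff_len: int = 256,
    train_on_inputs: bool = True,
) -> Dict[str, List[int]]:
    """Build input_ids, attention_mask and labels, truncated to cutoff_len tokens."""
    full_ids = tokenize(generate_prompt(record))[:cutoff_len]
    if len(full_ids) < cutoff_len:
        full_ids = full_ids + [eos_id]  # append EOS only when it still fits
    labels = list(full_ids)
    if not train_on_inputs:
        # Exclude the instruction/input tokens so the loss covers the response only.
        n_prompt = min(len(tokenize(generate_prompt(record, with_output=False))), len(labels))
        labels[:n_prompt] = [IGNORE_INDEX] * n_prompt
    return {"input_ids": full_ids, "attention_mask": [1] * len(full_ids), "labels": labels}
```

In practice tokenize would wrap a Hugging Face tokenizer. With subword tokenizers the prompt's ids are usually, but not always, an exact prefix of the full sequence's ids, so the mask boundary can be off by a token.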
Usage
Use this script to fine-tune a LLaMA-family model with LoRA adapters on CPU infrastructure, particularly in Docker or Kubernetes deployments. It expects datasets in the Alpaca instruction-following format and supports multi-node distributed training.
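A minimal loader for that format is sketched below. It assumes the usual Alpaca layout, a JSON array of objects with instruction, input and output fields where input may be empty, and mimics the role of val_set_size with a shuffled hold-out. load_alpaca_json and split_train_val are hypothetical helpers, not functions from lora_finetune.py, which may load and split the data differently (for example through the Hugging Face datasets library).

```python
import json
import random
from typing import Dict, List, Tuple

Record = Dict[str, str]


def load_alpaca_json(path: str) -> List[Record]:
    """Load a JSON array of {instruction, input, output} objects and validate it."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    for i, r in enumerate(records):
        missing = {"instruction", "output"} - set(r)
        if missing:
            raise ValueError(f"record {i} is missing fields: {sorted(missing)}")
        r.setdefault("input", "")  # the input (context) field is optional
    return records


def split_train_val(
    records: List[Record], val_set_size: int, seed: int = 42
) -> Tuple[List[Record], List[Record]]:
    """Shuffle once with a fixed seed, then hold out val_set_size records."""
    if val_set_size <= 0:
        return list(records), []  # no validation split
    if val_set_size >= len(records):
        raise ValueError("val_set_size must be smaller than the dataset")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return shuffled[val_set_size:], shuffled[:val_set_size]
```

With the default val_set_size of 2000, a dataset needs well over 2000 records for the training split to remain meaningful.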
Code Reference
Source Location
- Repository: Intel IPEX-LLM
- File: docker/llm/finetune/lora/cpu/docker/lora_finetune.py
- Lines: 1-316
Signature
```python
from typing import List, Optional


def train(
    base_model: str = "",
    data_path: str = "./alpaca_data_cleaned.json",
    output_dir: str = "./lora-alpaca",
    batch_size: int = 128,
    micro_batch_size: int = 4,
    num_epochs: int = 3,
    learning_rate: float = 3e-4,
    cutoff_len: int = 256,
    val_set_size: int = 2000,
    lora_r: int = 8,
    lora_alpha: int = 16,
    lora_dropout: float = 0.05,
    lora_target_modules: List[str] = ["q_proj", "v_proj"],
    train_on_inputs: bool = True,
    group_by_length: bool = False,
    wandb_project: str = "",
    wandb_run_name: str = "",
    wandb_watch: str = "",
    wandb_log_model: str = "",
    resume_from_checkpoint: Optional[str] = None,
    use_ipex: bool = False,
    bf16: bool = False,
    no_cuda: bool = True,
    xpu_backend: str = "ccl",
):
    """LoRA fine-tuning on CPU with optional distributed training."""
```
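The LoRA parameters in this signature can be illustrated with a dependency-free sketch of a single adapted linear layer. It assumes the standard LoRA formulation, y = Wx + (alpha / r) * B(Ax), with A initialized to small random values and B to zero, which is what PEFT applies to each module named in lora_target_modules (q_proj and v_proj are LLaMA's attention query and value projections). PEFT initializes A with a Kaiming-uniform scheme; a Gaussian is used here for simplicity. LoRALinear and matvec are illustrative names, and lora_dropout (dropout on x along the adapter path during training) is omitted.

```python
import random
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, x: List[float]) -> List[float]:
    """Plain matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable rank-r update B @ A."""

    def __init__(self, weight: Matrix, r: int = 8, alpha: int = 16, seed: int = 0):
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        self.weight = weight  # pre-trained weight, never updated
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]  # r x d_in
        self.B = [[0.0] * r for _ in range(d_out)]  # d_out x r, zero so training starts at W
        self.scaling = alpha / r  # defaults: 16 / 8 = 2.0

    def forward(self, x: List[float]) -> List[float]:
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scaling * d for b, d in zip(base, delta)]

    def merged_weight(self) -> Matrix:
        """W + scaling * (B @ A): folds the adapter into a single dense weight."""
        r = len(self.A)
        return [
            [w + self.scaling * sum(self.B[i][k] * self.A[k][j] for k in range(r))
             for j, w in enumerate(row)]
            for i, row in enumerate(self.weight)
        ]
```

Because B starts at zero, the adapted layer initially reproduces the frozen model exactly, and only r * (d_in + d_out) values per targeted layer are trained. merged_weight() shows why a trained adapter can be folded back into W for inference.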
Import
```shell
# This is a standalone script; run it directly:
python lora_finetune.py --base_model "meta-llama/Llama-2-7b-hf" --data_path ./data.json
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| base_model | str | Yes | HuggingFace model ID or local path |
| data_path | str | No | Path to Alpaca-format JSON dataset (default: ./alpaca_data_cleaned.json) |
| output_dir | str | No | Directory for saved LoRA adapter weights |
| lora_r | int | No | LoRA rank (default: 8) |
| lora_alpha | int | No | LoRA alpha scaling factor (default: 16) |
| lora_target_modules | List[str] | No | Model layers to apply LoRA (default: q_proj, v_proj) |
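| batch_size | int | No | Effective batch size per optimizer step, reached through gradient accumulation (default: 128) |
| micro_batch_size | int | No | Examples per forward/backward pass on each process (default: 4) |
| num_epochs | int | No | Training epochs (default: 3) |
| learning_rate | float | No | Learning rate (default: 3e-4) |
| cutoff_len | int | No | Maximum tokenized sequence length; longer examples are truncated (default: 256) |
| val_set_size | int | No | Number of examples held out for validation (default: 2000) |
| train_on_inputs | bool | No | If False, prompt tokens are excluded from the loss (default: True) |
| bf16 | bool | No | Train with bfloat16 mixed precision (default: False) |
| xpu_backend | str | No | Collective backend for distributed training (default: ccl) |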
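How batch_size and micro_batch_size interact can be sketched as follows. This assumes the alpaca-lora convention, where gradient accumulation steps equal batch_size // micro_batch_size and are further divided by the number of DDP ranks; accumulation_plan is a hypothetical helper, not part of lora_finetune.py.

```python
def accumulation_plan(batch_size: int, micro_batch_size: int, world_size: int = 1) -> dict:
    """Derive per-rank gradient accumulation steps (assumed alpaca-lora convention)."""
    steps = batch_size // micro_batch_size
    if world_size > 1:
        steps //= world_size  # each rank accumulates its share of the global batch
    if steps < 1:
        raise ValueError("batch_size is smaller than micro_batch_size * world_size")
    # Integer division means the realized batch can be smaller than requested.
    effective = micro_batch_size * steps * world_size
    return {"gradient_accumulation_steps": steps, "effective_batch_size": effective}
```

With the defaults (128 and 4), a single process accumulates 32 micro-batches per optimizer step; with 4 DDP ranks, each rank accumulates 8. Choosing batch_size as a multiple of micro_batch_size * world_size avoids silently shrinking the effective batch.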
Outputs
| Name | Type | Description |
|---|---|---|
| LoRA adapter weights | Files | Saved to output_dir via model.save_pretrained() |
| Training logs | Dict | Metrics logged to console and optionally WandB |
Usage Examples
Basic CPU LoRA Fine-tuning
```shell
# Run from the command line; flags map to train() arguments through the fire CLI:
python lora_finetune.py \
  --base_model "meta-llama/Llama-2-7b-hf" \
  --data_path "./alpaca_data_cleaned.json" \
  --output_dir "./lora-output" \
  --batch_size 128 \
  --micro_batch_size 4 \
  --num_epochs 3 \
  --lora_r 8 \
  --lora_alpha 16
```
Distributed Training with CCL
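```shell
# Launch 4 DDP ranks with MPI over the CCL backend
# (use your MPI launcher's hostfile option to spread ranks across nodes):
mpirun -n 4 python lora_finetune.py \
  --base_model "meta-llama/Llama-2-7b-hf" \
  --data_path "./alpaca_data_cleaned.json" \
  --xpu_backend "ccl" \
  --bf16 True
```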
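PyTorch's env:// process-group initialization reads WORLD_SIZE, RANK, MASTER_ADDR and MASTER_PORT, whereas MPI launchers export their own variable names. The hypothetical helper below sketches one way to reconcile them. It assumes Intel MPI / MPICH-style names (PMI_SIZE, PMI_RANK, MPI_LOCALRANKID) and Open MPI names (OMPI_COMM_WORLD_*), and it does not claim that lora_finetune.py or the Docker image performs this mapping; check your launcher's documentation for the variables it actually exports.

```python
import os
from typing import Iterable, Mapping, Optional


def _first_int(env: Mapping[str, str], names: Iterable[str], default: int) -> int:
    """Return the first variable in names that is set and non-empty, as an int."""
    for name in names:
        value = env.get(name)
        if value not in (None, ""):
            return int(value)
    return default


def resolve_ranks(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical: map launcher-specific variables onto torch.distributed names."""
    env = os.environ if env is None else env
    world_size = _first_int(env, ("WORLD_SIZE", "PMI_SIZE", "OMPI_COMM_WORLD_SIZE"), 1)
    rank = _first_int(env, ("RANK", "PMI_RANK", "OMPI_COMM_WORLD_RANK"), 0)
    local_rank = _first_int(
        env, ("LOCAL_RANK", "MPI_LOCALRANKID", "OMPI_COMM_WORLD_LOCAL_RANK"), 0
    )
    # Explicit torch-style variables win over MPI-provided ones.
    return {"world_size": world_size, "rank": rank, "local_rank": local_rank,
            "ddp": world_size > 1}
```

A launch wrapper could export the resolved values, for example os.environ.setdefault("WORLD_SIZE", str(info["world_size"])), before calling train(); MASTER_ADDR and MASTER_PORT must still point at a host and port reachable from every rank.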