# Implementation: Intel IPEX-LLM NPU Model Convert
| Knowledge Sources | |
|---|---|
| Domains | Model_Conversion, NPU, Quantization |
| Last Updated | 2026-02-09 04:00 GMT |
## Overview
Concrete tool for converting HuggingFace causal language models to a low-bit, NPU-optimized format for C++ deployment.
## Description
This script converts a causal language model from HuggingFace format to a low-bit quantized format suitable for NPU inference via the C++ CLI. It uses IPEX-LLM's NPU-specific AutoModelForCausalLM to load and quantize the model, then saves both the model and tokenizer to a specified directory for downstream C++ inference.
## Usage
Use this as a preprocessing step before deploying models via the C++ NPU CLI (llm-cli). The converted model files are optimized for NPU inference and cannot be used with standard HuggingFace APIs.
## Code Reference
### Source Location
- Repository: Intel IPEX-LLM
- File: python/llm/example/NPU/HF-Transformers-AutoModels/LLM/CPP_Examples/convert.py
- Lines: 1-92
### Signature

```python
# Script-based execution with argparse
# Key API:
from ipex_llm.transformers.npu_model import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_low_bit=args.low_bit,
    trust_remote_code=True,
    optimize_model=True,
)
model.save_low_bit(save_path)
```
### Import

```python
from ipex_llm.transformers.npu_model import AutoModelForCausalLM
from transformers import AutoTokenizer
```
## I/O Contract
### Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| `--repo-id-or-model-path` | str | Yes | HuggingFace model ID or local path |
| `--save-path` | str | Yes | Output directory for converted model |
| `--low-bit` | str | No | Quantization type (default: `sym_int4`) |
| `--max-context-len` | int | No | Maximum context length |
| `--max-prompt-len` | int | No | Maximum prompt length |
### Outputs
| Name | Type | Description |
|---|---|---|
| Converted model | Files | Low-bit NPU model files in save_path |
| Tokenizer | Files | Copied tokenizer files in save_path |
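The input contract above can be mirrored with a small `argparse` sketch. This is an illustration of the flag surface, not the real script: the defaults for the two length flags are assumptions, and the actual `convert.py` may wire them differently.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Mirrors the I/O contract table; the defaults for the two length
    # flags below are illustrative assumptions, not values from the
    # real convert.py.
    parser = argparse.ArgumentParser(
        description="Convert a HuggingFace causal LM to low-bit NPU format"
    )
    parser.add_argument("--repo-id-or-model-path", type=str, required=True,
                        help="HuggingFace model ID or local path")
    parser.add_argument("--save-path", type=str, required=True,
                        help="Output directory for the converted model")
    parser.add_argument("--low-bit", type=str, default="sym_int4",
                        help="Quantization type")
    parser.add_argument("--max-context-len", type=int, default=1024,
                        help="Maximum context length (default is an assumption)")
    parser.add_argument("--max-prompt-len", type=int, default=512,
                        help="Maximum prompt length (default is an assumption)")
    return parser

# Only the two required flags need to be passed; the rest fall back
# to their defaults.
args = build_parser().parse_args([
    "--repo-id-or-model-path", "meta-llama/Llama-2-7b-chat-hf",
    "--save-path", "./llama2-npu",
])
print(args.low_bit)  # → sym_int4
```

Note that `argparse` converts the dashed flag names to underscored attributes, e.g. `--repo-id-or-model-path` becomes `args.repo_id_or_model_path`.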
## Usage Examples
### Convert Model for NPU C++ CLI

```bash
python convert.py \
    --repo-id-or-model-path "meta-llama/Llama-2-7b-chat-hf" \
    --save-path "./llama2-npu" \
    --low-bit "sym_int4"
```
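After conversion, a quick sanity check can confirm the tokenizer files were copied alongside the low-bit model. The low-bit model file names are IPEX-LLM internals, so this sketch only looks for standard HuggingFace tokenizer artifacts; the exact set of files (assumed here to include `tokenizer_config.json`) varies by tokenizer type.

```python
from pathlib import Path

def check_converted_dir(save_path: str) -> list[str]:
    # Return the tokenizer artifacts present in the output directory.
    # tokenizer_config.json is written by AutoTokenizer.save_pretrained;
    # the other names listed here are common but not guaranteed.
    expected = ["tokenizer_config.json", "special_tokens_map.json"]
    out = Path(save_path)
    return [name for name in expected if (out / name).exists()]
```

For example, after running `convert.py` with `--save-path "./llama2-npu"`, `check_converted_dir("./llama2-npu")` should include `"tokenizer_config.json"`; an empty list suggests the tokenizer was not saved.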