
Implementation:Intel Ipex llm NPU Model Convert

From Leeroopedia


Knowledge Sources
Domains Model_Conversion, NPU, Quantization
Last Updated 2026-02-09 04:00 GMT

Overview

A concrete tool for converting HuggingFace causal language models to a low-bit, NPU-optimized format for C++ deployment.

Description

This script converts a causal language model from HuggingFace format to a low-bit quantized format suitable for NPU inference via the C++ CLI. It uses IPEX-LLM's NPU-specific AutoModelForCausalLM to load and quantize the model, then saves both the model and tokenizer to a specified directory for downstream C++ inference.

Usage

Use this as a preprocessing step before deploying models via the C++ NPU CLI (llm-cli). The converted model files are optimized for NPU inference and cannot be used with standard HuggingFace APIs.

Code Reference

Source Location

Signature

# Script-based execution with argparse
# Key API:
from ipex_llm.transformers.npu_model import AutoModelForCausalLM
from transformers import AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_low_bit=args.low_bit,  # e.g. "sym_int4"
    trust_remote_code=True,
    optimize_model=True,
    # context limits per the I/O contract below
    max_context_len=args.max_context_len,
    max_prompt_len=args.max_prompt_len,
)
model.save_low_bit(save_path)

# The tokenizer is saved alongside the model for C++ inference
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
tokenizer.save_pretrained(save_path)

Import

from ipex_llm.transformers.npu_model import AutoModelForCausalLM
from transformers import AutoTokenizer

I/O Contract

Inputs

Name Type Required Description
repo-id-or-model-path str Yes HuggingFace model ID or local path
save-path str Yes Output directory for converted model
low-bit str No Quantization type (default: sym_int4)
max-context-len int No Maximum context length
max-prompt-len int No Maximum prompt length
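The inputs above map directly onto an argparse parser. A minimal sketch follows, assuming the flag names mirror the table; only the sym_int4 default is stated by the contract, so the context-length defaults here are illustrative placeholders:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Flag names follow the I/O contract; the numeric defaults
    # below are placeholders, not values from the source.
    parser = argparse.ArgumentParser(
        description="Convert a HuggingFace model to low-bit NPU format"
    )
    parser.add_argument("--repo-id-or-model-path", type=str, required=True,
                        help="HuggingFace model ID or local path")
    parser.add_argument("--save-path", type=str, required=True,
                        help="Output directory for the converted model")
    parser.add_argument("--low-bit", type=str, default="sym_int4",
                        help="Quantization type (default: sym_int4)")
    parser.add_argument("--max-context-len", type=int, default=1024,
                        help="Maximum context length (placeholder default)")
    parser.add_argument("--max-prompt-len", type=int, default=512,
                        help="Maximum prompt length (placeholder default)")
    return parser

args = build_parser().parse_args([
    "--repo-id-or-model-path", "meta-llama/Llama-2-7b-chat-hf",
    "--save-path", "./llama2-npu",
])
```

Note that argparse exposes the dashed flags as underscored attributes, e.g. `args.repo_id_or_model_path` and `args.low_bit`.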

Outputs

Name Type Description
Converted model Files Low-bit NPU model files in save_path
Tokenizer Files Copied tokenizer files in save_path

Usage Examples

Convert Model for NPU C++ CLI

python convert.py \
    --repo-id-or-model-path "meta-llama/Llama-2-7b-chat-hf" \
    --save-path "./llama2-npu" \
    --low-bit "sym_int4"
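When comparing several quantization types, the same command can be generated once per format. A sketch that only builds the argument lists, assuming the `convert.py` script name and flag set from the example above; the alternative low-bit names in the default tuple are illustrative, so check which types the NPU backend actually supports before running them:

```python
def convert_commands(model: str, out_root: str,
                     low_bits=("sym_int4", "sym_int8")):
    """Build one convert.py command per quantization type.

    Running the commands (e.g. via subprocess.run) is left to the caller.
    """
    commands = []
    for low_bit in low_bits:
        # one output directory per quantization type, e.g.
        # ./out/Llama-2-7b-chat-hf-sym_int4
        commands.append([
            "python", "convert.py",
            "--repo-id-or-model-path", model,
            "--save-path", f"{out_root}/{model.split('/')[-1]}-{low_bit}",
            "--low-bit", low_bit,
        ])
    return commands
```

Keeping one output directory per quantization type avoids the converted variants overwriting each other.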
