Environment: Intel IPEX-LLM NPU Environment
| Knowledge Sources | Details |
|---|---|
| Domains | Infrastructure, NPU_Inference |
| Last Updated | 2026-02-09 04:00 GMT |
Overview
Intel NPU environment for running LLM inference, embedding generation, multimodal models, and speech processing via the Neural Processing Unit on Intel Meteor Lake and Lunar Lake processors.
Description
This environment provides an Intel NPU-accelerated context for LLM and multimodal inference using IPEX-LLM. It targets the integrated Neural Processing Unit (NPU) found in Intel Meteor Lake (Core Ultra 100 series) and Lunar Lake (Core Ultra 200 series) processors. The stack uses `ipex-llm[npu]` as the core library with the `intel-npu-acceleration-library` for NPU backend compilation and execution. Supported workloads include LLM text generation (e.g., Llama 2), BCE embedding generation, multimodal inference (e.g., MiniCPM-V), speech recognition (e.g., Paraformer), and model format conversion for NPU-optimized deployment.
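As a sketch of the intended workflow, the snippet below loads a causal LLM onto the NPU through IPEX-LLM's `npu_model` wrapper and runs one generation. The import path and `load_in_low_bit` argument follow IPEX-LLM's published NPU examples but may vary by release; the model id is only an illustration, and the imports are guarded so the script degrades gracefully on machines without the NPU stack.

```python
MODEL_ID = "meta-llama/Llama-2-7b-chat-hf"  # illustrative; must be on the NPU-supported list

def build_prompt(user_msg: str) -> str:
    # Llama 2 chat prompt format
    return f"<s>[INST] {user_msg} [/INST]"

try:
    # Import path as used in IPEX-LLM NPU examples; guarded in case the
    # NPU stack is not installed on this machine.
    from ipex_llm.transformers.npu_model import AutoModelForCausalLM
    from transformers import AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        load_in_low_bit="sym_int4",  # low-bit weights to fit shared system RAM
        trust_remote_code=True,
    )
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    inputs = tokenizer(build_prompt("What is an NPU?"), return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
except ImportError:
    print("ipex-llm[npu] not installed; skipping NPU generation")
```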
Usage
Use this environment for any NPU inference workflow, including LLM text generation, embedding generation, multimodal inference, speech processing, and model conversion for NPU deployment. It is the mandatory prerequisite for running IPEX-LLM workloads on the Intel Neural Processing Unit.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Windows 11 or Ubuntu 22.04 LTS | Windows recommended for consumer NPU; Linux for server |
| Hardware | Intel Meteor Lake or Lunar Lake CPU | Must have integrated NPU (Core Ultra series) |
| NPU Driver | Intel NPU Driver | Latest NPU driver from Intel; Windows driver via Intel DSA |
| RAM | 16GB+ recommended | NPU shares system memory for model weights |
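Because the NPU draws model weights from system RAM, it helps to sanity-check memory before choosing a model. A rough back-of-envelope estimate (weights only, ignoring activations and KV cache):

```python
def weight_footprint_gib(params_billions: float, bits_per_weight: int) -> float:
    """Approximate weight memory in GiB for a given parameter count and precision."""
    return params_billions * 1e9 * bits_per_weight / 8 / 2**30

# A 7B model: FP16 weights vs 4-bit quantized weights
print(f"7B @ fp16: {weight_footprint_gib(7, 16):.1f} GiB")  # ~13.0 GiB
print(f"7B @ int4: {weight_footprint_gib(7, 4):.1f} GiB")   # ~3.3 GiB
```

This is why low-bit quantization is the default path for NPU deployment: a 4-bit 7B model fits comfortably in a 16 GB system, while the FP16 original does not leave much headroom.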
Dependencies
System Packages
- Intel NPU Driver (platform-specific)
- Intel NPU firmware (included with driver)
Python Packages
- `ipex-llm[npu]`
- `intel-npu-acceleration-library`
- `torch`
- `transformers`
- `numpy`
Credentials
No credentials are required for local NPU inference. Optionally:
- HuggingFace Model Access: Gated models (e.g., Llama 2) require authentication; set the `HF_TOKEN` environment variable.
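`transformers` and `huggingface_hub` pick up the `HF_TOKEN` environment variable automatically; a small helper (the name is illustrative) can surface a missing token before a long download fails partway through:

```python
import os

def resolve_hf_token(env=None):
    """Return the HuggingFace token from the environment, or None if unset."""
    env = os.environ if env is None else env
    return env.get("HF_TOKEN")

if resolve_hf_token() is None:
    print("HF_TOKEN not set: gated models such as Llama 2 will fail to download")
```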
Quick Install
# Install IPEX-LLM with NPU support (quotes keep the extras spec intact in zsh)
pip install --pre --upgrade "ipex-llm[npu]"
# Install NPU acceleration library
pip install intel-npu-acceleration-library
# Install model dependencies
pip install transformers numpy
# Verify the stack imports (checks installation, not NPU hardware)
python -c "import ipex_llm, intel_npu_acceleration_library; print('NPU stack importable')"
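Beyond a one-line import check, a slightly more thorough sanity script can confirm that each package in the stack is at least resolvable on the Python path (it does not probe the NPU hardware itself):

```python
import importlib.util

def installed(module_name: str) -> bool:
    """True if the module can be found on the current Python path."""
    return importlib.util.find_spec(module_name) is not None

# Note: pip names use dashes (intel-npu-acceleration-library);
# the importable module names use underscores.
for mod in ("ipex_llm", "intel_npu_acceleration_library", "torch", "transformers", "numpy"):
    print(f"{mod}: {'OK' if installed(mod) else 'MISSING'}")
```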
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `NPU device not found` | NPU driver not installed or hardware not present | Verify Intel Meteor Lake/Lunar Lake CPU and install NPU driver |
| `intel-npu-acceleration-library import error` | NPU acceleration library not installed | `pip install intel-npu-acceleration-library` |
| `Model too large for NPU` | NPU has limited memory capacity | Use a smaller model or quantize the model for NPU deployment |
| `Unsupported model architecture` | Not all model architectures are NPU-compatible | Check IPEX-LLM documentation for supported NPU model list |
Compatibility Notes
- Intel NPU Only: This environment targets the integrated NPU in Intel Meteor Lake and Lunar Lake processors; it does not support discrete GPUs or CPU-only inference.
- Model Conversion: Some models require conversion to NPU-optimized format before inference. Use the NPU model conversion workflow for this step.
- Shared Memory: The NPU shares system memory (RAM) with the CPU. Ensure sufficient RAM is available for model weights and activations.
- Limited Model Support: Not all HuggingFace model architectures are supported on NPU. Consult the IPEX-LLM NPU compatibility list.
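One lightweight way to act on the compatibility note above is to check a model's `model_type` (as reported by its HuggingFace config) against an allow-list before attempting NPU conversion. The list below is a hypothetical, illustrative subset; the authoritative list lives in the IPEX-LLM NPU documentation.

```python
# Illustrative subset only; consult the IPEX-LLM NPU docs for the real list.
SUPPORTED_NPU_ARCHS = {"llama", "qwen2", "minicpm"}

def npu_supported(model_type: str) -> bool:
    """True if the architecture is on the (assumed) NPU allow-list."""
    return model_type.lower() in SUPPORTED_NPU_ARCHS

# In practice, model_type would come from e.g.
#   transformers.AutoConfig.from_pretrained(model_id).model_type
print(npu_supported("llama"))        # True
print(npu_supported("gpt_bigcode"))  # False
```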