Implementation: Intel IPEX-LLM NPU Multimodal MiniCPM
| Knowledge Sources | |
|---|---|
| Domains | Multimodal, NPU, Vision_Language |
| Last Updated | 2026-02-09 04:00 GMT |
Overview
Concrete tool for running multimodal vision-language inference on Intel NPU using the MiniCPM-Llama3-V-2.5 model with IPEX-LLM.
Description
This script loads the multimodal MiniCPM-Llama3-V-2.5 model on an Intel NPU using IPEX-LLM's NPU-specific AutoModel. It accepts both a text prompt and an image (from a URL or a local path) and generates a response with the model's built-in chat method. Quantization and context-length parameters are configurable for NPU optimization.
Usage
Use this when running multimodal vision-language models on Intel NPU hardware. The pattern demonstrated here applies to other multimodal models supported by IPEX-LLM's NPU backend.
Code Reference
Source Location
- Repository: Intel IPEX-LLM
- File: python/llm/example/NPU/HF-Transformers-AutoModels/Multimodal/minicpm-llama3-v2.5.py
- Lines: 1-106
Signature
# Script-based execution with argparse
# Key API:
from ipex_llm.transformers.npu_model import AutoModel

model = AutoModel.from_pretrained(
    model_path,
    load_in_low_bit=args.low_bit,
    optimize_model=True,
    trust_remote_code=True,
    max_context_len=args.max_context_len,
    max_prompt_len=args.max_prompt_len,
)
response = model.chat(image=image, msgs=msgs, tokenizer=tokenizer, ...)
Import
from ipex_llm.transformers.npu_model import AutoModel
from transformers import AutoTokenizer
from PIL import Image
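The Description notes that the script accepts an image as either a URL or a local file path. A minimal sketch of that resolution step, using only the standard library plus Pillow; the helper name `load_image` is an assumption, not part of the IPEX-LLM API:

```python
# Sketch: resolve an --image-url-or-path argument to an RGB PIL image.
# `load_image` is an illustrative helper, not an IPEX-LLM function.
from io import BytesIO
from urllib.request import urlopen

from PIL import Image


def load_image(image_url_or_path: str) -> Image.Image:
    """Return an RGB PIL image from an HTTP(S) URL or a local file path."""
    if image_url_or_path.startswith(("http://", "https://")):
        # Fetch remote bytes, then decode in memory.
        with urlopen(image_url_or_path) as resp:
            return Image.open(BytesIO(resp.read())).convert("RGB")
    # Otherwise treat the argument as a local path.
    return Image.open(image_url_or_path).convert("RGB")
```

Converting to RGB up front avoids mode mismatches (e.g. RGBA PNGs) when the image is later passed to `model.chat`.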
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| repo-id-or-model-path | str | Yes | Model path (e.g., openbmb/MiniCPM-Llama3-V-2.5) |
| image-url-or-path | str | No | Image URL or local file path |
| prompt | str | No | Text prompt for image description |
| low-bit | str | No | Quantization type (default: sym_int8) |
Outputs
| Name | Type | Description |
|---|---|---|
| Generated text | Console | Multimodal model response describing the image |
| Timing | Console | Inference latency |
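The Outputs table lists inference latency printed to the console. A hedged sketch of how such timing is typically collected around the generation call; the `timed` wrapper and the stand-in callable are assumptions, not the script's actual code:

```python
# Sketch: measure wall-clock latency of a generation call with perf_counter.
# The `timed` helper is illustrative; the real script may time inline.
import time


def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in for model.chat(image=..., msgs=..., tokenizer=...):
response, elapsed = timed(lambda: "a cat sitting on a mat")
print(f"Inference time: {elapsed:.3f} s")
```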
Usage Examples
Multimodal NPU Inference
python minicpm-llama3-v2.5.py \
--repo-id-or-model-path "openbmb/MiniCPM-Llama3-V-2.5" \
--image-url-or-path "https://example.com/cat.jpg" \
--prompt "What do you see in this image?"