Implementation: Intel IPEX-LLM NPU BCE Embedding
| Knowledge Sources | |
|---|---|
| Domains | Embeddings, NPU, NLP |
| Last Updated | 2026-02-09 04:00 GMT |
Overview
Concrete tool for generating text embeddings on Intel NPU using IPEX-LLM's EmbeddingModel API.
Description
This script loads a BCE (Bilingual and Crosslingual Embedding) model on the Intel NPU using IPEX-LLM's EmbeddingModel class. It accepts one or more text prompts and generates dense embedding vectors suitable for semantic search, retrieval, and similarity computation. The model is loaded with configurable low-bit quantization for NPU acceleration.
Usage
Use this script when generating text embeddings on Intel NPU hardware for tasks such as semantic search, document retrieval, or similarity comparison. The EmbeddingModel API provides NPU-optimized inference for embedding models, so the CPU and GPU remain free for other work.
Code Reference
Source Location
- Repository: Intel IPEX-LLM
- File: python/llm/example/NPU/HF-Transformers-AutoModels/Embedding/bce-embedding.py
- Lines: 1-72
Signature
# Script-based execution with argparse
# Key API:
from ipex_llm.transformers.npu_model import EmbeddingModel
model = EmbeddingModel(model_path)
embeddings = model.encode(prompts)
Import
from ipex_llm.transformers.npu_model import EmbeddingModel
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| repo-id-or-model-path | str | Yes | HuggingFace embedding model ID or local path |
| prompt | str | No | One or more text prompts to embed (the flag accepts multiple values) |
Outputs
| Name | Type | Description |
|---|---|---|
| Embedding vectors | numpy array | Dense vector representations of input texts |
| Timing | console output | Inference latency printed to stdout |
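Since `encode` returns a NumPy array, downstream similarity computations are plain vector math. The sketch below uses small placeholder vectors standing in for real `model.encode(prompts)` output (BCE base embeddings are 768-dimensional); the cosine-similarity helper itself is generic.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder vectors standing in for model.encode(prompts) output.
query_vec = np.array([0.1, 0.3, 0.5])
doc_vec = np.array([0.1, 0.3, 0.5])
print(cosine_similarity(query_vec, doc_vec))  # identical vectors -> 1.0
```

A value near 1.0 indicates semantically similar texts; values near 0 indicate unrelated ones.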
Usage Examples
Generate Embeddings on NPU
python bce-embedding.py \
--repo-id-or-model-path "maidalun1020/bce-embedding-base_v1" \
--prompt "What is AI?" "Deep learning is a subset of machine learning"
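The command line above is parsed with argparse. A minimal sketch of the flag definitions follows; the default values and help strings here are assumptions for illustration, not copied from the source file.

```python
import argparse

# Hedged sketch of the example script's CLI (defaults are assumptions).
parser = argparse.ArgumentParser(description="BCE embedding on Intel NPU")
parser.add_argument("--repo-id-or-model-path", type=str,
                    default="maidalun1020/bce-embedding-base_v1",
                    help="HuggingFace embedding model ID or local path")
parser.add_argument("--prompt", type=str, nargs="+",
                    default=["What is AI?"],
                    help="One or more texts to embed")

# Simulate the invocation shown in the usage example above.
args = parser.parse_args(
    ["--prompt", "What is AI?",
     "Deep learning is a subset of machine learning"]
)
print(len(args.prompt))  # 2
```

Note that `nargs="+"` is what lets a single `--prompt` flag collect multiple space-separated texts into a list.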