Implementation: Intel IPEX-LLM NPU BCE Embedding
| Knowledge Sources | |
|---|---|
| Domains | Embeddings, NPU, NLP |
| Last Updated | 2026-02-09 04:00 GMT |
Overview
Concrete tool for generating text embeddings on Intel NPU using IPEX-LLM's EmbeddingModel API.
Description
This script loads a BCE (Bilingual and Crosslingual Embedding) model on the Intel NPU using IPEX-LLM's EmbeddingModel class. It accepts one or more text prompts and generates dense embedding vectors suitable for semantic search, retrieval, and similarity computation. The model is loaded with configurable low-bit quantization for NPU acceleration.
Usage
Use this script when generating text embeddings on Intel NPU hardware for tasks such as semantic search, document retrieval, or similarity comparison. The EmbeddingModel API provides NPU-optimized inference for embedding models, so the CPU and GPU remain free for other work.
Code Reference
Source Location
- Repository: Intel IPEX-LLM
- File: python/llm/example/NPU/HF-Transformers-AutoModels/Embedding/bce-embedding.py
- Lines: 1-72
Signature
# Script-based execution with argparse
# Key API:
from ipex_llm.transformers.npu_model import EmbeddingModel
model = EmbeddingModel(model_path)
embeddings = model.encode(prompts)
Import
from ipex_llm.transformers.npu_model import EmbeddingModel
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| repo-id-or-model-path | str | Yes | HuggingFace embedding model ID or local path |
| prompt | str | No | One or more text prompts to embed (the flag accepts multiple values) |
Outputs
| Name | Type | Description |
|---|---|---|
| Embedding vectors | numpy array | Dense vector representations of input texts |
| Timing | console output | Inference latency printed to stdout |
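Since `encode` returns a NumPy array, downstream similarity computations are plain vector math. The sketch below uses small placeholder vectors standing in for real `model.encode(prompts)` output (BCE base embeddings are 768-dimensional); the cosine-similarity helper itself is generic.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder vectors standing in for model.encode(prompts) output.
query_vec = np.array([0.1, 0.3, 0.5])
doc_vec = np.array([0.1, 0.3, 0.5])
print(cosine_similarity(query_vec, doc_vec))  # identical vectors -> 1.0
```

A value near 1.0 indicates semantically similar texts; values near 0 indicate unrelated ones.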
Usage Examples
Generate Embeddings on NPU
python bce-embedding.py \
--repo-id-or-model-path "maidalun1020/bce-embedding-base_v1" \
--prompt "What is AI?" "Deep learning is a subset of machine learning"
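The command line above is parsed with argparse. A minimal sketch of the flag definitions follows; the default values and help strings here are assumptions for illustration, not copied from the source file.

```python
import argparse

# Hedged sketch of the example script's CLI (defaults are assumptions).
parser = argparse.ArgumentParser(description="BCE embedding on Intel NPU")
parser.add_argument("--repo-id-or-model-path", type=str,
                    default="maidalun1020/bce-embedding-base_v1",
                    help="HuggingFace embedding model ID or local path")
parser.add_argument("--prompt", type=str, nargs="+",
                    default=["What is AI?"],
                    help="One or more texts to embed")

# Simulate the invocation shown in the usage example above.
args = parser.parse_args(
    ["--prompt", "What is AI?",
     "Deep learning is a subset of machine learning"]
)
print(len(args.prompt))  # 2
```

Note that `nargs="+"` is what lets a single `--prompt` flag collect multiple space-separated texts into a list.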