Environment:Intel Ipex llm RAG LangChain Environment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, RAG |
| Last Updated | 2026-02-09 12:00 GMT |
Overview
Intel XPU environment with LangChain, IPEX-LLM LLM/Embedding integrations, and Chroma vector store for Retrieval-Augmented Generation on Intel GPUs.
Description
This environment runs Retrieval-Augmented Generation (RAG) pipelines built with LangChain on Intel XPU devices. `IpexLLM` serves as the LangChain LLM wrapper, and `IpexLLMBgeEmbeddings` accelerates BGE embedding models on Intel GPUs. Chroma provides the vector store and, by default, keeps its index in memory for similarity search. Both LLM inference and embedding generation require an Intel GPU with XPU support.
Usage
Use this environment for any RAG with LangChain workflow that requires Intel XPU acceleration. It is a mandatory prerequisite for the IPEX-LLM LangChain integrations, including `IpexLLM.from_model_id()`, `IpexLLMBgeEmbeddings`, and the LCEL RAG chain assembly.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Ubuntu 22.04 LTS | Intel oneAPI Base Toolkit required |
| Hardware | Intel GPU (Arc/Flex/Max) | XPU device for both LLM and embedding model |
| GPU Driver | Intel GPU drivers | Level Zero runtime required |
Dependencies
System Packages
- Intel oneAPI Base Toolkit
- `intel-opencl-icd`
- `intel-level-zero-gpu`
Python Packages
- `ipex-llm[xpu]` (pre-release)
- `torch` (XPU variant)
- `intel_extension_for_pytorch` (XPU variant)
- `langchain`
- `langchain-text-splitters`
- `langchain-community` (provides `IpexLLMBgeEmbeddings`, `IpexLLM`)
- `langchain-core`
- `langchain-chroma` (provides `Chroma` vector store)
- `langchainhub` (for pulling prompt templates)
- `chromadb`
- `transformers`
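One way to confirm that the packages above are importable is to probe for each module without actually importing it. The following is a minimal sketch using only the standard library. The module names are assumptions derived from the package list (for example, `ipex-llm` is assumed to import as `ipex_llm`), and `missing_modules` is a hypothetical helper, so adjust both to match your installation.

```python
import importlib.util

# Import names assumed to correspond to the packages listed above.
REQUIRED_MODULES = [
    "ipex_llm",
    "torch",
    "intel_extension_for_pytorch",
    "langchain",
    "langchain_text_splitters",
    "langchain_community",
    "langchain_core",
    "langchain_chroma",
    "langchainhub",
    "chromadb",
    "transformers",
]


def missing_modules(names):
    """Return the subset of `names` that cannot be located by this interpreter."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for broken or partially installed packages.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in missing_modules(REQUIRED_MODULES):
        print(f"missing: {name}")
```

Locating a module does not prove that it loads correctly on the XPU stack; it only catches packages that were never installed.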
Credentials
No API keys or tokens are required for local RAG with local models. However:
- HuggingFace Model Access: If using gated models (e.g., Llama), a `HF_TOKEN` environment variable may be needed.
- LangChain Hub: Pulling prompts through the `langchainhub` package (e.g., `hub.pull("rlm/rag-prompt")`) requires internet access but no API key.
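Before loading a gated model, it can help to check whether `HF_TOKEN` is set rather than waiting for the download to fail. This is a small sketch, not part of the original scripts; `hf_token_configured` is a hypothetical helper name.

```python
import os


def hf_token_configured(env=None):
    """Return True if a non-empty HF_TOKEN exists in the given environment mapping."""
    env = os.environ if env is None else env
    return bool(env.get("HF_TOKEN", "").strip())


if __name__ == "__main__":
    if not hf_token_configured():
        print("HF_TOKEN is not set; gated models may fail to download.")
```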
Quick Install
```bash
# Source the Intel oneAPI environment
source /opt/intel/oneapi/setvars.sh

# Install IPEX-LLM with XPU support (quoted so the shell does not expand the brackets)
pip install --pre --upgrade "ipex-llm[xpu]" --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/

# Install LangChain RAG dependencies
pip install langchain langchain-text-splitters langchain-community langchain-core langchain-chroma langchainhub chromadb transformers
```
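After installation, a quick check can tell whether PyTorch sees an XPU device. The sketch below is an assumption about how such a check might look, not code from the original scripts. It relies on `torch.xpu.is_available()`, which is present in XPU-enabled PyTorch builds, and returns `False` when PyTorch or the XPU backend is absent.

```python
def xpu_available():
    """Return True if PyTorch reports at least one usable XPU device."""
    try:
        import torch
    except ImportError:
        return False
    try:
        # On IPEX-based installs, importing the extension registers the XPU backend.
        import intel_extension_for_pytorch  # noqa: F401
    except Exception:
        # Extension missing or failed to load; fall through to whatever torch exposes.
        pass
    xpu = getattr(torch, "xpu", None)
    if xpu is None:
        return False
    return bool(xpu.is_available())


if __name__ == "__main__":
    print("XPU available:", xpu_available())
```

Run it in the same shell where `setvars.sh` was sourced, since the result depends on the active environment.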
Code Evidence
LangChain IPEX-LLM imports from `rag.py:27-33`:
```python
from langchain import hub
from langchain_text_splitters import CharacterTextSplitter
from langchain_community.embeddings import IpexLLMBgeEmbeddings
from langchain_community.llms import IpexLLM
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_chroma import Chroma
```
XPU device usage for embeddings from `rag.py:60-63`:
```python
embeddings = IpexLLMBgeEmbeddings(
    model_name=embed_model_path,
    model_kwargs={"device": "xpu"},
    encode_kwargs={"normalize_embeddings": True},
)
```
XPU device usage for LLM from `rag.py:67-75`:
```python
llm = IpexLLM.from_model_id(
    model_id=model_path,
    model_kwargs={
        "temperature": 0,
        "max_length": 512,
        "trust_remote_code": True,
        "device": "xpu",
    },
)
```
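The imports above suggest how the pieces combine into an LCEL RAG chain. The following sketch shows the common LangChain pattern for that assembly. It is an illustration under stated assumptions, not the contents of `rag.py`: `format_docs` and `build_rag_chain` are hypothetical helper names, and the prompt is assumed to accept `context` and `question` inputs, as `rlm/rag-prompt` does.

```python
def format_docs(docs):
    """Join retrieved documents into one context string for the prompt."""
    return "\n\n".join(doc.page_content for doc in docs)


def build_rag_chain(retriever, prompt, llm):
    """Compose retriever -> prompt -> LLM -> string output using LCEL."""
    # Imported lazily so this module loads even where LangChain is not installed.
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.runnables import RunnablePassthrough

    return (
        # The retriever fills `context`; the raw user question passes through unchanged.
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )
```

Assuming `vectorstore` is a `Chroma` store built with `embeddings` and `prompt = hub.pull("rlm/rag-prompt")`, a call might look like `build_rag_chain(vectorstore.as_retriever(), prompt, llm).invoke("your question")`.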
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `UserWarning: padding_mask` | Benign HuggingFace warning during inference | Suppress with `warnings.filterwarnings("ignore", category=UserWarning, message=".*padding_mask.*")` |
| `ModuleNotFoundError: No module named 'langchain_community'` | LangChain community package not installed | `pip install langchain-community` |
| `ModuleNotFoundError: No module named 'langchain_chroma'` | Chroma LangChain integration not installed | `pip install langchain-chroma chromadb` |
| `XPU device not found` | Intel GPU drivers not installed | Install the Intel GPU drivers and the Intel oneAPI Base Toolkit, then run `source /opt/intel/oneapi/setvars.sh` |
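The `padding_mask` filter from the table can be exercised in isolation to confirm that it hides only the matching warning. This is a self-contained sketch using the standard `warnings` module. The warning texts are made up for the demonstration, and `suppress_padding_mask_warning` and `demo` are hypothetical names.

```python
import warnings


def suppress_padding_mask_warning():
    """Ignore only UserWarnings whose message mentions padding_mask."""
    warnings.filterwarnings(
        "ignore", category=UserWarning, message=".*padding_mask.*"
    )


def demo():
    """Emit two warnings and return the messages that survive the filter."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        suppress_padding_mask_warning()
        warnings.warn("padding_mask is deprecated", UserWarning)  # made-up text
        warnings.warn("unrelated warning", UserWarning)
    return [str(w.message) for w in caught]
```

Because the filter matches on message text, other `UserWarning` instances remain visible. In a real script, call the filter once before running inference.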
Compatibility Notes
- Intel XPU Only: Both the LLM and embedding models run on Intel XPU. Set `"device": "xpu"` in the `model_kwargs` dictionary passed to both `IpexLLM.from_model_id()` and `IpexLLMBgeEmbeddings`.
- BGE Embeddings: The `IpexLLMBgeEmbeddings` class is specifically designed for BAAI BGE embedding models. Use `normalize_embeddings=True` for cosine similarity in the vector store.
- Chroma In-Memory: The default Chroma setup is in-memory, so the index is lost when the process exits. For persistent storage, pass a `persist_directory` when creating the `Chroma` store.
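Two of the notes above can be illustrated briefly. The first part of this sketch shows why normalization matters: for unit-length vectors, the dot product equals cosine similarity. The second part is a hedged example of a persistent store. `build_persistent_store` and the `./chroma_db` path are hypothetical, while `persist_directory` is a parameter of `Chroma.from_documents` in `langchain-chroma`.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def build_persistent_store(documents, embeddings, persist_directory="./chroma_db"):
    """Create a Chroma store that writes its index to disk (path is an assumption)."""
    # Imported lazily so the math helpers above work without LangChain installed.
    from langchain_chroma import Chroma

    return Chroma.from_documents(
        documents=documents,
        embedding=embeddings,
        persist_directory=persist_directory,
    )
```

For any two non-zero vectors `a` and `b`, `dot(l2_normalize(a), l2_normalize(b))` matches `cosine_similarity(a, b)` up to floating-point error, which is why normalized embeddings suit cosine-based retrieval.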