Environment:Intel Ipex llm RAG LlamaIndex Environment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, RAG |
| Last Updated | 2026-02-09 04:00 GMT |
Overview
Intel XPU environment with LlamaIndex, IPEX-LLM, PostgreSQL vector store, and sentence-transformers for Retrieval-Augmented Generation on Intel GPUs.
Description
This environment provides an Intel XPU-accelerated context for RAG (Retrieval-Augmented Generation) pipelines using LlamaIndex. It integrates IPEX-LLM as the inference backend for the LLM component, sentence-transformers for embedding generation, and PostgreSQL (via psycopg2) as the vector store backend. The environment enables building end-to-end RAG workflows where documents are chunked, embedded, stored in a PostgreSQL-based vector index, and retrieved to augment LLM generation with relevant context.
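The chunk, embed, store, and retrieve flow described above can be sketched in plain Python. This is only an illustration of the data flow, not the LlamaIndex or IPEX-LLM API: the bag-of-words `embed` function stands in for a sentence-transformers model, the in-memory list stands in for the PostgreSQL vector table, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=12):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents):
    """Chunk and embed every document; the list stands in for the vector table."""
    return [(chunk, embed(chunk)) for doc in documents for chunk in chunk_text(doc)]


def retrieve(index, query, top_k=1):
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


def build_prompt(context_chunks, question):
    """Augment the question with retrieved context before calling the LLM."""
    context = "\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    docs = [
        "PostgreSQL stores the embedding vectors.",
        "IPEX-LLM runs the language model on an Intel GPU.",
    ]
    index = build_index(docs)
    question = "Where are vectors stored?"
    print(build_prompt(retrieve(index, question), question))
```

In the real environment each stand-in is replaced by the corresponding component: the embedding model runs on the XPU, the index lives in PostgreSQL, and the final prompt is passed to the IPEX-LLM backend.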
Usage
Use this environment for any LlamaIndex RAG workflow that requires Intel XPU acceleration. It is a prerequisite for running LlamaIndex-based document indexing, vector similarity search, and retrieval-augmented generation with IPEX-LLM on Intel GPUs.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Ubuntu 22.04 LTS | Intel oneAPI Base Toolkit required |
| Hardware | Intel GPU (Arc/Flex/Max) | XPU device for LLM inference and embedding generation |
| GPU Driver | Intel GPU drivers | Level Zero runtime required |
| Database | PostgreSQL | Required for vector store backend; pgvector extension recommended |
Dependencies
System Packages
- Intel oneAPI Base Toolkit
- `intel-opencl-icd`
- `intel-level-zero-gpu`
- PostgreSQL server (with pgvector extension)
Python Packages
- `ipex-llm[xpu]` (pre-release)
- `torch` (XPU variant)
- `intel_extension_for_pytorch` (XPU variant)
- `llama-index`
- `llama-index-core`
- `llama-index-readers-file`
- `psycopg2` (or `psycopg2-binary`) for PostgreSQL connectivity
- `sentence-transformers`
- `transformers`
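Before running a pipeline, it can help to confirm that the packages listed above are importable. The following is a minimal sketch using only the standard library; the mapping from pip name to import name is an assumption based on each project's usual convention, and `find_missing` is a hypothetical helper name.

```python
import importlib.util

# Assumed import names for the packages listed above (pip name -> module name).
REQUIRED_MODULES = {
    "ipex-llm[xpu]": "ipex_llm",
    "torch": "torch",
    "intel_extension_for_pytorch": "intel_extension_for_pytorch",
    "llama-index": "llama_index",
    "psycopg2": "psycopg2",
    "sentence-transformers": "sentence_transformers",
    "transformers": "transformers",
}


def find_missing(modules=None):
    """Return the pip names whose top-level module cannot be located."""
    modules = REQUIRED_MODULES if modules is None else modules
    missing = []
    for pip_name, module_name in modules.items():
        # find_spec locates a top-level module without importing it.
        if importlib.util.find_spec(module_name) is None:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

The check only confirms that a module can be found; it does not verify that the installed `torch` build is the XPU variant.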
Credentials
The following may be required depending on your PostgreSQL and model configuration:
- PostgreSQL Connection: Database host, port, username, password, and database name for the vector store backend.
- HuggingFace Model Access: If using gated models (e.g., Llama), a `HF_TOKEN` environment variable may be needed.
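One way to keep these credentials out of source code is to read them from environment variables. The sketch below assumes the standard libpq variable names (`PGHOST`, `PGPORT`, `PGUSER`, `PGPASSWORD`, `PGDATABASE`); the default values and the helper names are illustrative assumptions, not part of this environment's configuration.

```python
import os
from urllib.parse import quote


def postgres_settings(env=None):
    """Collect connection settings from libpq-style environment variables."""
    env = os.environ if env is None else env
    return {
        "host": env.get("PGHOST", "localhost"),   # assumed default
        "port": int(env.get("PGPORT", "5432")),   # PostgreSQL's default port
        "user": env.get("PGUSER", "postgres"),    # assumed default
        "password": env.get("PGPASSWORD", ""),
        "dbname": env.get("PGDATABASE", "postgres"),
    }


def postgres_url(settings):
    """Build a postgresql:// URL, percent-encoding the user and password."""
    user = quote(settings["user"], safe="")
    password = quote(settings["password"], safe="")
    auth = f"{user}:{password}" if password else user
    return f"postgresql://{auth}@{settings['host']}:{settings['port']}/{settings['dbname']}"


def hf_token(env=None):
    """Return HF_TOKEN if set, else None (only gated models need it)."""
    env = os.environ if env is None else env
    return env.get("HF_TOKEN") or None
```

The dictionary keys match the keyword arguments of `psycopg2.connect`, so `psycopg2.connect(**postgres_settings())` should open a connection when the server is reachable.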
Quick Install
# Source the Intel oneAPI environment
source /opt/intel/oneapi/setvars.sh
# Install IPEX-LLM with XPU support
pip install --pre --upgrade "ipex-llm[xpu]" --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
# Install LlamaIndex RAG dependencies
pip install llama-index llama-index-core llama-index-readers-file psycopg2-binary sentence-transformers transformers
# Set runtime environment
export SYCL_CACHE_PERSISTENT=1
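The runtime settings above apply only to the shell in which they were run, so a Python process started elsewhere may not see them. A minimal sketch of a pre-flight check follows; it assumes that `setvars.sh` exports `ONEAPI_ROOT`, and `check_runtime_env` is a hypothetical helper name.

```python
import os


def check_runtime_env(env=None):
    """Return a list of warnings about the oneAPI / SYCL runtime environment."""
    env = os.environ if env is None else env
    warnings = []
    # Assumption: setvars.sh exports ONEAPI_ROOT, so its absence suggests
    # the script was not sourced in the shell that launched this process.
    if not env.get("ONEAPI_ROOT"):
        warnings.append("ONEAPI_ROOT is unset; source /opt/intel/oneapi/setvars.sh first")
    # SYCL_CACHE_PERSISTENT=1 keeps JIT-compiled kernels on disk between runs.
    if env.get("SYCL_CACHE_PERSISTENT") != "1":
        warnings.append("SYCL_CACHE_PERSISTENT is not 1; kernels will be recompiled on each run")
    return warnings


if __name__ == "__main__":
    for message in check_runtime_env():
        print("warning:", message)
```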
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ModuleNotFoundError: No module named 'llama_index'` | LlamaIndex not installed | `pip install llama-index` |
| `psycopg2.OperationalError: could not connect to server` | PostgreSQL not running or misconfigured | Start PostgreSQL and verify connection credentials |
| `RuntimeError: No XPU device found` | Intel GPU drivers not installed, or the oneAPI environment not sourced | Install Intel GPU drivers and the Level Zero runtime, then run `source /opt/intel/oneapi/setvars.sh` |
| `ModuleNotFoundError: No module named 'sentence_transformers'` | sentence-transformers not installed | `pip install sentence-transformers` |
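The XPU error in the table above can be checked for up front rather than discovered mid-pipeline. The sketch below probes for a device without raising; it assumes a PyTorch build that exposes `torch.xpu` (either natively or after importing `intel_extension_for_pytorch`), and `xpu_status` is a hypothetical helper name.

```python
def xpu_status():
    """Return (available, detail) without raising if the XPU stack is missing."""
    try:
        import torch
    except ImportError as exc:
        return False, f"torch is not importable: {exc}"
    try:
        # Older XPU builds register torch.xpu only after this import.
        import intel_extension_for_pytorch  # noqa: F401
    except Exception:
        # Broad on purpose: a missing package or missing oneAPI libraries
        # should not abort the probe. Newer torch builds may expose
        # torch.xpu without this extension.
        pass
    xpu = getattr(torch, "xpu", None)
    if xpu is None or not xpu.is_available():
        return False, "no XPU device visible; check GPU drivers and the Level Zero runtime"
    return True, f"{xpu.device_count()} XPU device(s) found"


if __name__ == "__main__":
    available, detail = xpu_status()
    print("XPU available:", available, "-", detail)
```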
Compatibility Notes
- Intel XPU Only: Both the LLM inference and embedding generation run on Intel XPU. The environment is not compatible with CUDA devices.
- PostgreSQL Vector Store: The pgvector extension for PostgreSQL provides efficient vector similarity search. Ensure the extension is installed on the server and enabled in the target database (`CREATE EXTENSION vector;`).
- LlamaIndex Version: LlamaIndex v0.10+ uses a modular package structure (`llama-index-core`, `llama-index-readers-file`, etc.).
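To confirm the pgvector note above from Python, the extension can be enabled and its version read back through any DB-API connection, such as one returned by `psycopg2.connect`. This is a minimal sketch: `ensure_pgvector` is a hypothetical helper name, and it assumes the connecting role is allowed to create extensions in the database.

```python
def ensure_pgvector(conn):
    """Enable pgvector in the current database and return its version string."""
    with conn.cursor() as cur:
        # No-op if the extension is already enabled in this database.
        cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
        # pg_extension lists the extensions enabled in the current database.
        cur.execute("SELECT extversion FROM pg_extension WHERE extname = 'vector'")
        row = cur.fetchone()
    conn.commit()
    return row[0] if row else None
```

If the pgvector files are not installed on the server, the first statement fails with an error from PostgreSQL, which points to a server-side installation problem rather than a Python one.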