Implementation:Intel Ipex llm LlamaIndex RAG
| Knowledge Sources | |
|---|---|
| Domains | RAG, Vector_Store, LlamaIndex |
| Last Updated | 2026-02-09 04:00 GMT |
Overview
Concrete tool for building a Retrieval-Augmented Generation (RAG) pipeline with LlamaIndex, using IPEX-LLM for both embeddings and text generation on Intel XPU.
Description
This script implements a complete RAG pipeline with LlamaIndex. It loads a PDF with PyMuPDFReader, splits the text at sentence level, embeds the chunks with a BGE model through IpexLLMEmbedding, stores the vectors in PostgreSQL through PGVectorStore, retrieves matches with a custom VectorDBRetriever, and answers the question with IpexLLM as the generation backend. Document ingestion and querying both run end to end with IPEX-LLM optimizations.
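The sentence-level splitting step can be pictured with a small standard-library sketch. It is not the splitter the script uses (that comes from LlamaIndex); `split_into_chunks` and its `max_chars` parameter are hypothetical names chosen for illustration, and the regex-based sentence boundary rule is a deliberate simplification.

```python
import re


def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    Hypothetical helper for illustration only. A single sentence longer than
    max_chars is kept whole in its own chunk rather than cut mid-sentence.
    """
    # Naive boundary rule: split after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Splitting on sentence boundaries keeps each embedded chunk semantically self-contained, which tends to matter more for retrieval quality than hitting an exact chunk size.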
Usage
Use this when building a RAG application that requires PDF document ingestion with PostgreSQL vector storage and IPEX-LLM acceleration for both embedding generation and text generation on Intel hardware.
Code Reference
Source Location
- Repository: Intel IPEX-LLM
- File: python/llm/example/GPU/LlamaIndex/rag.py
- Lines: 1-252
Signature
class VectorDBRetriever(BaseRetriever):
    def _retrieve(self, query_bundle: QueryBundle) -> list:
        """Retrieve similar documents from PostgreSQL vector store."""

def load_vector_database(username, password) -> PGVectorStore:
    """Create or connect to PostgreSQL vector store."""

def load_data(data_path) -> list:
    """Load PDF and split into sentence chunks."""

def main(args):
    """Main RAG pipeline orchestration."""
Import
from llama_index.embeddings.ipex_llm import IpexLLMEmbedding
from llama_index.llms.ipex_llm import IpexLLM
from llama_index.vector_stores.postgres import PGVectorStore
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core import QueryBundle
from llama_index.core.retrievers import BaseRetriever
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model-path | str | Yes | Path to the Transformers model used for generation |
| tokenizer-path | str | Yes | Path to tokenizer |
| embedding-model-path | str | No | Embedding model (default: BAAI/bge-small-en) |
| data | str | No | PDF file path (default: ./data/llama2.pdf) |
| question | str | No | Question to ask against the ingested document |
| user | str | Yes | PostgreSQL username |
| password | str | Yes | PostgreSQL password |
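The input table maps naturally onto an `argparse` parser. The sketch below mirrors the names, required flags, and defaults listed above, with short flags taken from the usage example on this page. It is an illustration rather than the script's actual parser: `build_parser` is a hypothetical name, the help strings are paraphrased, and the `None` default for the question is a placeholder because the table does not state one.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser matching the input table (illustrative sketch)."""
    parser = argparse.ArgumentParser(description="LlamaIndex RAG example (sketch)")
    parser.add_argument("-m", "--model-path", type=str, required=True,
                        help="path to the model used for generation")
    parser.add_argument("-t", "--tokenizer-path", type=str, required=True,
                        help="path to the tokenizer")
    parser.add_argument("-e", "--embedding-model-path", type=str,
                        default="BAAI/bge-small-en",
                        help="embedding model name or path")
    parser.add_argument("-d", "--data", type=str, default="./data/llama2.pdf",
                        help="PDF file to ingest")
    # Placeholder default: the table does not say what the script falls back to.
    parser.add_argument("-q", "--question", type=str, default=None,
                        help="question to ask")
    parser.add_argument("-u", "--user", type=str, required=True,
                        help="PostgreSQL username")
    parser.add_argument("-p", "--password", type=str, required=True,
                        help="PostgreSQL password")
    return parser
```

With this layout, `build_parser().parse_args([...])` exposes each option as an attribute with dashes turned into underscores, for example `args.embedding_model_path`.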
Outputs
| Name | Type | Description |
|---|---|---|
| RAG response | Console | Generated answer with retrieved context |
| Vector store | PostgreSQL | Persistent document embeddings |
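The persisted embeddings are what VectorDBRetriever searches at query time: the question is embedded, the store returns the nearest stored vectors, and those chunks become the context for generation. The sketch below shows that flow with an in-memory list and cosine similarity. It is a stand-in under stated assumptions, not the script's code: `InMemoryStore`, `ToyRetriever`, and the `embed` callable are hypothetical, whereas the real class delegates to PGVectorStore and IpexLLMEmbedding.

```python
import math
from typing import Callable

Vector = list[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryStore:
    """Hypothetical stand-in for a vector store: a list of (text, embedding)."""

    def __init__(self) -> None:
        self._rows: list[tuple[str, Vector]] = []

    def add(self, text: str, embedding: Vector) -> None:
        self._rows.append((text, embedding))

    def query(self, query_embedding: Vector, top_k: int) -> list[tuple[str, float]]:
        # Score every stored chunk, then keep the top_k most similar.
        scored = [(text, cosine_similarity(query_embedding, emb))
                  for text, emb in self._rows]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


class ToyRetriever:
    """Hypothetical retriever: embed the query, then ask the store for neighbors."""

    def __init__(self, store: InMemoryStore, embed: Callable[[str], Vector],
                 similarity_top_k: int = 2) -> None:
        self._store = store
        self._embed = embed
        self._top_k = similarity_top_k

    def retrieve(self, query: str) -> list[tuple[str, float]]:
        return self._store.query(self._embed(query), self._top_k)
```

A database-backed store performs the same ranking inside PostgreSQL instead of a Python loop, which is what lets the embeddings persist across runs.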
Usage Examples
RAG Query
python rag.py \
-m "/path/to/llama2-model" \
-t "/path/to/tokenizer" \
-e "BAAI/bge-small-en" \
-d "./data/llama2.pdf" \
-u "postgres" \
-p "password" \
-q "How does Llama 2 perform compared to other models?"