Workflow:Intel Ipex llm RAG With LangChain
| Knowledge Sources | |
|---|---|
| Domains | LLMs, RAG, Information_Retrieval |
| Last Updated | 2026-02-09 04:00 GMT |
Overview
End-to-end process for building a Retrieval-Augmented Generation (RAG) pipeline on Intel GPUs using IPEX-LLM with LangChain framework integration.
Description
This workflow constructs a RAG pipeline that combines document retrieval with LLM generation to answer questions grounded in specific source documents. It uses IPEX-LLM's LangChain integration (IpexLLM and IpexLLMBgeEmbeddings) to run both the embedding model and the generative LLM on Intel XPU hardware. Documents are split into chunks, embedded using a BGE embedding model, stored in a Chroma vector database, and retrieved based on semantic similarity to the user's query. The retrieved context is then combined with the query in a prompt template and fed to the LLM for grounded answer generation.
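The retrieve-then-generate flow described above can be illustrated with a toy, dependency-free sketch. Everything in it is an assumption made for illustration: the bag-of-words `embed` function stands in for a BGE model, the in-memory list stands in for Chroma, and the prompt wording is hypothetical rather than the template this workflow uses.

```python
import math
from collections import Counter


def embed(text: str) -> dict[str, float]:
    """Toy bag-of-words embedding, L2-normalized (a stand-in for a BGE model)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Dot product of two sparse vectors; equals cosine similarity for unit vectors."""
    return sum(weight * b.get(tok, 0.0) for tok, weight in a.items())


def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: similarity(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context_chunks: list[str]) -> str:
    """Combine retrieved context and the question (hypothetical prompt wording)."""
    context = "\n\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


chunks = [
    "IPEX-LLM runs large language models on Intel GPUs.",
    "Chroma stores embedding vectors for similarity search.",
]
question = "Where are embedding vectors stored?"
top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top)  # this string would be sent to the LLM
```

In the real pipeline the same three roles (embed, retrieve, prompt) are filled by IpexLLMBgeEmbeddings, the Chroma retriever, and the prompt template, as the steps below describe.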
Usage
Execute this workflow when you need to answer questions based on specific documents or knowledge bases, rather than relying solely on the LLM's pre-trained knowledge. It is suitable for enterprise knowledge management, document QA, and domain-specific chatbots running on Intel GPU infrastructure.
Execution Steps
Step 1: Environment Setup
Configure the Intel GPU runtime and install the required packages: ipex-llm, langchain, langchain-community, and langchain-chroma. Make a BGE embedding model available alongside the generative LLM. Ensure that an XPU device is available for both embedding computation and LLM inference.
Key considerations:
- Requires both a generative LLM and a separate embedding model
- langchain-community provides IpexLLM and IpexLLMBgeEmbeddings integrations
- Chroma vector database is used for document storage and retrieval
- Both models run on the xpu device for accelerated computation
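A quick pre-flight check can catch a missing package before any model is loaded. The sketch below is a minimal example using only the standard library; the mapping from pip package names to importable module names is an assumption and may need adjusting for your installation. The `xpu_available` helper assumes PyTorch exposes `torch.xpu`; on some setups that attribute only appears after the Intel extension has been imported, in which case the helper reports False.

```python
import importlib.util

# Assumed mapping: pip package name -> importable top-level module name.
REQUIRED_MODULES = {
    "ipex-llm": "ipex_llm",
    "langchain": "langchain",
    "langchain-community": "langchain_community",
    "langchain-chroma": "langchain_chroma",
}


def missing_packages(required: dict[str, str] = REQUIRED_MODULES) -> list[str]:
    """Return the pip names of packages whose module cannot be found."""
    missing = []
    for pip_name, module_name in required.items():
        if importlib.util.find_spec(module_name) is None:
            missing.append(pip_name)
    return missing


def xpu_available() -> bool:
    """True if PyTorch is installed and reports an available XPU device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the check works without PyTorch

    xpu = getattr(torch, "xpu", None)
    return bool(xpu is not None and xpu.is_available())
```

Calling `missing_packages()` returns an empty list when everything is importable; any names it returns are the packages still to install.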
Step 2: Document Loading and Chunking
Load source documents from files or in-memory text strings. Split documents into smaller chunks using CharacterTextSplitter with configurable chunk size and overlap. The chunk size should balance providing enough context per chunk against keeping each embedding focused on a single topic.
Key considerations:
- This workflow defaults to a chunk size of 1000 characters with 0 overlap
- Supports loading from file paths or direct text input
- Chunk granularity affects retrieval precision and recall
- Larger chunks provide more context but may dilute relevance
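To make the effect of chunk size and overlap concrete, here is a simplified fixed-window splitter. It is not CharacterTextSplitter itself, which splits on a separator and then merges the pieces up to the size limit; `split_by_chars` is a hypothetical helper that only shows how the two parameters interact.

```python
def split_by_chars(text: str, chunk_size: int = 1000, chunk_overlap: int = 0) -> list[str]:
    """Cut text into windows of chunk_size characters, each sharing
    chunk_overlap characters with the previous window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks
```

With the workflow's values (1000 characters, 0 overlap), a 2500-character document yields three chunks. Raising the overlap repeats text across chunk boundaries, which can help when an answer straddles two chunks, at the cost of more vectors to store.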
Step 3: Embedding and Vector Store Creation
Initialize the BGE embedding model on Intel XPU using IpexLLMBgeEmbeddings. Embed all document chunks and store the resulting vectors in a Chroma vector database. Create a retriever interface that performs similarity search against the vector store.
Key considerations:
- BGE (BAAI General Embedding) models provide high-quality text embeddings
- Embeddings are normalized for cosine similarity search
- Chroma provides in-memory vector storage with optional persistence
- Metadata (source index) is attached to each chunk for traceability
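The embedding and vector store setup might look like the sketch below. It assumes the IpexLLMBgeEmbeddings constructor arguments shown in the LangChain integration (`model_name`, `model_kwargs`, `encode_kwargs`) and the `Chroma.from_texts` API from langchain-chroma. The metadata shape (a stringified chunk index under a `source` key), the `k` value, and the helper names are illustrative assumptions.

```python
def chunk_metadata(num_chunks: int) -> list[dict[str, str]]:
    """One metadata dict per chunk, recording its position for traceability."""
    return [{"source": str(i)} for i in range(num_chunks)]


def build_retriever(chunks: list[str], embedding_model_name: str, k: int = 4):
    """Embed chunks on the XPU, store them in Chroma, and return a retriever."""
    # Imports are deferred so this module loads even without the packages.
    from langchain_chroma import Chroma
    from langchain_community.embeddings import IpexLLMBgeEmbeddings

    embeddings = IpexLLMBgeEmbeddings(
        model_name=embedding_model_name,
        model_kwargs={"device": "xpu"},
        # Unit-length vectors make dot product and cosine similarity agree.
        encode_kwargs={"normalize_embeddings": True},
    )
    store = Chroma.from_texts(
        chunks,
        embeddings,
        metadatas=chunk_metadata(len(chunks)),
    )
    return store.as_retriever(search_kwargs={"k": k})
```

A call such as `build_retriever(chunks, "BAAI/bge-large-en-v1.5")` would produce the retriever used in Step 5; the model name here is only an example of a BGE checkpoint.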
Step 4: LLM Initialization
Load the generative LLM using IpexLLM.from_model_id(), which automatically applies IPEX-LLM optimizations for Intel XPU. Configure generation parameters including temperature, max length, and device placement.
Key considerations:
- IpexLLM wraps HuggingFace models with IPEX-LLM acceleration
- The model is loaded and optimized for the xpu device automatically
- Temperature controls generation randomness (0 for deterministic)
- trust_remote_code may be needed for certain model architectures
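As a sketch of this step, the parameters above can be collected into the `model_kwargs` dictionary passed to IpexLLM.from_model_id(). The default values chosen here (in particular `max_length=256`) and the helper names are illustrative assumptions, and the model identifier is left to the caller.

```python
def llm_model_kwargs(
    temperature: float = 0.0,
    max_length: int = 256,
    device: str = "xpu",
    trust_remote_code: bool = True,
) -> dict:
    """Build the model_kwargs dictionary for IpexLLM.from_model_id()."""
    return {
        "temperature": temperature,  # 0 gives deterministic output
        "max_length": max_length,
        "trust_remote_code": trust_remote_code,
        "device": device,
    }


def load_llm(model_id: str, **overrides):
    """Load a Hugging Face model through the IpexLLM LangChain wrapper."""
    # Deferred import: requires langchain-community and ipex-llm at call time.
    from langchain_community.llms import IpexLLM

    return IpexLLM.from_model_id(
        model_id=model_id,
        model_kwargs=llm_model_kwargs(**overrides),
    )
```

Keeping the keyword construction separate from the loading call makes it easy to inspect or log the settings without touching the GPU. Only enable `trust_remote_code` for model repositories you trust, since it executes code shipped with the model.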
Step 5: RAG Chain Assembly and Query
Construct the RAG chain using LangChain's LCEL (LangChain Expression Language) pipeline. The chain connects the retriever (for document fetching), a prompt template (combining context and question), the LLM (for answer generation), and an output parser. Submit user queries through the chain to get grounded answers.
Key considerations:
- Uses the standard "rlm/rag-prompt" template from the LangChain Hub
- Retrieved documents are formatted and concatenated as context
- The chain is composable: conceptually retriever | format_docs | prompt | llm | parser, with the original question passed to the prompt alongside the formatted context
- Responses are grounded in the retrieved document context
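The assembly described in this step could be written roughly as follows. This is a sketch that assumes the `retriever` and `llm` objects from Steps 3 and 4, the usual LCEL idiom of a dictionary feeding the prompt's `context` and `question` variables, and network access for `hub.pull`. The function names are illustrative.

```python
def format_docs(docs) -> str:
    """Concatenate the text of retrieved documents into one context string."""
    return "\n\n".join(doc.page_content for doc in docs)


def build_rag_chain(retriever, llm):
    """Wire retriever, prompt, LLM, and output parser into one LCEL chain."""
    # Deferred imports: require langchain and langchain-core at call time.
    from langchain import hub
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.runnables import RunnablePassthrough

    prompt = hub.pull("rlm/rag-prompt")  # expects "context" and "question"
    return (
        {
            "context": retriever | format_docs,  # fetch, then format chunks
            "question": RunnablePassthrough(),  # forward the query unchanged
        }
        | prompt
        | llm
        | StrOutputParser()
    )
```

A query is then submitted with `build_rag_chain(retriever, llm).invoke("your question")`, which returns the answer as a plain string.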