Workflow: Intel IPEX-LLM RAG with LangChain

Knowledge Sources	IPEX-LLM LangChain Documentation
Domains	LLMs, RAG, Information_Retrieval
Last Updated	2026-02-09 04:00 GMT

Overview

End-to-end process for building a Retrieval-Augmented Generation (RAG) pipeline on Intel GPUs using IPEX-LLM and its LangChain integration.

Description

This workflow constructs a RAG pipeline that combines document retrieval with LLM generation to answer questions grounded in specific source documents. It uses IPEX-LLM's LangChain integration (IpexLLM and IpexLLMBgeEmbeddings) to run both the embedding model and the generative LLM on Intel XPU hardware. Documents are split into chunks, embedded using a BGE embedding model, stored in a Chroma vector database, and retrieved based on semantic similarity to the user's query. The retrieved context is then combined with the query in a prompt template and fed to the LLM for grounded answer generation.

Usage

Execute this workflow when you need to answer questions based on specific documents or knowledge bases, rather than relying solely on the LLM's pre-trained knowledge. Suitable for enterprise knowledge management, document QA, and domain-specific chatbots running on Intel GPU infrastructure.

Execution Steps

Step 1: Environment Setup

Configure the Intel GPU runtime, install the required packages (ipex-llm, langchain, langchain-community, langchain-chroma), and obtain a BGE embedding model. Ensure an XPU device is available for both embedding computation and LLM inference.

Key considerations:

Requires both a generative LLM and a separate embedding model
langchain-community provides IpexLLM and IpexLLMBgeEmbeddings integrations
Chroma vector database is used for document storage and retrieval
Both models run on the xpu device for accelerated computation
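A minimal pre-flight check for this step might look like the sketch below. It assumes the import names ipex_llm, langchain, langchain_community and langchain_chroma for the packages above, and that your PyTorch build exposes Intel GPUs as torch.xpu (on older builds this may only appear after importing intel_extension_for_pytorch). The helper names missing_modules and xpu_available are hypothetical.

```python
import importlib.util

# Assumed import names for the Step 1 packages (pip names use dashes).
REQUIRED_MODULES = ("ipex_llm", "langchain", "langchain_community", "langchain_chroma")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def xpu_available():
    """Best-effort XPU check through PyTorch; returns False when torch is absent."""
    try:
        import torch
    except ImportError:
        return False
    xpu = getattr(torch, "xpu", None)  # missing on builds without XPU support
    return bool(xpu is not None and xpu.is_available())


if __name__ == "__main__":
    print("missing packages:", missing_modules() or "none")
    print("XPU available:", xpu_available())
```

Running it before Steps 3 and 4 surfaces a missing package or device early, instead of as a failure deep inside model loading.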

Step 2: Document Loading and Chunking

Load source documents from files or in-memory text strings. Split them into smaller chunks using CharacterTextSplitter with a configurable chunk size and overlap. The chunk size trades off enough context per chunk against keeping each embedding focused.

Key considerations:

This workflow defaults to a chunk size of 1000 characters with 0 overlap
Supports loading from file paths or direct text input
Chunk granularity affects retrieval precision and recall
Larger chunks provide more context but may dilute relevance
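To show how chunk size and overlap interact, here is a simplified fixed-window splitter. It is not LangChain's CharacterTextSplitter, which first splits on a separator ("\n\n" by default) and then merges the pieces up to chunk_size; split_text and load_source are hypothetical helpers written only for illustration.

```python
from pathlib import Path
from typing import List, Optional


def load_source(path: Optional[str] = None, text: Optional[str] = None) -> str:
    """Return document text from a file path or an in-memory string (exactly one)."""
    if (path is None) == (text is None):
        raise ValueError("pass exactly one of path or text")
    return Path(path).read_text(encoding="utf-8") if path is not None else text


def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 0) -> List[str]:
    """Fixed-window character chunks; consecutive chunks share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(split_text(load_source(text="abcdefghij"), chunk_size=4, chunk_overlap=1))
```

Overlap repeats the tail of each chunk at the head of the next, so text near a boundary appears in two chunks; larger overlaps buy that redundancy at the cost of more chunks to embed.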

Step 3: Embedding and Vector Store Creation

Initialize the BGE embedding model on Intel XPU using IpexLLMBgeEmbeddings. Embed all document chunks and store the resulting vectors in a Chroma vector database. Create a retriever interface that performs similarity search against the vector store.

Key considerations:

BGE (BAAI General Embedding) models provide high-quality text embeddings
Embeddings are normalized for cosine similarity search
Chroma provides in-memory vector storage with optional persistence
Metadata (source index) is attached to each chunk for traceability
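The retrieval mechanics of this step (normalized embeddings, cosine ranking, per-chunk source metadata) can be sketched with a toy stand-in. Here a hashed bag-of-words vector replaces the BGE model and a Python list replaces Chroma; toy_embed, ToyVectorStore and DIM are hypothetical. In the real pipeline these roles are played by IpexLLMBgeEmbeddings and a Chroma store built with, for example, Chroma.from_documents(documents, embedding) followed by .as_retriever().

```python
import math
import re
import zlib

DIM = 1024  # hypothetical vector width for the toy embedding


def toy_embed(text: str) -> list:
    """Hashed bag-of-words vector, L2-normalized (stand-in for a BGE model)."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9\-]+", text.lower()):
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class ToyVectorStore:
    """In-memory (vector, text, metadata) rows ranked by cosine similarity."""

    def __init__(self, embed=toy_embed):
        self.embed = embed
        self.rows = []

    def add_texts(self, texts, metadatas=None):
        metadatas = metadatas or [{} for _ in texts]
        for text, meta in zip(texts, metadatas):
            self.rows.append((self.embed(text), text, meta))

    def similarity_search(self, query, k=4):
        q = self.embed(query)
        # Unit-length vectors: the dot product equals cosine similarity.
        scored = [
            (sum(a * b for a, b in zip(q, vec)), text, meta)
            for vec, text, meta in self.rows
        ]
        scored.sort(key=lambda row: row[0], reverse=True)
        return [(text, meta) for _, text, meta in scored[:k]]


# Attach each chunk's index as "source" metadata for traceability.
chunks = ["Intel GPUs run IPEX-LLM", "Bananas are yellow fruit"]
store = ToyVectorStore()
store.add_texts(chunks, metadatas=[{"source": i} for i in range(len(chunks))])
print(store.similarity_search("which fruit is yellow", k=1))
```

Because every stored vector has unit length, ranking by dot product is the same as ranking by cosine similarity, which is the reason the workflow normalizes its embeddings.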

Step 4: LLM Initialization

Load the generative LLM with IpexLLM.from_model_id(), which applies IPEX-LLM optimizations automatically. Configure generation parameters (temperature, maximum length) and device placement.

Key considerations:

IpexLLM wraps HuggingFace models with IPEX-LLM acceleration
The model is optimized by IPEX-LLM and placed on the xpu device specified at load time
Temperature controls generation randomness (0 for deterministic output)
trust_remote_code may be required for model architectures that ship custom modeling code
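A sketch of the Step 4 configuration follows. It assumes the model_kwargs shape (temperature, max_length, trust_remote_code, device) used in LangChain's IPEX-LLM GPU examples; how each key is consumed may vary across langchain-community versions. build_model_kwargs and load_llm are hypothetical helpers, the 512 max_length is only illustrative, and load_llm needs ipex-llm, langchain-community and an XPU device to actually run.

```python
from typing import Any, Dict


def build_model_kwargs(
    temperature: float = 0.0,
    max_length: int = 512,
    device: str = "xpu",
    trust_remote_code: bool = True,
) -> Dict[str, Any]:
    """Collect Step 4's generation and placement settings into one dict."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    return {
        "temperature": temperature,  # 0 requests deterministic output
        "max_length": max_length,  # illustrative length cap, tune per model
        "trust_remote_code": trust_remote_code,  # some architectures ship custom code
        "device": device,  # "xpu" targets an Intel GPU
    }


def load_llm(model_id: str, **overrides: Any):
    """Load the model through LangChain's IpexLLM wrapper (needs ipex-llm and an XPU)."""
    # Imported lazily so the config helper above works without the heavy dependencies.
    from langchain_community.llms import IpexLLM

    return IpexLLM.from_model_id(
        model_id=model_id,
        model_kwargs=build_model_kwargs(**overrides),
    )


print(build_model_kwargs())
```

Keeping the settings in one function lets you inspect or test the configuration without loading a multi-gigabyte model.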

Step 5: RAG Chain Assembly and Query

Construct the RAG chain using LangChain's LCEL (LangChain Expression Language) pipeline. The chain connects the retriever (for document fetching), a prompt template (combining context and question), the LLM (for answer generation), and an output parser. Submit user queries through the chain to get grounded answers.

Key considerations:

Uses the standard "rlm/rag-prompt" template from the LangChain Hub
Retrieved documents are formatted and concatenated as context
The chain is composable: {context: retriever | format_docs, question: passthrough} | prompt | llm | parser
Responses are grounded in the retrieved document context
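The composition pattern can be illustrated without LangChain installed. The sketch below mimics LCEL's pipe operator with a hypothetical Pipe class; the retriever, prompt and LLM are toy placeholders, and the prompt text is invented (it is not the rlm/rag-prompt template). In LangChain itself the same shape is usually written as {"context": retriever | format_docs, "question": RunnablePassthrough()} | prompt | llm | StrOutputParser(), with RunnablePassthrough and StrOutputParser imported from langchain_core.

```python
from typing import Any, Callable, Dict, List


class Pipe:
    """Tiny stand-in for an LCEL runnable: wraps a function and composes with `|`."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

    def __or__(self, other: Any) -> "Pipe":
        nxt = as_pipe(other)
        return Pipe(lambda value: nxt.invoke(self.invoke(value)))

    def __ror__(self, other: Any) -> "Pipe":
        # Handles `dict | pipe` and `function | pipe` (left operand is not a Pipe).
        return as_pipe(other) | self


def as_pipe(obj: Any) -> Pipe:
    """Coerce a Pipe, a dict of steps (all fed the same input), or a callable."""
    if isinstance(obj, Pipe):
        return obj
    if isinstance(obj, dict):
        steps = {key: as_pipe(step) for key, step in obj.items()}
        return Pipe(lambda value: {key: s.invoke(value) for key, s in steps.items()})
    if callable(obj):
        return Pipe(obj)
    raise TypeError(f"cannot compose {type(obj).__name__}")


DOCS = [
    "IPEX-LLM accelerates LLM inference on Intel XPU devices.",
    "Chroma stores embedding vectors for similarity search.",
]


def retrieve(question: str, k: int = 1) -> List[str]:
    """Toy keyword-overlap retriever (placeholder for vectorstore.as_retriever())."""
    q_words = set(question.lower().split())
    return sorted(DOCS, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)[:k]


def format_docs(docs: List[str]) -> str:
    return "\n\n".join(docs)


def prompt(inputs: Dict[str, str]) -> str:
    # Invented template for illustration only.
    return f"Answer using only this context:\n{inputs['context']}\n\nQuestion: {inputs['question']}\nAnswer:"


def fake_llm(prompt_text: str) -> str:
    # Placeholder for IpexLLM: echoes the first context line, padded with spaces.
    return " " + prompt_text.splitlines()[1] + " "


rag_chain = (
    {"context": Pipe(retrieve) | format_docs, "question": Pipe(lambda q: q)}
    | Pipe(prompt)
    | fake_llm
    | str.strip  # plays the role of the output parser
)

print(rag_chain.invoke("What does IPEX-LLM accelerate?"))
```

The dict step is what lets the raw question reach the prompt unchanged while the retriever branch turns the same question into context.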
