Implementation:Intel Ipex llm CharacterTextSplitter Usage

Knowledge Sources	IPEX-LLM LangChain Text Splitters
Domains	NLP, RAG, Data_Processing
Last Updated	2026-02-09 00:00 GMT

Overview

LangChain CharacterTextSplitter for chunking documents in the IPEX-LLM RAG workflow.

Description

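This is a Wrapper Doc for LangChain's CharacterTextSplitter as used in the IPEX-LLM RAG pipeline. The splitter breaks raw text on a separator (by default "\n\n", i.e. paragraph breaks) and then merges adjacent pieces into chunks of up to chunk_size characters. A single piece longer than chunk_size is kept whole rather than cut. The IPEX-LLM example uses chunk_size=1000 and chunk_overlap=0.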
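The split-then-merge behaviour can be sketched in plain Python for the chunk_overlap=0 setting used in the IPEX-LLM example. This is a simplified approximation written for illustration, not LangChain's source: the function name split_and_merge is hypothetical, and the sketch omits overlap handling and the warning LangChain logs for oversized pieces.

```python
def split_and_merge(text: str, chunk_size: int, separator: str = "\n\n") -> list[str]:
    """Simplified, overlap-free sketch of separator-based split-then-merge chunking."""
    # Step 1: split on the separator and drop empty pieces.
    pieces = [p for p in text.split(separator) if p]

    chunks: list[str] = []
    current: list[str] = []  # pieces buffered for the chunk being built
    current_len = 0          # length of the buffered pieces joined by the separator

    # Step 2: greedily merge pieces while the joined length stays within chunk_size.
    for piece in pieces:
        added = len(piece) + (len(separator) if current else 0)
        if current and current_len + added > chunk_size:
            chunks.append(separator.join(current).strip())
            current, current_len = [], 0
            added = len(piece)
        # A piece longer than chunk_size still becomes its own (oversized) chunk.
        current.append(piece)
        current_len += added

    if current:
        chunks.append(separator.join(current).strip())
    # Drop chunks that were whitespace only.
    return [c for c in chunks if c]


doc = "alpha beta\n\ngamma\n\ndelta epsilon zeta"
print(split_and_merge(doc, chunk_size=20))
# ['alpha beta\n\ngamma', 'delta epsilon zeta']
```

Splitting on paragraph boundaries first tends to keep related sentences together, at the cost of chunks that are often shorter than chunk_size.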

External Reference

LangChain Text Splitters Documentation

Usage

Use it to split text documents into chunks before the chunks are embedded and inserted into the vector store of a RAG pipeline.
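A toy sketch of where the chunks go next. It assumes a hypothetical toy_embed function (L2-normalised word counts standing in for a real embedding model) and a plain Python list standing in for the vector store; neither reflects the embedding model or vector store used in the IPEX-LLM example.

```python
import math
from collections import Counter


def toy_embed(text: str) -> dict[str, float]:
    """Hypothetical embedding: L2-normalised word counts (stand-in for a real model)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()} if norm else {}


def build_store(chunks: list[str]) -> list[dict]:
    """Embed each chunk and 'insert' it into an in-memory list acting as the vector store."""
    return [
        {"id": str(i), "text": chunk, "vector": toy_embed(chunk)}
        for i, chunk in enumerate(chunks)
    ]


def search(store: list[dict], query: str) -> str:
    """Return the chunk text whose vector has the highest dot product with the query."""
    q = toy_embed(query)
    best = max(store, key=lambda rec: sum(w * rec["vector"].get(t, 0.0) for t, w in q.items()))
    return best["text"]


store = build_store(["the cat sat on the mat", "dogs bark at night"])
print(search(store, "where did the cat sit"))  # the cat sat on the mat
```

Each chunk becomes one record, so chunk_size directly controls how much text a single retrieved hit carries into the prompt.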

Code Reference

Source Location

Repository: IPEX-LLM
File: python/llm/example/GPU/LangChain/rag.py
Lines: 56-57

Signature

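from langchain_text_splitters import CharacterTextSplitter

# Values used in the IPEX-LLM example (LangChain's defaults are 4000 and 200)
text_splitter = CharacterTextSplitter(
    chunk_size=1000,   # int: maximum characters per chunk
    chunk_overlap=0,   # int: characters shared between consecutive chunks
)
texts = text_splitter.split_text(input_doc)  # input_doc: str -> texts: List[str]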

Import

from langchain_text_splitters import CharacterTextSplitter

I/O Contract

Inputs

Name	Type	Required	Description
chunk_size	int	No	Maximum characters per chunk (LangChain default 4000; the IPEX-LLM example sets 1000)
chunk_overlap	int	No	Characters shared between consecutive chunks (LangChain default 200; the IPEX-LLM example sets 0)
input_doc	str	Yes	Raw text string to split
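To make the chunk_size / chunk_overlap arithmetic concrete, here is a plain fixed-window splitter. It is a swapped-in technique, not CharacterTextSplitter's algorithm: LangChain splits on a separator first, so its overlap is made of whole pieces and only approximates chunk_overlap characters. The function name fixed_window_chunks is hypothetical.

```python
def fixed_window_chunks(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Fixed character windows; consecutive windows share chunk_overlap characters."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap  # distance between window starts
    # Exclusive bound on window starts: skip windows lying entirely inside the previous overlap.
    stop = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, stop, step)]


print(fixed_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=0))  # ['abcd', 'efgh', 'ij']
print(fixed_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))  # ['abcd', 'cdef', 'efgh', 'ghij']
```

With chunk_overlap=0, as in the IPEX-LLM example, consecutive chunks share no text; a positive overlap repeats the tail of each chunk at the start of the next, which increases the number of chunks to embed.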

Outputs

Name	Type	Description
texts	List[str]	List of text chunks

Usage Examples

from langchain_text_splitters import CharacterTextSplitter

# Load document
with open("my_document.txt") as f:
    input_doc = f.read()

# Split into chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_text(input_doc)
print(f"Split into {len(texts)} chunks")
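Because CharacterTextSplitter keeps a paragraph longer than chunk_size whole, it can be worth checking chunk lengths after splitting. A small helper sketch; chunk_stats is a hypothetical name, not part of LangChain or IPEX-LLM.

```python
def chunk_stats(texts: list[str], chunk_size: int) -> dict[str, int]:
    """Summarise chunk lengths and count chunks longer than chunk_size."""
    lengths = [len(t) for t in texts]
    return {
        "chunks": len(lengths),
        "max_len": max(lengths, default=0),
        # An over-long piece is kept whole by the splitter, so this can be > 0.
        "oversized": sum(1 for n in lengths if n > chunk_size),
    }


print(chunk_stats(["short chunk", "x" * 1200], chunk_size=1000))
# {'chunks': 2, 'max_len': 1200, 'oversized': 1}
```

Applied to the example above, it would be called as chunk_stats(texts, chunk_size=1000).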

Related Pages

Implements Principle

Principle:Intel_Ipex_llm_Document_Chunking

Requires Environment

Environment:Intel_Ipex_llm_RAG_LangChain_Environment
