Implementation:Intel Ipex llm CharacterTextSplitter Usage
| Field | Value |
|---|---|
| Knowledge Sources | |
| Domains | NLP, RAG, Data_Processing |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
LangChain CharacterTextSplitter for chunking documents in the IPEX-LLM RAG workflow.
Description
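This is a Wrapper Doc for LangChain's CharacterTextSplitter as used in the IPEX-LLM RAG pipeline. The splitter first splits the raw text on a single separator (by default "\n\n", i.e. the blank line between paragraphs) and then greedily merges adjacent pieces into chunks whose length, measured in characters by default, stays within chunk_size. A single piece that is already longer than chunk_size is kept whole rather than cut, so such a chunk can exceed the limit. The IPEX-LLM example uses chunk_size=1000 and chunk_overlap=0.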
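To make the split-then-merge behaviour concrete, here is a simplified pure-Python sketch of the idea. It is not LangChain's implementation: the function name split_then_merge is hypothetical, and it leaves out options such as regex separators, keep_separator and custom length functions. It assumes lengths are counted with len() and that overlap is carried forward in whole pieces.

```python
from typing import List


def split_then_merge(text: str, separator: str = "\n\n",
                     chunk_size: int = 1000, chunk_overlap: int = 0) -> List[str]:
    """Hypothetical, simplified illustration of separator splitting plus greedy merging."""
    # 1. Split on the separator and drop empty or whitespace-only pieces.
    pieces = [p.strip() for p in text.split(separator) if p.strip()]
    sep_len = len(separator)

    chunks: List[str] = []
    current: List[str] = []
    total = 0  # always equals len(separator.join(current))

    for piece in pieces:
        extra = len(piece) + (sep_len if current else 0)
        if current and total + extra > chunk_size:
            # 2. Adding this piece would overflow: emit the current chunk.
            chunks.append(separator.join(current))
            # 3. Drop pieces from the front until the remainder fits in
            #    chunk_overlap and leaves room for the new piece.
            while current and (total > chunk_overlap or
                               total + len(piece) + sep_len > chunk_size):
                removed = current.pop(0)
                total -= len(removed) + (sep_len if current else 0)
            extra = len(piece) + (sep_len if current else 0)
        # An oversized piece is appended as-is and becomes its own chunk.
        current.append(piece)
        total += extra

    if current:
        chunks.append(separator.join(current))
    return chunks


doc = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
for chunk in split_then_merge(doc, chunk_size=40):
    print(repr(chunk))
```

In this sketch, chunk_overlap only has an effect when the trailing pieces of a chunk fit inside the overlap budget; with chunk_overlap=0, as in the IPEX-LLM example, consecutive chunks never share text.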
External Reference
Usage
Use it to split text documents into chunks before embedding them and inserting them into a vector store in a RAG pipeline.
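The position of the splitter in the pipeline (split, then embed each chunk, then insert and retrieve) can be sketched without any model at all. Everything below is hypothetical and for illustration only: toy_embed is a bag-of-words stand-in for a real embedding model, InMemoryStore stands in for a real vector store, and a plain paragraph split stands in for text_splitter.split_text.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def toy_embed(text: str) -> Dict[str, float]:
    """Hypothetical stand-in for a real embedding model: L2-normalised word counts."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    # Both vectors are already L2-normalised, so the dot product is the cosine similarity.
    return sum(v * b.get(word, 0.0) for word, v in a.items())


class InMemoryStore:
    """Hypothetical minimal vector store holding (chunk, vector) pairs."""

    def __init__(self) -> None:
        self.items: List[Tuple[str, Dict[str, float]]] = []

    def add_texts(self, texts: List[str]) -> None:
        # Embed each chunk once, at insertion time.
        for text in texts:
            self.items.append((text, toy_embed(text)))

    def search(self, query: str, k: int = 1) -> List[str]:
        # Rank stored chunks by similarity to the query and return the top k.
        q = toy_embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


doc = "IPEX-LLM runs LLMs on Intel GPUs.\n\nA vector store keeps embedding vectors for retrieval."
# Stand-in for text_splitter.split_text(doc): one chunk per paragraph.
texts = [p for p in doc.split("\n\n") if p.strip()]

store = InMemoryStore()
store.add_texts(texts)
print(store.search("Where are embedding vectors kept?"))
```

Because retrieval returns whole chunks, chunk_size directly controls how much context each retrieved hit contributes to the prompt.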
Code Reference
Source Location
- Repository: IPEX-LLM
- File: python/llm/example/GPU/LangChain/rag.py
- Lines: 56-57
Signature
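```python
from langchain_text_splitters import CharacterTextSplitter

# Values used in the IPEX-LLM example (LangChain defaults: chunk_size=4000, chunk_overlap=200)
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)

# split_text(text: str) -> List[str]
texts = text_splitter.split_text(input_doc)
```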
Import
```python
from langchain_text_splitters import CharacterTextSplitter
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
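| chunk_size | int | No | Maximum chunk length in characters; the IPEX-LLM example uses 1000 (LangChain default: 4000) |
| chunk_overlap | int | No | Maximum number of characters repeated between consecutive chunks; the IPEX-LLM example uses 0 (LangChain default: 200) |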
| input_doc | str | Yes | Raw text string to split |
Outputs
| Name | Type | Description |
|---|---|---|
| texts | List[str] | List of text chunks; each is at most chunk_size characters unless a single separator-delimited piece is longer |
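Because an oversized piece is passed through unchanged, code that feeds chunks into a model with a fixed input budget may want to check this output contract explicitly. A minimal sketch, assuming lengths are measured with len() (the splitter's default length function); the helper name oversized_chunks and the sample chunks are hypothetical:

```python
from typing import List, Tuple


def oversized_chunks(texts: List[str], chunk_size: int) -> List[Tuple[int, int]]:
    """Return (index, length) for every chunk longer than chunk_size characters."""
    return [(i, len(t)) for i, t in enumerate(texts) if len(t) > chunk_size]


# Hypothetical output of split_text(...) with chunk_size=10
texts = ["short one", "a single paragraph far longer than ten characters", "tiny"]
for index, length in oversized_chunks(texts, chunk_size=10):
    print(f"chunk {index} has {length} characters (limit 10)")
```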
Usage Examples
```python
from langchain_text_splitters import CharacterTextSplitter

# Load the document as a single string
with open("my_document.txt", encoding="utf-8") as f:
    input_doc = f.read()

# Split into chunks (same settings as the IPEX-LLM example)
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_text(input_doc)
print(f"Split into {len(texts)} chunks")
```
Related Pages
Implements Principle
Requires Environment