Implementation: LangChain RecursiveCharacterTextSplitter Split Documents
| Knowledge Sources | |
|---|---|
| Domains | Data_Preprocessing, NLP |
| Last Updated | 2026-02-11 00:00 GMT |
Overview
A concrete tool, provided by langchain-text-splitters, for recursively splitting documents into chunks at natural text boundaries.
Description
The RecursiveCharacterTextSplitter splits text by trying a sequence of separators in order (double newline, single newline, space, and finally the empty string for character-by-character splits), so larger semantic boundaries such as paragraphs are preferred over smaller ones. It inherits from TextSplitter, which provides split_documents() for processing lists of Document objects and split_text() for raw strings.
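The fallback strategy can be illustrated with a minimal pure-Python sketch. This is not the library's implementation (the real splitter also merges adjacent small splits back up to chunk_size and applies chunk_overlap); it only shows the "try the largest separator first, recurse on oversized pieces" idea:

```python
def recursive_split(text: str, separators: list[str], chunk_size: int) -> list[str]:
    """Sketch of the recursive fallback used by recursive character splitting."""
    if len(text) <= chunk_size:
        return [text]
    if not separators:
        # Last resort: hard character-window cuts.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, rest = separators[0], separators[1:]
    chunks: list[str] = []
    for piece in text.split(sep):
        if len(piece) <= chunk_size:
            chunks.append(piece)
        else:
            # Piece still too long: fall through to the next, smaller separator.
            chunks.extend(recursive_split(piece, rest, chunk_size))
    return chunks

chunks = recursive_split("aaaa bbbb\n\ncccc dddd eeee", ["\n\n", "\n", " "], 10)
# The first paragraph fits whole; the second is split at spaces.
```

Note how the paragraph boundary ("\n\n") is honored before any word-level splitting happens.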
Usage
Use RecursiveCharacterTextSplitter as the default text splitter for most use cases. It handles prose, code, and mixed content effectively.
Code Reference
Source Location
- Repository: langchain
- File: libs/text-splitters/langchain_text_splitters/character.py
- Lines: L88-177
Signature
class RecursiveCharacterTextSplitter(TextSplitter):
    def __init__(
        self,
        separators: list[str] | None = None,
        keep_separator: bool | Literal["start", "end"] = True,
        is_separator_regex: bool = False,
        **kwargs: Any,
    ) -> None:
The TextSplitter base class accepts additional keyword arguments:
# From TextSplitter base:
chunk_size: int = 4000
chunk_overlap: int = 200
length_function: Callable[[str], int] = len
Import
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.documents import Document
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| documents | Iterable[Document] | Yes (for split_documents) | Documents to split |
| chunk_size | int | No (default: 4000) | Maximum chunk size, measured by length_function (characters by default) |
| chunk_overlap | int | No (default: 200) | Overlap between consecutive chunks |
| separators | list[str] or None | No | Custom separator sequence |
Outputs
| Name | Type | Description |
|---|---|---|
| return | list[Document] | Chunked documents with preserved metadata |
Usage Examples
Splitting Documents
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.documents import Document
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)
docs = [Document(page_content="Long document text here...", metadata={"source": "file.pdf"})]
chunks = text_splitter.split_documents(docs)
# Each chunk inherits metadata from the parent document