
Implementation:Langchain ai Langchain RecursiveCharacterTextSplitter Split Documents

From Leeroopedia
Knowledge Sources
Domains Data_Preprocessing, NLP
Last Updated 2026-02-11 00:00 GMT

Overview

A concrete tool, provided by langchain-text-splitters, for recursively splitting documents into chunks at natural text boundaries.

Description

The RecursiveCharacterTextSplitter splits text by trying a sequence of separators in order (by default: double newline, single newline, space, and finally the empty string, which splits character by character), preferring the largest semantic boundary that keeps chunks under the size limit. It inherits from TextSplitter, which provides split_documents() for processing lists of Document objects and split_text() for raw strings.
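The cascade can be illustrated with a simplified, self-contained sketch. This is not the library's actual implementation, which additionally merges small pieces back up toward chunk_size, applies chunk overlap, and supports regex separators:

```python
def recursive_split(text, separators, chunk_size):
    """Simplified sketch of the recursive separator cascade."""
    if len(text) <= chunk_size:
        return [text]
    sep, rest = separators[0], separators[1:]
    if sep == "":
        # Last resort: split character-by-character into fixed windows.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks = []
    for piece in text.split(sep):
        if len(piece) <= chunk_size:
            chunks.append(piece)  # piece fits; keep the larger boundary
        else:
            chunks.extend(recursive_split(piece, rest, chunk_size))
    return [c for c in chunks if c]

parts = recursive_split(
    "First paragraph.\n\nA much longer second paragraph here.",
    ["\n\n", "\n", " ", ""],
    chunk_size=30,
)
```

The first paragraph fits within chunk_size and is kept whole at the paragraph boundary; the second is too long, so the splitter falls through to smaller separators.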

Usage

Use RecursiveCharacterTextSplitter as the default text splitter for most use cases. It handles prose, code, and mixed content effectively.

Code Reference

Source Location

  • Repository: langchain
  • File: libs/text-splitters/langchain_text_splitters/character.py
  • Lines: L88-177

Signature

class RecursiveCharacterTextSplitter(TextSplitter):
    def __init__(
        self,
        separators: list[str] | None = None,
        keep_separator: bool | Literal["start", "end"] = True,
        is_separator_regex: bool = False,
        **kwargs: Any,
    ) -> None:

The TextSplitter base class accepts additional keyword arguments:

# From TextSplitter base:
chunk_size: int = 4000
chunk_overlap: int = 200
length_function: Callable[[str], int] = len
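chunk_overlap duplicates the tail of each chunk at the head of the next, so context spanning a boundary is not lost. A minimal sketch of the overlap idea, using raw character windows for clarity (the real merge logic operates on separator-delimited pieces, not fixed windows):

```python
def window_chunks(text, chunk_size, chunk_overlap):
    """Fixed-size character windows whose tails overlap (illustration only)."""
    step = chunk_size - chunk_overlap  # each window starts `step` chars later
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
# Windows start at offsets 0, 2, 4, 6, 8, so each chunk repeats
# the last 2 characters of the previous one.
```

length_function lets you measure chunks in units other than characters (e.g. tokens) by passing any callable that maps a string to an int.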

Import

from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.documents import Document

I/O Contract

Inputs

Name           Type                Required                    Description
documents      Iterable[Document]  Yes (for split_documents)   Documents to split
chunk_size     int                 No (default: 4000)          Maximum chunk size in characters
chunk_overlap  int                 No (default: 200)           Overlap in characters between consecutive chunks
separators     list[str] or None   No                          Custom separator sequence, tried in order

Outputs

Name    Type            Description
return  list[Document]  Chunked documents with parent metadata preserved

Usage Examples

Splitting Documents

from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.documents import Document

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)

docs = [Document(page_content="Long document text here...", metadata={"source": "file.pdf"})]
chunks = text_splitter.split_documents(docs)
# Each chunk inherits metadata from the parent document
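The metadata-preservation behavior above can be sketched in a few lines of self-contained Python. Doc here is an illustrative stand-in for langchain_core's Document, and the split_text callable is a placeholder for any splitting strategy:

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Illustrative stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def split_documents(docs, split_text):
    """Attach a copy of each parent's metadata to every chunk it produces."""
    return [
        Doc(chunk, dict(d.metadata))
        for d in docs
        for chunk in split_text(d.page_content)
    ]

docs = [Doc("alpha\n\nbeta", {"source": "file.pdf"})]
chunks = split_documents(docs, lambda t: t.split("\n\n"))
```

Copying the metadata dict per chunk (rather than sharing one reference) means later edits to one chunk's metadata cannot leak into its siblings.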

Related Pages

Implements Principle
