Implementation:CrewAIInc CrewAI RAG Base Chunker

Knowledge Sources	CrewAI
Domains	RAG, Text_Processing
Last Updated	2026-02-11 00:00 GMT

Overview

Implements a recursive text-splitting algorithm that chunks text along natural boundaries (paragraphs, lines, words) and maintains a configurable overlap between consecutive chunks.

Description

This module contains two main classes that form the core chunking engine for the CrewAI RAG system.

RecursiveCharacterTextSplitter implements a hierarchical splitting strategy. It tries separators from coarsest to finest: double newlines (paragraph breaks), single newlines, spaces, and finally the empty string, which splits between individual characters. When a piece produced by one separator still exceeds the configured chunk_size, the splitter recursively applies the next finer-grained separator to that piece. A merge step then reassembles small pieces into chunks close to the target size. To preserve context across chunk boundaries, the end of each chunk is repeated at the start of the next, up to chunk_overlap characters. The keep_separator flag controls whether separators are retained in the output.
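The coarse-to-fine recursion can be illustrated with a minimal sketch. This is not the CrewAI source: recursive_split is a hypothetical helper, and it deliberately omits the merge step, overlap, and separator retention so that only the fallback order is visible.

```python
def recursive_split(text: str, separators: list[str], chunk_size: int) -> list[str]:
    """Illustrative only: split text into pieces of at most chunk_size characters."""
    if len(text) <= chunk_size:
        return [text] if text else []

    # No separators left, or the empty-string separator: cut at fixed widths.
    if not separators or separators[0] == "":
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    if sep not in text:
        # This separator does not occur; fall through to the next finer one.
        return recursive_split(text, finer, chunk_size)

    pieces: list[str] = []
    for part in text.split(sep):
        if not part:
            continue
        if len(part) <= chunk_size:
            pieces.append(part)
        else:
            # Still too large: recurse with the finer-grained separators only.
            pieces.extend(recursive_split(part, finer, chunk_size))
    return pieces


pieces = recursive_split(
    "alpha beta\n\ngamma delta epsilon", ["\n\n", "\n", " ", ""], chunk_size=12
)
# pieces == ["alpha beta", "gamma", "delta", "epsilon"]
```

The first paragraph fits within 12 characters and is kept whole; the second does not, so it falls through the newline separator (absent) to the space separator.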

BaseChunker is a lightweight wrapper around RecursiveCharacterTextSplitter that supplies its own defaults (chunk_size=1000, chunk_overlap=200; the splitter's own default chunk_size is 4000) and a simple chunk() interface. It returns an empty list for empty or whitespace-only text.
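The wrapper pattern and its empty-input guard might look roughly like the following. SketchChunker and split_paragraphs are hypothetical stand-ins written for this page, not the CrewAI classes; the real wrapper delegates to RecursiveCharacterTextSplitter rather than to an arbitrary callable.

```python
from typing import Callable


class SketchChunker:
    """Hypothetical thin wrapper: guards against empty input, then delegates."""

    def __init__(self, split: Callable[[str], list[str]]) -> None:
        self._split = split

    def chunk(self, text: str) -> list[str]:
        # Empty or whitespace-only text yields no chunks at all.
        if not text or not text.strip():
            return []
        return self._split(text)


def split_paragraphs(text: str) -> list[str]:
    """Stand-in splitter used only for this illustration."""
    return [p for p in text.split("\n\n") if p]


sketch = SketchChunker(split_paragraphs)
empty_result = sketch.chunk("   \n\t ")
normal_result = sketch.chunk("a\n\nb")
```

Handling the guard in the wrapper keeps the underlying splitter free of special cases and gives every caller the same contract for blank input.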

Usage

Import BaseChunker when you need to chunk text content for embedding. It is used internally by the RAG system's DataType registry, which instantiates chunkers for each data type. You can also use it directly by constructing a BaseChunker with custom parameters and calling its chunk() method.

Code Reference

Source Location

Repository: CrewAI
File: lib/crewai-tools/src/crewai_tools/rag/chunkers/base_chunker.py
Lines: 1-191

Signature

class RecursiveCharacterTextSplitter:
    def __init__(
        self,
        chunk_size: int = 4000,
        chunk_overlap: int = 200,
        separators: list[str] | None = None,
        keep_separator: bool = True,
    ) -> None: ...

    def split_text(self, text: str) -> list[str]: ...

class BaseChunker:
    def __init__(
        self,
        chunk_size: int = 1000,
        chunk_overlap: int = 200,
        separators: list[str] | None = None,
        keep_separator: bool = True,
    ) -> None: ...

    def chunk(self, text: str) -> list[str]: ...

Import

from crewai_tools.rag.chunkers.base_chunker import BaseChunker
from crewai_tools.rag.chunkers.base_chunker import RecursiveCharacterTextSplitter

I/O Contract

Inputs (BaseChunker.__init__)

Name	Type	Required	Description
chunk_size	int	No	Maximum size of each chunk (default 1000)
chunk_overlap	int	No	Number of characters to overlap between chunks (default 200)
separators	list[str] | None	No	List of separators in order of preference (default: ["\n\n", "\n", " ", ""])
keep_separator	bool	No	Whether to keep the separator in the split text (default True)
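How chunk_size and chunk_overlap can interact during merging is sketched below. This is a simplified illustration under stated assumptions, not the library's code: merge_with_overlap and its joiner parameter are hypothetical, and pieces are assumed to be already split.

```python
def merge_with_overlap(
    pieces: list[str], chunk_size: int, chunk_overlap: int, joiner: str = " "
) -> list[str]:
    """Illustrative only: pack pieces into chunks, carrying a tail into the next chunk."""
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0  # length of joiner.join(current)

    for piece in pieces:
        added = len(piece) + (len(joiner) if current else 0)
        if current and current_len + added > chunk_size:
            chunks.append(joiner.join(current))
            # Drop leading pieces until the retained tail fits the overlap
            # budget and leaves room for the incoming piece.
            while current and (
                current_len > chunk_overlap
                or current_len + len(joiner) + len(piece) > chunk_size
            ):
                removed = current.pop(0)
                current_len -= len(removed) + (len(joiner) if current else 0)
        current.append(piece)
        current_len += len(piece) + (len(joiner) if len(current) > 1 else 0)

    if current:
        chunks.append(joiner.join(current))
    return chunks


merged = merge_with_overlap(
    ["aaaa", "bbbb", "cccc", "dddd"], chunk_size=10, chunk_overlap=4
)
# merged == ["aaaa bbbb", "bbbb cccc", "cccc dddd"]
```

Each chunk starts with the last piece of the previous one, so the overlap never exceeds chunk_overlap characters and no chunk exceeds chunk_size.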

Inputs (BaseChunker.chunk)

Name	Type	Required	Description
text	str	Yes	The text to chunk into smaller pieces

Outputs

Name	Type	Description
return	list[str]	A list of text chunks. Returns empty list for empty or whitespace-only input.

Usage Examples

Basic Usage

from crewai_tools.rag.chunkers.base_chunker import BaseChunker

# Create a chunker with default settings
chunker = BaseChunker()

# Chunk a long document
text = "First paragraph.\n\nSecond paragraph with more content.\n\nThird paragraph."
chunks = chunker.chunk(text)
# Text shorter than chunk_size should come back as a single chunk;
# longer text is split at paragraph boundaries, with overlap between chunks

# Create a chunker with custom settings
chunker = BaseChunker(chunk_size=500, chunk_overlap=50)
chunks = chunker.chunk(text)

Using RecursiveCharacterTextSplitter Directly

from crewai_tools.rag.chunkers.base_chunker import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=4000,
    chunk_overlap=200,
    separators=["\n\n", "\n", ". ", " ", ""],
    keep_separator=True,
)
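# Placeholder input for illustration; substitute your own document text
long_document = "First sentence. Second sentence.\n\nNew paragraph. " * 200
chunks = splitter.split_text(long_document)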

Related Pages

Principle:CrewAIInc_CrewAI_Knowledge_Ingestion
