Implementation:CrewAIInc CrewAI RAG Base Chunker
| Property | Value |
|---|---|
| Knowledge Sources | |
| Domains | RAG, Text_Processing |
| Last Updated | 2026-02-11 00:00 GMT |
Overview
Implements a recursive text-splitting algorithm that divides text into chunks at natural boundaries (paragraphs, lines, words) where possible, with a configurable overlap between consecutive chunks.
Description
This module contains two main classes that form the core chunking engine for the CrewAI RAG system.
RecursiveCharacterTextSplitter implements a hierarchical splitting strategy. It tries separators from coarsest to finest: double newlines (paragraph breaks), single newlines, spaces, and finally the empty string, which splits text into individual characters. If a piece produced by one separator is still longer than chunk_size, the splitter recursively splits that piece with the next, finer separator. Consecutive chunks share up to chunk_overlap characters, so the end of one chunk is repeated at the start of the next and context carries across the boundary. The keep_separator flag controls whether separators are retained in the split pieces or discarded. A final merge step packs small pieces back together into chunks close to chunk_size without exceeding the overlap limit.
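The split-then-merge steps can be sketched in plain Python. This is a minimal, self-contained approximation of the algorithm described above, not the CrewAI source: the helper names (`_split_pieces`, `_merge`, `recursive_split`), attaching kept separators to the start of the following piece, and stripping whitespace from emitted chunks are all assumptions of this sketch.

```python
from __future__ import annotations

import re


def _split_pieces(text: str, separator: str, keep_separator: bool) -> list[str]:
    """Split text once on separator; "" means split into single characters."""
    if separator == "":
        return list(text)
    if not keep_separator:
        return [p for p in text.split(separator) if p]
    # The capture group keeps each separator as its own list element
    parts = re.split(f"({re.escape(separator)})", text)
    # Assumption: glue each separator onto the start of the piece that follows it
    pieces = [parts[0]] + [parts[i] + parts[i + 1] for i in range(1, len(parts), 2)]
    return [p for p in pieces if p]


def _merge(splits: list[str], joiner: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Greedily pack small splits into chunks of at most chunk_size characters."""
    chunks: list[str] = []
    window: list[str] = []  # splits forming the chunk under construction
    total = 0  # always equals len(joiner.join(window))
    for piece in splits:
        extra = len(joiner) if window else 0
        if window and total + extra + len(piece) > chunk_size:
            chunks.append(joiner.join(window).strip())
            # Drop splits from the front until the kept tail fits the overlap budget
            while window and (total > chunk_overlap or total + len(joiner) + len(piece) > chunk_size):
                total -= len(window[0]) + (len(joiner) if len(window) > 1 else 0)
                window.pop(0)
        total += len(piece) + (len(joiner) if window else 0)
        window.append(piece)
    if window:
        chunks.append(joiner.join(window).strip())
    return [c for c in chunks if c]


def recursive_split(text: str, chunk_size: int = 1000, chunk_overlap: int = 200,
                    separators: list[str] | None = None, keep_separator: bool = True) -> list[str]:
    if separators is None:
        separators = ["\n\n", "\n", " ", ""]
    # Use the coarsest separator that actually occurs in the text
    separator, finer = separators[-1], []
    for i, sep in enumerate(separators):
        if sep == "" or sep in text:
            separator, finer = sep, separators[i + 1:]
            break
    joiner = "" if keep_separator else separator
    chunks: list[str] = []
    small: list[str] = []  # pieces that already fit, awaiting a merge
    for piece in _split_pieces(text, separator, keep_separator):
        if len(piece) <= chunk_size:
            small.append(piece)
            continue
        if small:
            chunks.extend(_merge(small, joiner, chunk_size, chunk_overlap))
            small = []
        if finer:
            # Piece is still too long: recurse with the next, finer separator
            chunks.extend(recursive_split(piece, chunk_size, chunk_overlap, finer, keep_separator))
        else:
            chunks.append(piece)  # no finer separator left; keep the oversized piece
    if small:
        chunks.extend(_merge(small, joiner, chunk_size, chunk_overlap))
    return chunks
```

With `chunk_size=30, chunk_overlap=10` on a run of space-separated words, each new chunk in this sketch begins with the last few words of the previous one, which is the overlap behavior the paragraph above describes.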
BaseChunker is a lightweight wrapper around RecursiveCharacterTextSplitter that provides default parameters (chunk_size=1000, chunk_overlap=200) and a simple chunk() interface. It also handles the edge case of empty or whitespace-only text by returning an empty list.
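The wrapper pattern can be illustrated with a hypothetical `SimpleChunker`. To stay self-contained, it delegates to a plain fixed-size sliding window (`sliding_window_split`, also hypothetical) instead of the recursive splitter, so its chunk boundaries ignore paragraphs. Only the defaults and the empty-input short-circuit mirror what is described for BaseChunker; the `ValueError` on an overlap that is not smaller than the chunk size is an assumption of this sketch.

```python
from __future__ import annotations

from typing import Callable


def sliding_window_split(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Hypothetical stand-in splitter: fixed-size windows sharing chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    # Stop early enough that the last window is not wholly contained in the previous one
    stop = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, stop, step)]


class SimpleChunker:
    """Hypothetical wrapper: fixed defaults plus an empty-input guard."""

    def __init__(
        self,
        chunk_size: int = 1000,
        chunk_overlap: int = 200,
        splitter: Callable[[str, int, int], list[str]] = sliding_window_split,
    ) -> None:
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap
        self._splitter = splitter

    def chunk(self, text: str) -> list[str]:
        # Empty or whitespace-only input short-circuits to an empty list
        if not text.strip():
            return []
        return self._splitter(text, self.chunk_size, self.chunk_overlap)
```

Keeping the guard in the wrapper means every splitter plugged in behind it gets the same empty-input behavior without handling it itself.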
Usage
Import BaseChunker when you need to chunk text content for embedding. It is used internally by the RAG system's DataType registry, which instantiates chunkers for each data type. You can also use it directly by constructing a BaseChunker with custom parameters and calling its chunk() method.
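A per-data-type registry of this kind can be sketched as a lookup table. Everything in this sketch is hypothetical: the `DataType` members, `ChunkerSettings`, `settings_for`, and the per-type sizes and separators are invented to show the pattern and are not taken from CrewAI's actual registry. Only the default values (1000 and 200, and the default separator list) come from the BaseChunker documentation.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class DataType(Enum):
    # Hypothetical members for illustration; not CrewAI's actual DataType list
    TEXT = "text"
    MARKDOWN = "markdown"
    CSV = "csv"


@dataclass(frozen=True)
class ChunkerSettings:
    # Defaults mirror the documented BaseChunker defaults
    chunk_size: int = 1000
    chunk_overlap: int = 200
    separators: tuple[str, ...] = ("\n\n", "\n", " ", "")


# Invented per-type overrides, only to show the pattern
_REGISTRY: dict[DataType, ChunkerSettings] = {
    DataType.TEXT: ChunkerSettings(),
    DataType.MARKDOWN: ChunkerSettings(separators=("\n## ", "\n\n", "\n", " ", "")),
    DataType.CSV: ChunkerSettings(chunk_size=1200, chunk_overlap=100, separators=("\n", "")),
}


def settings_for(data_type: DataType) -> ChunkerSettings:
    """Return the registered settings, falling back to the defaults."""
    return _REGISTRY.get(data_type, ChunkerSettings())
```

In a real registry the looked-up values would be passed to the documented constructor, for example `BaseChunker(chunk_size=s.chunk_size, chunk_overlap=s.chunk_overlap, separators=list(s.separators))`.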
Code Reference
Source Location
- Repository: CrewAI
- File: lib/crewai-tools/src/crewai_tools/rag/chunkers/base_chunker.py
- Lines: 1-191
Signature
```python
class RecursiveCharacterTextSplitter:
    def __init__(
        self,
        chunk_size: int = 4000,
        chunk_overlap: int = 200,
        separators: list[str] | None = None,
        keep_separator: bool = True,
    ) -> None: ...

    def split_text(self, text: str) -> list[str]: ...


class BaseChunker:
    def __init__(
        self,
        chunk_size: int = 1000,
        chunk_overlap: int = 200,
        separators: list[str] | None = None,
        keep_separator: bool = True,
    ) -> None: ...

    def chunk(self, text: str) -> list[str]: ...
```
Import
```python
from crewai_tools.rag.chunkers.base_chunker import BaseChunker
from crewai_tools.rag.chunkers.base_chunker import RecursiveCharacterTextSplitter
```
I/O Contract
Inputs (BaseChunker.__init__)
| Name | Type | Required | Description |
|---|---|---|---|
| chunk_size | int | No | Target maximum length of each chunk, in characters (default 1000) |
| chunk_overlap | int | No | Number of characters to overlap between chunks (default 200) |
| separators | list[str] \| None | No | Separators to try, from coarsest to finest (default: ["\n\n", "\n", " ", ""]) |
| keep_separator | bool | No | Whether to keep the separator in the split text (default True) |
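The effect of keep_separator can be shown with a single-level split. `split_once` is a hypothetical helper, not part of crewai_tools, and the sketch assumes that kept separators are attached to the start of the following piece; the actual implementation may attach them differently.

```python
import re


def split_once(text: str, separator: str, keep_separator: bool) -> list[str]:
    """Hypothetical single-level split illustrating keep_separator."""
    if not keep_separator:
        # Separators are discarded
        return [p for p in text.split(separator) if p]
    # The capture group makes re.split return the separators as well
    parts = re.split(f"({re.escape(separator)})", text)
    # Assumption: each separator is attached to the start of the piece after it
    pieces = [parts[0]] + [parts[i] + parts[i + 1] for i in range(1, len(parts), 2)]
    return [p for p in pieces if p]


text = "alpha\n\nbeta\n\ngamma"
kept = split_once(text, "\n\n", keep_separator=True)
dropped = split_once(text, "\n\n", keep_separator=False)
print(kept)     # ['alpha', '\n\nbeta', '\n\ngamma']
print(dropped)  # ['alpha', 'beta', 'gamma']
```

Kept pieces concatenate back to the original text exactly, so keeping separators preserves the original formatting inside each chunk.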
Inputs (BaseChunker.chunk)
| Name | Type | Required | Description |
|---|---|---|---|
| text | str | Yes | The text to chunk into smaller pieces |
Outputs
| Name | Type | Description |
|---|---|---|
| return | list[str] | A list of text chunks. Returns empty list for empty or whitespace-only input. |
Usage Examples
Basic Usage
```python
from crewai_tools.rag.chunkers.base_chunker import BaseChunker

# Create a chunker with default settings (chunk_size=1000, chunk_overlap=200)
chunker = BaseChunker()

# Chunk a document
text = "First paragraph.\n\nSecond paragraph with more content.\n\nThird paragraph."
chunks = chunker.chunk(text)
# Returns a list[str]; longer texts are split at paragraph breaks first,
# with overlap between consecutive chunks

# Create a chunker with custom settings
chunker = BaseChunker(chunk_size=500, chunk_overlap=50)
chunks = chunker.chunk(text)
```
Using RecursiveCharacterTextSplitter Directly
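```python
from crewai_tools.rag.chunkers.base_chunker import RecursiveCharacterTextSplitter

# Any long string works here; this one is generated for illustration
long_document = "\n\n".join(f"Paragraph {i}. " + "Some sentence text. " * 40 for i in range(50))

splitter = RecursiveCharacterTextSplitter(
    chunk_size=4000,
    chunk_overlap=200,
    separators=["\n\n", "\n", ". ", " ", ""],  # ". " adds a sentence-level fallback
    keep_separator=True,
)
chunks = splitter.split_text(long_document)
```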