Implementation:CrewAIInc CrewAI RAG Base Chunker

From Leeroopedia
Knowledge Sources
Domains: RAG, Text_Processing
Last Updated: 2026-02-11 00:00 GMT

Overview

Implements a recursive text splitting algorithm that breaks text at natural boundaries (paragraphs, lines, words) and maintains a configurable overlap between consecutive chunks.

Description

This module contains two main classes that form the core chunking engine for the CrewAI RAG system.

RecursiveCharacterTextSplitter implements a hierarchical splitting strategy. It tries separators from coarsest to finest: double newlines (paragraph breaks), single newlines, spaces, and finally the empty string, which splits into individual characters. When a piece produced by one separator still exceeds the configured chunk_size, the splitter recursively applies the next finer-grained separator to that piece. A merge step then reassembles small splits into chunks close to the target size, carrying up to chunk_overlap characters from the end of one chunk into the start of the next to preserve context. The keep_separator flag controls whether separators are retained in the output.
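The fallback from coarse to fine separators can be illustrated with a minimal standalone sketch. This is not the CrewAI implementation: the function name recursive_split is hypothetical, and the sketch deliberately omits the merge step, chunk_overlap, and keep_separator so that only the recursion is visible.

```python
def recursive_split(text: str, chunk_size: int, separators: list[str]) -> list[str]:
    """Split text into pieces of at most chunk_size characters.

    Simplified illustration only: no merging, no overlap, separators dropped.
    """
    if len(text) <= chunk_size:
        return [text] if text else []

    # Out of separators, or reached the empty string: cut at fixed offsets.
    if not separators or separators[0] == "":
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    if sep not in text:
        # This separator does not occur; fall through to the next finer one.
        return recursive_split(text, chunk_size, finer)

    pieces: list[str] = []
    for part in text.split(sep):
        if not part:
            continue  # skip empty fragments left by adjacent separators
        if len(part) > chunk_size:
            # Still too long: recurse with the finer-grained separators.
            pieces.extend(recursive_split(part, chunk_size, finer))
        else:
            pieces.append(part)
    return pieces


sample = "alpha beta\n\ngamma delta epsilon\nzeta"
print(recursive_split(sample, 12, ["\n\n", "\n", " ", ""]))
# ['alpha beta', 'gamma', 'delta', 'epsilon', 'zeta']
```

In this sample the first paragraph fits within 12 characters and is kept whole, while the second is split on the newline and then on spaces. A real splitter would follow this with a merge pass that packs neighbouring small pieces back together up to chunk_size.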

BaseChunker is a lightweight wrapper around RecursiveCharacterTextSplitter that provides default parameters (chunk_size=1000, chunk_overlap=200) and a simple chunk() interface. It also handles the edge case of empty or whitespace-only text by returning an empty list.
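The wrapper pattern described above might look roughly like the following. SketchChunker and its paragraph-only stand-in splitter are hypothetical names invented for this illustration; only the defaults (chunk_size=1000, chunk_overlap=200) and the empty-input behaviour are taken from the description.

```python
from typing import Callable, Optional


class SketchChunker:
    """Hypothetical thin wrapper; illustrates the pattern, not the CrewAI class."""

    def __init__(
        self,
        chunk_size: int = 1000,
        chunk_overlap: int = 200,
        splitter: Optional[Callable[[str], list[str]]] = None,
    ) -> None:
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap
        # Stand-in splitter (assumption): split on paragraph breaks only.
        # A real wrapper would delegate to a full recursive splitter here.
        self._splitter = splitter or (
            lambda text: [p for p in text.split("\n\n") if p.strip()]
        )

    def chunk(self, text: str) -> list[str]:
        # Guard clause: empty or whitespace-only input yields no chunks.
        if not text or not text.strip():
            return []
        return self._splitter(text)


chunker = SketchChunker()
print(chunker.chunk("   \n\t "))            # []
print(chunker.chunk("First.\n\nSecond."))  # ['First.', 'Second.']
```

Handling blank input in the wrapper keeps the guard in one place, so the underlying splitter never has to consider it.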

Usage

Import BaseChunker when you need to chunk text content for embedding. It is used internally by the RAG system's DataType registry, which instantiates chunkers for each data type. You can also use it directly by constructing a BaseChunker with custom parameters and calling its chunk() method.

Code Reference

Source Location

  • Repository: CrewAI
  • File: lib/crewai-tools/src/crewai_tools/rag/chunkers/base_chunker.py
  • Lines: 1-191

Signature

class RecursiveCharacterTextSplitter:
    def __init__(
        self,
        chunk_size: int = 4000,
        chunk_overlap: int = 200,
        separators: list[str] | None = None,
        keep_separator: bool = True,
    ) -> None: ...

    def split_text(self, text: str) -> list[str]: ...

class BaseChunker:
    def __init__(
        self,
        chunk_size: int = 1000,
        chunk_overlap: int = 200,
        separators: list[str] | None = None,
        keep_separator: bool = True,
    ) -> None: ...

    def chunk(self, text: str) -> list[str]: ...

Import

from crewai_tools.rag.chunkers.base_chunker import BaseChunker
from crewai_tools.rag.chunkers.base_chunker import RecursiveCharacterTextSplitter

I/O Contract

Inputs (BaseChunker.__init__)

Name            Type              Required  Description
chunk_size      int               No        Maximum size of each chunk, in characters (default 1000)
chunk_overlap   int               No        Number of characters to overlap between chunks (default 200)
separators      list[str] | None  No        List of separators in order of preference (default: ["\n\n", "\n", " ", ""])
keep_separator  bool              No        Whether to keep the separator in the split text (default True)
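To see how chunk_size and chunk_overlap interact, it can help to look at a plain sliding window, where each chunk starts chunk_size - chunk_overlap characters after the previous one. The sketch below is a simplification under that assumption: the separator-aware splitter produces chunks of varying length, so its boundaries will differ. The function name window_bounds and the validation rule are hypothetical.

```python
def window_bounds(text_len: int, chunk_size: int, chunk_overlap: int) -> list[tuple[int, int]]:
    """Return (start, end) offsets for fixed-size windows that overlap."""
    # Assumption of this sketch: the overlap must be smaller than the chunk
    # size, otherwise the window would never advance.
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in the range [0, chunk_size)")

    step = chunk_size - chunk_overlap
    bounds: list[tuple[int, int]] = []
    start = 0
    while start < text_len:
        end = min(start + chunk_size, text_len)
        bounds.append((start, end))
        if end == text_len:
            break  # the final window reached the end of the text
        start += step
    return bounds


print(window_bounds(2600, chunk_size=1000, chunk_overlap=200))
# [(0, 1000), (800, 1800), (1600, 2600)]
```

With the defaults, each window shares its first 200 characters with the tail of the previous one, so a 2600-character text needs three windows where non-overlapping windows would also need three but with no shared context.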

Inputs (BaseChunker.chunk)

Name  Type  Required  Description
text  str   Yes       The text to chunk into smaller pieces

Outputs

Name    Type       Description
return  list[str]  A list of text chunks. Returns an empty list for empty or whitespace-only input.

Usage Examples

Basic Usage

from crewai_tools.rag.chunkers.base_chunker import BaseChunker

# Create a chunker with default settings
chunker = BaseChunker()

# Chunk a long document
text = "First paragraph.\n\nSecond paragraph with more content.\n\nThird paragraph."
chunks = chunker.chunk(text)
# This sample is far shorter than chunk_size, so it likely fits in a single chunk;
# longer text is split at paragraph boundaries, with overlap between chunks

# Create a chunker with custom settings
chunker = BaseChunker(chunk_size=500, chunk_overlap=50)
chunks = chunker.chunk(text)

Using RecursiveCharacterTextSplitter Directly

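from crewai_tools.rag.chunkers.base_chunker import RecursiveCharacterTextSplitter

# Placeholder input; substitute your own document text
long_document = "First sentence. Second sentence.\n\nA new paragraph follows.\n" * 200

splitter = RecursiveCharacterTextSplitter(
    chunk_size=4000,
    chunk_overlap=200,
    # Custom list adds ". " so sentence breaks are tried before plain spaces
    separators=["\n\n", "\n", ". ", " ", ""],
    keep_separator=True,
)
chunks = splitter.split_text(long_document)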
