Heuristic:Run llama Llama index Chunk Size Optimization

Knowledge Sources	LlamaIndex Core Default constants analysis
Domains	RAG, Optimization, Text_Processing
Last Updated	2026-02-11 19:00 GMT

Overview

Chunk size and overlap configuration strategy for balancing retrieval precision, context preservation, and embedding quality in LlamaIndex's SentenceSplitter.

Description

The `SentenceSplitter` splits documents into chunks for embedding and retrieval. `chunk_size` sets the maximum number of tokens per chunk, and `chunk_overlap` sets how many tokens are shared between consecutive chunks to preserve context at boundaries. For the `SentenceSplitter`, LlamaIndex defaults to a chunk size of 1024 tokens with a 200-token overlap (versus the library-wide `DEFAULT_CHUNK_OVERLAP` of 20 tokens). The splitter also has a metadata-aware mode that subtracts the metadata token length from the chunk size to get an effective chunk size; when metadata is large, the resulting chunks can become very small.
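
To make the interaction between `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-window chunking over a token list. It is not the `SentenceSplitter` algorithm, which additionally prefers sentence boundaries; `sliding_chunks` is a hypothetical helper, and the placeholder strings stand in for real tokenizer output.

```python
from typing import List, Sequence


def sliding_chunks(
    tokens: Sequence[str], chunk_size: int, chunk_overlap: int
) -> List[List[str]]:
    """Split tokens into windows of chunk_size that share chunk_overlap tokens.

    Simplified illustration only: a real sentence-aware splitter would move
    the cut points to sentence boundaries instead of fixed offsets.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    # Each new chunk starts this many tokens after the previous one.
    stride = chunk_size - chunk_overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), stride):
        chunks.append(list(tokens[start:start + chunk_size]))
        # Stop once a window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Placeholder tokens t0..t9; consecutive chunks share one token.
tokens = [f"t{i}" for i in range(10)]
chunks = sliding_chunks(tokens, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

The last token of each chunk reappears as the first token of the next, which is how overlap carries context across a boundary.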

Usage

Apply this heuristic when configuring text chunking for the Data Ingestion Pipeline or RAG Query Pipeline. It is especially relevant when:

Retrieval results feel too fragmented or lack context
Metadata-rich documents produce tiny chunks
Embedding quality needs tuning for specific document types

The Insight (Rule of Thumb)

Action: Choose chunk_size based on document type and query patterns.
Small chunks (256-512): Better for precise, fact-based retrieval. Each chunk covers one concept.
Large chunks (1024-2048): Better for context-rich retrieval. Each chunk covers multiple related concepts.
Default (1024): Good general-purpose setting for mixed document types.
Overlap (200): SentenceSplitter default. Repeats up to 200 tokens from the end of one chunk at the start of the next, so context that straddles a chunk boundary is not lost.
Metadata Warning: If `chunk_size - metadata_length < 50`, chunks become too small for meaningful embedding. Keep metadata compact or increase chunk_size.
Trade-off: Smaller chunks give more precise retrieval but produce more chunks to embed, which means more embedding API calls. Larger chunks need fewer calls but can dilute relevance, because each chunk mixes several topics.
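
The cost side of this trade-off can be estimated before ingestion. The sketch below assumes fixed-size windows, so it only approximates what a sentence-aware splitter would produce; `estimate_chunk_count` is a hypothetical helper, and the token total and size/overlap pairs are illustrative values rather than recommendations.

```python
import math


def estimate_chunk_count(total_tokens: int, chunk_size: int, chunk_overlap: int) -> int:
    """Approximate how many chunks a document of total_tokens will yield.

    Assumes every chunk is filled to chunk_size and advances by
    (chunk_size - chunk_overlap) tokens; real splitters will differ slightly.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    if total_tokens <= 0:
        return 0
    if total_tokens <= chunk_size:
        return 1
    stride = chunk_size - chunk_overlap
    # One full first chunk, then one more chunk per stride of remaining tokens.
    return 1 + math.ceil((total_tokens - chunk_size) / stride)


# Compare a few configurations for a hypothetical 10,000-token corpus.
for size, overlap in [(256, 50), (512, 100), (1024, 200)]:
    count = estimate_chunk_count(10_000, size, overlap)
    print(f"chunk_size={size:4d} overlap={overlap:3d} -> ~{count} chunks to embed")
```

Each chunk needs one embedding, so the chunk count is a rough proxy for embedding cost.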

Reasoning

The default chunk size of 1024 tokens was chosen to balance several constraints:

Context Window: The default LLM context window is 3900 tokens (`DEFAULT_CONTEXT_WINDOW`). With `DEFAULT_SIMILARITY_TOP_K = 2`, retrieving 2 chunks of at most 1024 tokens each uses up to 2048 tokens, leaving roughly 1850 tokens for the prompt template and response.
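
The budget arithmetic is easy to re-run when any of the three values changes. This is a minimal sketch using the constants named above; `remaining_context` is a hypothetical helper, and it ignores any metadata that may be added to the retrieved text.

```python
# Values taken from the defaults discussed in this section.
DEFAULT_CONTEXT_WINDOW = 3900
DEFAULT_CHUNK_SIZE = 1024
DEFAULT_SIMILARITY_TOP_K = 2


def remaining_context(context_window: int, chunk_size: int, top_k: int) -> int:
    """Tokens left for the prompt template and response after retrieval.

    Worst case: every retrieved chunk is filled to chunk_size.
    A negative result means the retrieved chunks alone overflow the window.
    """
    return context_window - top_k * chunk_size


budget = remaining_context(
    DEFAULT_CONTEXT_WINDOW, DEFAULT_CHUNK_SIZE, DEFAULT_SIMILARITY_TOP_K
)
print(f"Tokens left for prompt and response: {budget}")
```

Doubling `chunk_size` to 2048 with the same `top_k` would give a negative budget, so larger chunks need either a larger context window or a smaller `top_k`.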

Embedding Quality: Embedding models (such as text-embedding-ada-002, which produces 1536-dimensional vectors) perform best on semantically coherent passages. The SentenceSplitter prefers to break at sentence boundaries, as the source explicitly notes: "Parse text with a preference for complete sentences... there are less likely to be hanging sentences or parts of sentences at the end of the node chunk."

Metadata-Aware Splitting: The code explicitly warns when effective chunk size drops below 50 tokens after subtracting metadata length, indicating this is a known failure mode.

Code evidence from `constants.py:10-12`:

DEFAULT_CHUNK_SIZE = 1024  # tokens
DEFAULT_CHUNK_OVERLAP = 20  # tokens
DEFAULT_SIMILARITY_TOP_K = 2

SentenceSplitter-specific overlap from `sentence.py:22`:

SENTENCE_CHUNK_OVERLAP = 200

Metadata warning from `sentence.py:165-172`:

elif effective_chunk_size < 50:
    print(
        f"Metadata length ({metadata_len}) is close to chunk size "
        f"({self.chunk_size}). Resulting chunks are less than 50 tokens. "
        "Consider increasing the chunk size or decreasing the size of "
        "your metadata to avoid this.",
        flush=True,
    )
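
The same check can be applied to a planned configuration before running ingestion. This is a sketch, not the library's code: `classify_metadata_budget` is a hypothetical helper, the 50-token threshold comes from the excerpt above, and treating an effective size of zero or less as an error is an assumption about the branch that precedes the quoted `elif`. It also assumes `metadata_len` was measured with the same tokenizer the splitter uses.

```python
def classify_metadata_budget(
    chunk_size: int, metadata_len: int, min_effective: int = 50
) -> str:
    """Classify a chunk_size / metadata_len pair as 'ok', 'warn', or 'error'.

    effective size = chunk_size - metadata_len, mirroring the
    metadata-aware mode described in this section.
    """
    effective = chunk_size - metadata_len
    if effective <= 0:
        # Assumed behavior: no room is left for any document text.
        return "error"
    if effective < min_effective:
        # Matches the quoted warning: chunks would be under 50 tokens.
        return "warn"
    return "ok"


# Illustrative metadata lengths against the default chunk size of 1024.
for metadata_len in (100, 990, 1024):
    status = classify_metadata_budget(1024, metadata_len)
    print(f"metadata_len={metadata_len:4d} -> {status}")
```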
