Implementation:Run llama Llama index SentenceSplitter Configuration

Knowledge Sources	LlamaIndex
Domains	Data_Preprocessing, RAG, NLP
Last Updated	2026-02-11 00:00 GMT

Overview

The SentenceSplitter is LlamaIndex's default text chunking implementation that splits text at sentence boundaries while respecting configurable chunk size and overlap constraints.

Description

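SentenceSplitter extends MetadataAwareTextSplitter and prefers to keep sentences intact. It splits text first on paragraph_separator and then into sentences using chunking_tokenizer_fn (by default NLTK's PunktSentenceTokenizer, from the nltk package). It then merges consecutive pieces into chunks that fit within chunk_size, carrying up to chunk_overlap tokens from the end of one chunk into the next. When a single sentence still exceeds the limit, it falls back to finer splits: the secondary_chunking_regex pattern, then separator, and finally individual characters. Token counts come from the separate tokenizer callable, which is distinct from the sentence tokenizer.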
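
The merge step described above can be illustrated with a minimal, standard-library sketch. It is not the library's implementation: split_sentences is a naive regex stand-in for a real sentence tokenizer, count_tokens counts whitespace-separated words instead of model tokens, and pack_sentences is a hypothetical helper name. A sentence longer than chunk_size is kept whole here, whereas the real splitter would break it down further.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    # Naive stand-in for a real sentence tokenizer such as NLTK's Punkt:
    # break after '.', '!' or '?' when followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def count_tokens(text: str) -> int:
    # Whitespace word count as a simplified token counter (an assumption).
    return len(text.split())


def pack_sentences(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Greedily merge consecutive sentences into chunks of at most chunk_size tokens."""
    if chunk_overlap > chunk_size:
        raise ValueError("chunk_overlap must not exceed chunk_size")
    chunks: List[str] = []
    current: List[str] = []  # sentences in the chunk being built
    for sentence in split_sentences(text):
        candidate = current + [sentence]
        if current and count_tokens(" ".join(candidate)) > chunk_size:
            chunks.append(" ".join(current))
            # Carry trailing sentences forward while they fit in the overlap budget.
            overlap: List[str] = []
            for prev in reversed(current):
                if count_tokens(" ".join([prev] + overlap)) > chunk_overlap:
                    break
                overlap.insert(0, prev)
            # Drop overlap that would push the new chunk past chunk_size.
            while overlap and count_tokens(" ".join(overlap + [sentence])) > chunk_size:
                overlap.pop(0)
            current = overlap + [sentence]
        else:
            current = candidate
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = "One two three. Four five six. Seven eight nine. Ten eleven twelve."
chunks = pack_sentences(sample, chunk_size=6, chunk_overlap=3)
```

With these settings each chunk holds two three-word sentences, and the last sentence of one chunk reappears at the start of the next.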

The metadata-aware variant (split_text_metadata_aware) subtracts the token count of the metadata string from chunk_size to obtain an effective chunk size, so that the final node (text plus metadata) still fits within chunk_size.
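
The effective chunk size calculation can be sketched as follows. This is a simplified illustration, not the library's code: effective_chunk_size is a hypothetical helper, and the default tokenizer here is a plain whitespace split rather than a model tokenizer.

```python
from typing import Callable, List


def effective_chunk_size(
    chunk_size: int,
    metadata_str: str,
    tokenizer: Callable[[str], List[str]] = str.split,
) -> int:
    """Return the token budget left for text after reserving room for metadata."""
    metadata_tokens = len(tokenizer(metadata_str))
    remaining = chunk_size - metadata_tokens
    if remaining <= 0:
        raise ValueError(
            f"Metadata uses {metadata_tokens} tokens, which leaves no room "
            f"within chunk_size={chunk_size}."
        )
    return remaining


# Six whitespace-separated metadata tokens are reserved out of 512.
budget = effective_chunk_size(512, "title: Annual Report author: Jane Doe")
```

Long metadata strings shrink the space available for text, so a small chunk_size combined with verbose metadata can leave very little room per chunk.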

Usage

Use SentenceSplitter as the default node parser for most RAG pipelines. Configure chunk_size and chunk_overlap based on your embedding model's context window and retrieval granularity requirements.
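
When comparing candidate settings, a rough chunk count can help gauge retrieval granularity and index size. The sketch below assumes a fixed-stride sliding window, which sentence-aware splitting only approximates because chunks end at sentence boundaries; estimate_chunk_count is a hypothetical helper and the 10,000-token document length is an illustrative figure.

```python
import math


def estimate_chunk_count(total_tokens: int, chunk_size: int, chunk_overlap: int) -> int:
    """Rough chunk count for a fixed-stride sliding window over total_tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    if total_tokens <= chunk_size:
        return 1
    stride = chunk_size - chunk_overlap  # new tokens contributed by each extra chunk
    return 1 + math.ceil((total_tokens - chunk_size) / stride)


# Compare a few (chunk_size, chunk_overlap) pairs for a 10,000-token document.
for size, overlap in [(1024, 200), (512, 50), (256, 20)]:
    print(size, overlap, estimate_chunk_count(10_000, size, overlap))
```

Smaller chunks give finer-grained retrieval at the cost of more nodes to embed and store; larger overlap increases redundancy between neighbouring chunks.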

Code Reference

Source Location

Repository: llama_index
File: llama-index-core/llama_index/core/node_parser/text/sentence.py
Lines: L34-331

Signature

class SentenceSplitter(MetadataAwareTextSplitter):
    def __init__(
        self,
        separator: str = " ",
        chunk_size: int = DEFAULT_CHUNK_SIZE,
        chunk_overlap: int = SENTENCE_CHUNK_OVERLAP,
        tokenizer: Optional[Callable] = None,
        paragraph_separator: str = DEFAULT_PARAGRAPH_SEP,
        chunking_tokenizer_fn: Optional[Callable[[str], List[str]]] = None,
        secondary_chunking_regex: Optional[str] = None,
        include_metadata: bool = True,
        include_prev_next_rel: bool = True,
    ) -> None:

Key Methods

def split_text(self, text: str) -> List[str]:
    """Split text into chunks respecting sentence boundaries."""

def split_text_metadata_aware(
    self, text: str, metadata_str: str
) -> List[str]:
    """Split text accounting for metadata string length."""

Import

from llama_index.core.node_parser import SentenceSplitter

I/O Contract

Inputs

Name	Type	Required	Description
separator	str	No (default: " ")	String used to split a sentence into words when the sentence exceeds chunk_size
chunk_size	int	No (default: DEFAULT_CHUNK_SIZE)	Maximum number of tokens per chunk
chunk_overlap	int	No (default: SENTENCE_CHUNK_OVERLAP)	Number of overlapping tokens between consecutive chunks
tokenizer	Optional[Callable]	No (default: None)	Custom tokenizer function used for counting tokens
paragraph_separator	str	No (default: "\n\n\n")	Separator used for paragraph-level splitting
chunking_tokenizer_fn	Optional[Callable[[str], List[str]]]	No (default: None)	Custom function for splitting text into sentences
secondary_chunking_regex	Optional[str]	No (default: None)	Regex pattern for secondary splitting when sentences are too long
include_metadata	bool	No (default: True)	Whether to include node metadata in output
include_prev_next_rel	bool	No (default: True)	Whether to set prev/next relationships between nodes
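
The two callable inputs can be illustrated with standard-library stand-ins. Both functions below are hypothetical examples rather than library code: whitespace_tokenizer returns a list whose length serves as the token count, and regex_sentence_splitter follows the Callable[[str], List[str]] shape given in the signature for chunking_tokenizer_fn.

```python
import re
from typing import List


def whitespace_tokenizer(text: str) -> List[str]:
    """Token-counting callable: the length of the returned list is the token count."""
    return text.split()


def regex_sentence_splitter(text: str) -> List[str]:
    """Sentence-splitting callable matching Callable[[str], List[str]]."""
    # Keep terminal punctuation (Latin and CJK) and trailing whitespace
    # attached to each sentence so the pieces can be rejoined losslessly.
    pieces = re.findall(r"[^.!?。！？]+[.!?。！？]*\s*", text)
    return [p for p in pieces if p.strip()]


sentences = regex_sentence_splitter("Hello world. How are you? Fine")
tokens = whitespace_tokenizer("a b  c")
```

Under these assumptions, the callables could be supplied as SentenceSplitter(tokenizer=whitespace_tokenizer, chunking_tokenizer_fn=regex_sentence_splitter). A whitespace count will differ from a model tokenizer's count, so chunk sizes would then be measured in words rather than model tokens.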

Outputs

Name	Type	Description
return (split_text)	List[str]	List of text chunks split at sentence boundaries
return (split_text_metadata_aware)	List[str]	List of text chunks accounting for metadata length

Usage Examples

Basic Sentence Splitting

from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(
    chunk_size=1024,
    chunk_overlap=200,
)

# Split raw text
chunks = splitter.split_text("Long document text with many sentences...")

Using as a Pipeline Transformation

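from llama_index.core import Document
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(
    chunk_size=512,
    chunk_overlap=50,
    paragraph_separator="\n\n",
)

pipeline = IngestionPipeline(
    transformations=[splitter],
)

# Run the pipeline on documents to produce chunked nodes
documents = [Document(text="Long document text with many sentences...")]
nodes = pipeline.run(documents=documents)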
