Principle:FlowiseAI Flowise Text Splitter Configuration
| Attribute | Value |
|---|---|
| Sources | packages/ui/src/api/nodes.js |
| Domains | Document_Store_Ingestion |
| Last Updated | 2026-02-12 14:00 GMT |
Overview
Text_Splitter_Configuration is a technique for selecting and configuring text splitting strategies that break documents into semantically meaningful chunks for embedding. The quality of text splitting directly impacts retrieval accuracy in the RAG pipeline, making splitter configuration a critical optimization step.
Description
Text splitters divide large documents into smaller chunks suitable for embedding and retrieval. Different strategies exist for how documents are divided:
- Character-based splitting -- Splits text at fixed character counts. Simple but may break mid-sentence or mid-word.
- Recursive character splitting -- Splits on a hierarchy of separators (paragraphs, then sentences, then words) to preserve semantic boundaries. The most commonly used strategy.
- Token-based splitting -- Splits based on token count, ideally measured with the same tokenizer the embedding model uses, so that chunks fit within the model's context window.
- Semantic splitting -- Uses embedding similarity to identify natural topic boundaries within the text.
Key configuration parameters include:
- Chunk size -- The maximum size of each chunk (in characters or tokens). Smaller chunks yield more precise retrieval but less context per result.
- Chunk overlap -- The number of characters/tokens shared between adjacent chunks. Overlap reduces the risk that information straddling a chunk boundary is cut in two.
- Separators -- For recursive splitting, the ordered list of separator strings (e.g., "\n\n", "\n", " ", "").
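The interaction between chunk size and the separator hierarchy can be sketched as follows. This is a minimal illustration, not Flowise's or any library's actual implementation: the function name `recursiveSplit` and its default separator list are assumptions, sizes are measured in characters, and chunk overlap is left out for brevity.

```javascript
// Minimal sketch of recursive character splitting (illustrative only).
// Tries each separator in order; a piece that is still too large is
// split again with the next, finer separator.
function recursiveSplit(text, chunkSize, separators = ['\n\n', '\n', ' ', '']) {
  if (text.length <= chunkSize) return text.length > 0 ? [text] : []

  const [sep, ...finer] = separators
  // No separator left (or the empty string): fall back to hard cuts.
  if (sep === undefined || sep === '') {
    const cuts = []
    for (let i = 0; i < text.length; i += chunkSize) {
      cuts.push(text.slice(i, i + chunkSize))
    }
    return cuts
  }

  const chunks = []
  let current = ''
  for (const piece of text.split(sep)) {
    // Greedily merge neighbouring pieces while they fit in one chunk.
    const candidate = current === '' ? piece : current + sep + piece
    if (candidate.length <= chunkSize) {
      current = candidate
      continue
    }
    if (current !== '') chunks.push(current)
    if (piece.length <= chunkSize) {
      current = piece
    } else {
      // The piece alone is too large: recurse with the finer separators.
      chunks.push(...recursiveSplit(piece, chunkSize, finer))
      current = ''
    }
  }
  if (current !== '') chunks.push(current)
  return chunks
}

console.log(recursiveSplit('Alpha beta.\n\nGamma delta epsilon.\n\nZeta', 20))
// → [ 'Alpha beta.', 'Gamma delta epsilon.', 'Zeta' ]
```

Because paragraphs are tried first, the sample text splits cleanly at its blank lines; only input with no usable separator (such as one very long word) reaches the hard-cut fallback.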
Usage
Apply this principle when deciding how documents should be chunked before embedding in the RAG pipeline. Typical scenarios include:
- Long-form documents -- Splitting research papers or technical documentation where preserving paragraph and section boundaries matters.
- Structured data -- Configuring splitters for code files, markdown, or HTML where format-aware splitting preserves structural semantics.
- Embedding model alignment -- Setting chunk sizes that align with the embedding model's optimal input length (e.g., 512 tokens for many models).
// Fetching available text splitter components
const response = await nodesApi.getNodesByCategory('Text Splitters')
const splitters = response.data
// Each splitter component has inputParams defining chunk_size, chunk_overlap, etc.
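As a rough illustration of working with that response, the sketch below reduces each splitter to a map of parameter names and default values. The object shape is an assumption made for this example: the `label`, `name`, and `default` fields, the helper `summarizeSplitterParams`, and the sample data are hypothetical and should be checked against the real response before use.

```javascript
// Hypothetical sample data: field names follow the description above,
// not a verified schema of the API response.
const sampleSplitters = [
  {
    label: 'Recursive Character Text Splitter',
    inputParams: [
      { name: 'chunk_size', default: 1000 },
      { name: 'chunk_overlap', default: 200 }
    ]
  },
  { label: 'Splitter Without Params' } // a missing inputParams array is tolerated
]

// Map each splitter to { label, params }, where params is name -> default value.
function summarizeSplitterParams(splitters) {
  return splitters.map((splitter) => ({
    label: splitter.label,
    params: Object.fromEntries(
      (splitter.inputParams ?? []).map((param) => [param.name, param.default])
    )
  }))
}

console.log(summarizeSplitterParams(sampleSplitters))
```

A summary like this makes it easier to compare the tunable parameters of several splitters side by side before choosing one.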
Theoretical Basis
Text splitter configuration is grounded in chunking strategy optimization for retrieval-augmented generation:
- Precision vs. context tradeoff -- Chunk size directly affects this tradeoff. Smaller chunks (100-200 tokens) yield more precise retrieval because each chunk covers a narrow topic, but individual results carry less context. Larger chunks (500-1000 tokens) provide more context per result but may dilute relevance with off-topic content.
- Boundary preservation -- Recursive splitting preserves semantic boundaries (paragraphs, sentences) better than fixed-size splitting. When a document is split at a paragraph boundary, each chunk is more likely to contain a complete thought, improving both embedding quality and retrieval relevance.
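- Overlap for continuity -- Overlap makes it more likely that information spanning a chunk boundary appears intact in at least one chunk; roughly, any passage shorter than the overlap survives in one piece. A typical overlap of 10-20% of chunk size (e.g., 100-200 characters of overlap for 1000-character chunks) trades some redundancy in storage and retrieval for better boundary coverage.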
- Format-aware splitting -- Specialized splitters for code, markdown, or HTML use format-specific delimiters (function boundaries, heading levels, tag structures) to produce chunks that respect the document's inherent structure.
The optimal configuration depends on the specific use case, document characteristics, and embedding model, making the preview-then-commit workflow (see Principle:FlowiseAI_Flowise_Chunk_Preview) essential for iterative tuning.
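To make the overlap arithmetic concrete, here is a minimal sketch of fixed-size splitting with overlap, together with the chunk count it implies. It assumes character-based sizes and a fixed stride of `chunkSize - chunkOverlap`; the function names are illustrative, and separator-aware splitters will produce somewhat different counts.

```javascript
// Fixed-size splitting with overlap: each chunk starts
// (chunkSize - chunkOverlap) characters after the previous one.
function splitWithOverlap(text, chunkSize, chunkOverlap) {
  if (chunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new RangeError('require 0 <= chunkOverlap < chunkSize')
  }
  const stride = chunkSize - chunkOverlap
  const chunks = []
  for (let start = 0; start < text.length; start += stride) {
    chunks.push(text.slice(start, start + chunkSize))
    // Stop once a chunk has reached the end of the text.
    if (start + chunkSize >= text.length) break
  }
  return chunks
}

// Closed-form chunk count for the same scheme.
function expectedChunkCount(length, chunkSize, chunkOverlap) {
  if (length <= chunkSize) return length > 0 ? 1 : 0
  return Math.ceil((length - chunkOverlap) / (chunkSize - chunkOverlap))
}

// A 2600-character document, 1000-character chunks, 200-character overlap.
const doc = Array.from({ length: 2600 }, (_, i) =>
  String.fromCharCode(97 + (i % 26))
).join('')
const parts = splitWithOverlap(doc, 1000, 200)
console.log(parts.length) // → 3
console.log(expectedChunkCount(2600, 1000, 200)) // → 3
```

In this example the three chunks hold 3000 characters in total for a 2600-character document, which is the redundancy cost that a 20% overlap introduces.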
Related Pages
- Implementation:FlowiseAI_Flowise_GetNodesByCategory
- Principle:FlowiseAI_Flowise_Document_Loader_Selection -- Previous step: selecting the document loader
- Principle:FlowiseAI_Flowise_Chunk_Preview -- Next step: previewing the resulting chunks
- Principle:FlowiseAI_Flowise_Vector_Store_Upsert -- Downstream: upserting chunks to vector stores