Principle:FlowiseAI Flowise Chunk Preview
| Attribute | Value |
|---|---|
| Sources | packages/ui/src/api/documentstore.js |
| Domains | Document_Store_Ingestion |
| Last Updated | 2026-02-12 14:00 GMT |
Overview
Chunk_Preview is a technique for previewing document chunks before committing to full processing, enabling iterative configuration of loader and splitter parameters. This preview-then-commit pattern is central to the document ingestion workflow, providing fast feedback on configuration quality.
Description
Before processing an entire document set, users preview a sample of chunks to verify their loader and splitter configuration produces good results. This preview-then-commit pattern saves time and resources by catching configuration issues early. The preview shows chunk content, metadata, and total chunk count.
The chunk preview workflow operates as follows:
- Configuration assembly -- The system gathers the complete configuration: store ID, loader type and config, optional splitter type and config, optional credentials, and the desired preview chunk count.
- Server-side processing -- The server executes the loader and splitter against the specified document source, processing only enough content to produce the requested number of preview chunks.
- Result inspection -- The user examines the preview chunks to verify content quality: Are chunks the right size? Are semantic boundaries preserved? Is metadata correct? Is the total chunk count reasonable?
- Iterative refinement -- If the preview is unsatisfactory, the user adjusts parameters (chunk size, overlap, separators, loader options) and previews again until the output meets quality standards.
The default preview count is typically 20 chunks, which provides a representative sample without incurring the cost of processing the entire document.
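The result-inspection step above can be sketched as a small quality check over the preview response. A minimal sketch: the chunk shape (`pageContent`) mirrors the preview payload, but the helper name and the size thresholds are illustrative assumptions, not part of the Flowise API.

```javascript
// Sketch of the "result inspection" step: score a preview sample by
// chunk-size distribution. Thresholds are illustrative defaults.
function inspectPreview(chunks, { minSize = 200, maxSize = 1500 } = {}) {
  const sizes = chunks.map((c) => c.pageContent.length)
  const count = sizes.length || 1
  const avgSize = sizes.reduce((a, b) => a + b, 0) / count
  // Chunks far outside the target range suggest the splitter config
  // needs tuning (smaller chunkSize, different separators, etc.).
  const outOfRange = sizes.filter((s) => s < minSize || s > maxSize).length
  const outOfRangeRatio = outOfRange / count
  return {
    avgSize,
    outOfRangeRatio,
    ok: outOfRangeRatio < 0.2 // accept if fewer than 20% outliers
  }
}
```

A check like this can automate the "preview again" decision in the refinement loop, though in practice the human inspection of actual chunk content remains the primary quality gate.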
Usage
Use chunk preview when validating document chunking configuration before committing to full-scale processing. Typical scenarios include:
- Initial configuration -- Testing a new loader and splitter combination for the first time on a document source.
- Parameter tuning -- Adjusting chunk size and overlap to find the optimal balance between precision and context.
- Format validation -- Verifying that a specialized loader (PDF, CSV, HTML) correctly extracts content and preserves formatting.
- Metadata verification -- Checking that chunk metadata (source, page number, section) is populated correctly.
```javascript
// Previewing chunks with a configured loader and splitter
const previewConfig = {
  storeId: 'store-123',
  loaderId: '',
  loaderName: 'pdfLoader',
  loaderConfig: { pdfFile: uploadedFileId },
  splitterId: 'recursiveCharacterTextSplitter',
  splitterConfig: { chunkSize: 1000, chunkOverlap: 200 },
  previewChunkCount: 20
}

const response = await documentStoreApi.previewChunks(previewConfig)
console.log(`Total chunks: ${response.data.totalChunks}`)
response.data.chunks.forEach((chunk, i) => {
  console.log(`Chunk ${i}: ${chunk.pageContent.substring(0, 100)}...`)
})
```
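Building on the example above, the iterative-refinement step can be sketched as a loop over candidate splitter settings. A sketch under assumptions: the `api` parameter stands in for `documentStoreApi`, the helper name is hypothetical, and "fewest undersized chunks" is just one possible selection criterion.

```javascript
// Sketch of iterative refinement: preview several splitter configs and
// keep the one whose sample has the fewest undersized chunks.
async function findBestSplitterConfig(api, baseConfig, candidates, minSize = 200) {
  let best = null
  for (const splitterConfig of candidates) {
    const { data } = await api.previewChunks({ ...baseConfig, splitterConfig })
    const tooSmall = data.chunks.filter((c) => c.pageContent.length < minSize).length
    if (!best || tooSmall < best.tooSmall) {
      best = { splitterConfig, tooSmall }
    }
  }
  return best
}
```

Because preview is read-only, a loop like this can try many configurations safely; nothing is written to the document store until the user commits to full processing.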
Theoretical Basis
Chunk preview implements a preview-commit pattern for iterative configuration:
- Fast feedback loop -- By processing a limited subset (default 20 chunks), the system provides near-instant feedback on configuration quality. Full processing of large document sets can take minutes or hours, making it impractical to iterate through trial-and-error on the full set.
- Cost optimization -- Preview avoids wasting compute resources (embedding generation, vector store writes) on poorly configured pipelines. The cost of generating embeddings for thousands of chunks is significant, especially with commercial embedding APIs.
- Representative sampling -- The preview sample is drawn from the beginning of the processed document, which reflects how the loader and splitter handle the document's format and structure. Note that formatting quirks appearing only later in a document will not show up in the preview.
- Separation of concerns -- The preview operation is stateless -- it does not persist chunks or modify the document store. This clean separation between preview (read-only) and processing (write) operations enables safe experimentation.
- Human-in-the-loop validation -- The preview step embeds human judgment into the automated pipeline, allowing domain experts to validate that chunks are semantically coherent and appropriately sized for their specific retrieval use case.
This pattern is common in data pipeline tools where the cost of full execution is high and configuration has significant impact on output quality.
Related Pages
- Implementation:FlowiseAI_Flowise_PreviewChunks
- Principle:FlowiseAI_Flowise_Document_Loader_Selection -- Configuring the loader used in preview
- Principle:FlowiseAI_Flowise_Text_Splitter_Configuration -- Configuring the splitter used in preview
- Principle:FlowiseAI_Flowise_Document_Processing -- Next step: committing to full processing after preview