Principle:FlowiseAI Flowise Document Processing
| Attribute | Value |
|---|---|
| Sources | packages/ui/src/api/documentstore.js |
| Domains | Document_Store_Ingestion |
| Last Updated | 2026-02-12 14:00 GMT |
Overview
Document_Processing is a technique for saving document loader configuration and triggering asynchronous background processing of documents into stored chunks. This two-phase approach decouples configuration persistence from execution, so a long-running or failed processing job never loses the saved configuration.
Description
After previewing confirms acceptable chunking, the system saves the loader configuration and initiates asynchronous processing. This two-phase approach (save config, then process) ensures configuration is persisted even if processing is long-running. Processing extracts text, splits into chunks, and stores them for later embedding and upsert to vector stores.
The document processing workflow consists of two distinct phases:
- Phase 1: Save configuration -- The saveProcessingLoader call persists the complete loader and splitter configuration to the database. This creates (or updates) a loader record associated with the document store. The server returns the loader's unique identifier.
- Phase 2: Trigger processing -- The processLoader call initiates asynchronous background processing using the saved configuration. The server begins extracting text from the document source, splitting it into chunks, and storing the chunks in the database.
Key characteristics of this design:
- Configuration durability -- Saving configuration first ensures it survives server restarts or processing failures. The user does not need to re-enter configuration if processing needs to be retried.
- Asynchronous execution -- Processing runs in the background, allowing the UI to remain responsive. The document store status updates to reflect processing progress.
- Idempotent re-processing -- Because configuration is persisted separately, the same loader can be re-processed (e.g., after document updates) without reconfiguration.
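Because processing runs in the background, a client typically has to check for completion rather than await it. The following is a minimal sketch of such a polling loop. The helper names (waitForProcessing, makeFakeStatusSource), the fetchStatus callback, and the status labels are all hypothetical and are not part of the Flowise API; a real client would replace the fake source with a request that reads the loader or store status.

```javascript
// Hypothetical helper: poll a status callback until processing settles.
// `fetchStatus` is an assumed async function returning a status string;
// the labels 'completed' and 'failed' are illustrative, not Flowise constants.
async function waitForProcessing(fetchStatus, { intervalMs = 1000, maxAttempts = 60 } = {}) {
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        const status = await fetchStatus()
        if (status === 'completed' || status === 'failed') {
            return { status, attempts: attempt }
        }
        // Not finished yet: wait before asking again
        await new Promise((resolve) => setTimeout(resolve, intervalMs))
    }
    return { status: 'timeout', attempts: maxAttempts }
}

// Stand-in for a real status request: replays a fixed sequence of statuses,
// repeating the last one once the sequence is exhausted.
function makeFakeStatusSource(sequence) {
    let index = 0
    return async () => sequence[Math.min(index++, sequence.length - 1)]
}
```

The attempt cap keeps the loop from running forever if a job stalls; the caller decides how to surface a 'timeout' result.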
Usage
Use this pattern once the chunk preview looks acceptable and you are ready to process the full document. Typical scenarios include:
- First-time ingestion -- Processing a newly configured document source after validating the preview.
- Re-processing after updates -- Re-running processing when the source document has been updated.
- Retry after failure -- Re-triggering processing if a previous attempt failed due to transient errors.
// Phase 1: Save the loader configuration
const saveResponse = await documentStoreApi.saveProcessingLoader({
    storeId: 'store-123',
    loaderId: '',
    loaderName: 'pdfLoader',
    loaderConfig: { pdfFile: fileId },
    splitterId: 'recursiveCharacterTextSplitter',
    splitterConfig: { chunkSize: 1000, chunkOverlap: 200 }
})
const loaderId = saveResponse.data.id

// Phase 2: Trigger asynchronous processing
await documentStoreApi.processLoader(
    { storeId: 'store-123' },
    loaderId
)
// Processing now runs in the background
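The example above assumes both calls succeed. One way to make the "configuration survives failure" property visible in client code is to wrap the two phases so that the loader ID is returned even when the trigger fails. This is a sketch only: saveThenProcess is a hypothetical helper, and api stands for any object that exposes saveProcessingLoader and processLoader with the call shapes used in the example above.

```javascript
// Hypothetical wrapper around the two phases.
// `api` is assumed to expose saveProcessingLoader(body) and
// processLoader(body, loaderId), as in the example above.
async function saveThenProcess(api, config) {
    // Phase 1: persist configuration; a failure here propagates to the caller
    const saveResponse = await api.saveProcessingLoader(config)
    const loaderId = saveResponse.data.id

    try {
        // Phase 2: trigger background processing
        await api.processLoader({ storeId: config.storeId }, loaderId)
        return { loaderId, started: true, error: null }
    } catch (error) {
        // The configuration is already saved, so hand back the loader ID
        // and let the caller decide whether to re-trigger processing
        return { loaderId, started: false, error }
    }
}
```

A caller that receives started: false can call processLoader again later with the returned loaderId instead of repeating the save.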
Theoretical Basis
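Document processing implements a two-phase (save, then execute) pattern that decouples configuration from execution. It should not be confused with the two-phase commit protocol used for distributed transactions: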
- Durability before execution -- By persisting configuration before triggering processing, the system ensures that the user's configuration work is not lost. If processing fails (network error, server crash, API rate limit), the configuration remains available for retry without user re-entry.
- Separation of synchronous and asynchronous operations -- The save operation is synchronous and fast (database write). The process operation triggers an asynchronous background job that may take seconds to hours depending on document size. This separation keeps the UI responsive and provides a clear boundary between "configured" and "processing" states.
- State machine progression -- Document store loaders progress through a state machine: configured (after save) -> processing (after process trigger) -> completed (after successful processing) -> ready for upsert. Each state transition is explicit and observable.
- Retry semantics -- The two-phase design naturally supports retry: if processing fails, the system can re-trigger processLoader with the same loader ID without needing to re-save configuration. This is especially important for large document sets where processing may take significant time.
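The retry semantics in the last bullet could be sketched as a small loop that re-triggers processing for an existing loader ID with exponential backoff. The helper name retryProcess, the triggerProcess callback, and the default attempt count and delay are assumptions chosen for illustration; the source does not specify a retry policy.

```javascript
// Hypothetical retry helper. `triggerProcess` is an assumed async callback
// that starts processing for a loader ID, for example a thin wrapper
// around the processLoader call.
async function retryProcess(triggerProcess, loaderId, { maxAttempts = 3, baseDelayMs = 500 } = {}) {
    let lastError = null
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            await triggerProcess(loaderId)
            return { ok: true, attempts: attempt, error: null }
        } catch (error) {
            lastError = error
            if (attempt < maxAttempts) {
                // Exponential backoff: baseDelayMs, then 2x, then 4x, ...
                const delayMs = baseDelayMs * 2 ** (attempt - 1)
                await new Promise((resolve) => setTimeout(resolve, delayMs))
            }
        }
    }
    // Every attempt failed; the saved configuration is untouched
    return { ok: false, attempts: maxAttempts, error: lastError }
}
```

Note that only the trigger is repeated; the saved configuration is never rewritten by the loop.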
This pattern is analogous to job scheduling systems where job definitions are persisted separately from job execution, enabling monitoring, retry, and audit capabilities.
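The state machine progression described above might be modeled with an explicit transition table, which makes invalid jumps (for example, configured directly to completed) easy to reject. This is an illustrative sketch, not Flowise's internal status model: the state names follow the text, "completed" is treated as meaning ready for upsert, and the "failed" state and the re-processing edges are added assumptions based on the retry and re-processing scenarios.

```javascript
// Illustrative transition table. State names follow the prose; 'failed'
// and the edges back to 'processing' are assumptions, not Flowise statuses.
const TRANSITIONS = {
    configured: ['processing'],
    processing: ['completed', 'failed'],
    failed: ['processing'], // retry after failure
    completed: ['processing'] // re-processing after the source is updated
}

// Returns the next state if the move is allowed, otherwise throws
function transition(current, next) {
    const allowed = TRANSITIONS[current] || []
    if (!allowed.includes(next)) {
        throw new Error(`Invalid transition: ${current} -> ${next}`)
    }
    return next
}

// Walk the normal path: configured -> processing -> completed
let state = 'configured'
state = transition(state, 'processing')
state = transition(state, 'completed')
```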
Related Pages
- Implementation:FlowiseAI_Flowise_SaveProcessingLoader
- Principle:FlowiseAI_Flowise_Chunk_Preview -- Previous step: previewing chunks before committing
- Principle:FlowiseAI_Flowise_Chunk_Management -- Next step: reviewing and editing processed chunks
- Principle:FlowiseAI_Flowise_Vector_Store_Upsert -- Downstream: upserting processed chunks to vector stores