Workflow:FlowiseAI Flowise Document Store Ingestion
| Knowledge Sources | |
|---|---|
| Domains | RAG, Vector_Stores, Data_Engineering, LLM_Ops |
| Last Updated | 2026-02-12 07:30 GMT |
Overview
End-to-end process for ingesting documents into a vector store through the Flowise Document Store pipeline, from upload through chunking, embedding, and retrieval testing.
Description
This workflow covers the complete document ingestion pipeline in Flowise. A Document Store is a managed container that orchestrates document loading, text splitting (chunking), vector embedding, and upsert into a vector database. The pipeline supports multiple document loader types (PDF, web pages, Word documents, etc.), configurable text splitters, a range of embedding providers (OpenAI, Cohere, HuggingFace, etc.), and multiple vector store backends (Pinecone, Weaviate, Milvus, FAISS, etc.). An optional Record Manager prevents duplicate insertions. The result is a queryable vector store suitable for retrieval-augmented generation (RAG) pipelines.
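The four-stage pipeline described above can be sketched as a chain of functions. The names below (`Chunk`, `ingest`) are illustrative, not Flowise's actual internals:

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)

def ingest(load, split, embed, upsert, source):
    """Toy pipeline: load -> split -> embed -> upsert."""
    docs = load(source)                              # document loader
    chunks = [c for d in docs for c in split(d)]     # text splitter
    vectors = [(c, embed(c.text)) for c in chunks]   # embeddings provider
    return upsert(vectors)                           # vector store upsert
```

Each stage is pluggable, which is exactly why Flowise can mix loaders, splitters, embedding providers, and vector store backends freely.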
Usage
Execute this workflow when you have domain-specific documents (PDFs, web pages, text files, etc.) that need to be made searchable via semantic similarity for use in RAG chatflows. This is the prerequisite step before connecting a vector store retriever node in a chatflow canvas. The output is a populated vector store with embedded document chunks that can be queried by natural language.
Execution Steps
Step 1: Create Document Store
Navigate to the Document Stores page and click "Add New". Provide a name and optional description for the store. The system creates a new Document Store record in the database with an initial status of STALE.
Key considerations:
- Document stores are scoped to the current workspace
- The name is required; description is optional
- The store serves as a container that will hold one or more document loaders
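Creating a store amounts to a single POST with a name and optional description. A minimal sketch of building that request body follows; the endpoint path is an assumption based on Flowise's REST conventions and is not confirmed by this document:

```python
import json

def build_create_store_request(name: str, description: str = ""):
    """Build the request for creating a Document Store.

    The path below is an assumed Flowise API route; verify it
    against your deployment's API reference."""
    if not name:
        raise ValueError("name is required")  # name is mandatory, description optional
    return {
        "method": "POST",
        "path": "/api/v1/document-store/store",  # assumed path
        "body": json.dumps({"name": name, "description": description}),
    }
```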
Step 2: Add Document Loader
From the Document Store detail page, click "Add Document Loader". A dialog presents all available loader types (PDF Loader, Web Crawler, Word Document Loader, CSV Loader, etc.). Select the appropriate loader for your document source. The system navigates to the loader configuration page.
Key considerations:
- Each document store can have multiple document loaders
- Loaders are implemented as Flowise component nodes
- Different loaders require different inputs (file paths, URLs, API credentials)
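Because each loader type demands different inputs, a registry mapping loader type to required fields is a natural way to drive the configuration dialog. The type names and input lists below are simplified examples, not the exact Flowise node schemas:

```python
# Illustrative loader registry; real node schemas carry more fields
# (credentials, splitter bindings, optional parameters).
LOADER_INPUTS = {
    "pdfFile": ["file"],
    "webCrawler": ["url"],
    "docxFile": ["file"],
    "csvFile": ["file"],
}

def required_inputs(loader_type: str) -> list:
    """Look up the mandatory inputs for a loader type."""
    try:
        return LOADER_INPUTS[loader_type]
    except KeyError:
        raise ValueError(f"unknown loader type: {loader_type}")
```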
Step 3: Configure Loader and Text Splitter
On the loader configuration page, fill in the loader-specific parameters (file upload, URL, credentials, etc.). Then select and configure a text splitter to control how documents are chunked. Configure chunk size, chunk overlap, and other splitter-specific parameters.
Key considerations:
- Text splitter options include RecursiveCharacterTextSplitter, TokenTextSplitter, and others
- Chunk size and overlap significantly affect retrieval quality
- Credential inputs connect to the centralized credential store
- All mandatory fields must be completed before proceeding
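To make chunk size and overlap concrete, here is a fixed-size character splitter; it is a simplified stand-in for RecursiveCharacterTextSplitter, which additionally tries to break on separators like paragraphs and sentences:

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int):
    """Fixed-size character splitter with overlap. Each chunk starts
    chunk_size - chunk_overlap characters after the previous one, so
    consecutive chunks share chunk_overlap characters of context."""
    if chunk_overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

For example, `split_text("abcdefghij", 4, 2)` yields overlapping chunks `abcd`, `cdef`, `efgh`, and so on. Larger overlap preserves more cross-chunk context at the cost of redundant storage and embedding calls.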
Step 4: Preview Chunks
Click "Preview Chunks" to see how the documents will be split before committing to processing. The system loads the documents, applies the text splitter, and returns a preview of the first N chunks. Each chunk displays its text content and metadata (source, page number, etc.).
Key considerations:
- Preview runs the full loader and splitter pipeline, but returns only a limited number of chunks
- Users can review chunk quality and adjust splitter settings before processing
- The preview shows total chunk count and individual chunk content
- Chunks can be expanded for detailed inspection
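The preview response boils down to a total count plus the first N chunks. A sketch, with illustrative field names:

```python
def preview_chunks(chunks, n=5):
    """Return the total chunk count plus the first n chunks, mirroring
    the shape of the preview response (field names are illustrative)."""
    return {"totalChunks": len(chunks), "chunks": chunks[:n]}
```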
Step 5: Process and Save Documents
Click "Save & Process" to persist the loader configuration and trigger background document processing. The system saves the loader record, then asynchronously loads all documents, applies the text splitter, and stores every chunk in the database with metadata. The user is redirected immediately while processing continues in the background.
Key considerations:
- Processing runs asynchronously; the user does not need to wait
- Each chunk is stored as a DocumentStoreFileChunk entity
- The document store status transitions from STALE to PROCESSING
- Metadata including source file, page number, and custom fields is preserved
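The fire-and-forget behaviour of Save & Process can be sketched with a worker thread: the caller returns immediately while processing continues. The STALE → PROCESSING transition follows the workflow text; the terminal status name used here (SYNC) is an assumption:

```python
import threading

def process_store(store: dict, load_and_split):
    """Sketch of Save & Process: flip status to PROCESSING, run the
    load/split/persist work on a worker thread, mark the store done
    when the worker finishes. Returns the thread so callers can
    optionally join(); the UI would instead poll the status."""
    store["status"] = "PROCESSING"

    def worker():
        store["chunks"] = load_and_split()  # load, split, persist chunks
        store["status"] = "SYNC"            # assumed terminal status name

    t = threading.Thread(target=worker)
    t.start()
    return t
```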
Step 6: Review and Edit Chunks
Once processing completes, navigate to the chunks view to review stored chunks. Chunks are displayed in a paginated grid (50 per page). Individual chunks can be expanded, edited (modify content or metadata), or deleted.
Key considerations:
- Chunk editing allows manual correction before embedding
- Deleted chunks are permanently removed from the store
- Pagination supports large document sets efficiently
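The 50-per-page grid is straightforward offset pagination; a sketch:

```python
import math

def page_of_chunks(chunks, page, page_size=50):
    """Return one page of chunks plus the total page count.
    Pages are 1-indexed, matching typical grid UIs."""
    total_pages = max(1, math.ceil(len(chunks) / page_size))
    if not 1 <= page <= total_pages:
        raise ValueError("page out of range")
    start = (page - 1) * page_size
    return chunks[start:start + page_size], total_pages
```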
Step 7: Configure Vector Store and Embeddings
Navigate to the vector store configuration page. This is a three-step stepper interface. First, select and configure an embeddings provider (model, API key). Second, select and configure a vector store backend (connection details, index name, similarity metric). Third, optionally configure a Record Manager to track which chunks have been upserted and prevent duplicates.
Key considerations:
- Embeddings provider converts text to numerical vectors
- Vector store backend stores and indexes the embeddings
- Record Manager is optional but recommended for incremental updates
- All three configurations require valid credentials
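A validation pass over the three stepper sections might look like the following; the field names (`embeddings`, `vectorStore`, `recordManager`, `credential`) are illustrative, not Flowise's actual config schema:

```python
def validate_vector_store_config(cfg: dict):
    """Check the three stepper sections: embeddings and vectorStore
    are mandatory and need credentials; recordManager is optional
    but needs a credential if present. Returns a list of errors."""
    errors = []
    for section in ("embeddings", "vectorStore"):
        part = cfg.get(section)
        if not part:
            errors.append(f"{section} is required")
        elif not part.get("credential"):
            errors.append(f"{section} needs a credential")
    rm = cfg.get("recordManager")
    if rm is not None and not rm.get("credential"):
        errors.append("recordManager needs a credential")
    return errors
```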
Step 8: Upsert Chunks to Vector Store
Click "Upsert All Chunks" to embed and insert all document chunks into the vector store. For each chunk, the system generates an embedding vector using the configured provider, then upserts the chunk with its embedding into the vector store. The Record Manager (if configured) tracks inserted chunks to prevent duplicates on subsequent upserts.
Key considerations:
- Upsert tracks statistics: numAdded, numUpdated, numSkipped, numDeleted
- An UpsertHistory record is created for audit purposes
- The document store status transitions to UPSERTED on completion
- Large document sets may require significant processing time
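The Record Manager's duplicate prevention can be sketched with content hashing: unchanged chunks are skipped, changed ones updated, new ones added. The stat names mirror the workflow text; the hashing scheme itself is an illustrative assumption:

```python
import hashlib

def upsert_with_record_manager(chunks, record_manager: dict):
    """Hash-based dedup sketch. record_manager maps chunk id to a
    content hash from previous upserts; comparing hashes decides
    whether a chunk is new, changed, or unchanged."""
    stats = {"numAdded": 0, "numUpdated": 0, "numSkipped": 0, "numDeleted": 0}
    for chunk_id, text in chunks:
        digest = hashlib.sha256(text.encode()).hexdigest()
        seen = record_manager.get(chunk_id)
        if seen is None:
            stats["numAdded"] += 1      # never upserted before
        elif seen == digest:
            stats["numSkipped"] += 1    # unchanged: skip embedding cost
        else:
            stats["numUpdated"] += 1    # content changed: re-embed
        record_manager[chunk_id] = digest
    return stats
```

Skipping unchanged chunks is what makes re-running the upsert after editing a few chunks cheap: only the changed ones incur embedding calls.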
Step 9: Test Retrieval with Query
Once the upsert completes, navigate to the Query tab. Enter a natural language query to test semantic retrieval. The system generates an embedding for the query, searches the vector store for similar chunks, and returns matching results with similarity scores and metadata.
Key considerations:
- Query testing validates the end-to-end pipeline before use in chatflows
- Results include chunk content, similarity score, and metadata
- Response time is displayed for performance assessment
- Vector store parameters can be tuned from the query interface
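At its core, the query step is nearest-neighbour search over embeddings. A brute-force cosine-similarity sketch follows; real backends (Pinecone, FAISS, etc.) use approximate indexes for speed, but the scoring idea is the same:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def query_store(query_vec, store, top_k=4):
    """Brute-force retrieval: score every (chunk, vector) pair against
    the query embedding and return the top_k highest-scoring chunks."""
    scored = [(cosine(query_vec, vec), chunk) for chunk, vec in store]
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:top_k]
```

A query whose embedding points in the same direction as a chunk's embedding scores 1.0; orthogonal content scores 0. This is why query testing is a quick sanity check on both the embeddings and the chunking choices made earlier.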