Workflow:FlowiseAI Flowise Document Store Ingestion
| Knowledge Sources | |
|---|---|
| Domains | RAG, Vector_Stores, Data_Engineering, LLM_Ops |
| Last Updated | 2026-02-12 07:30 GMT |
Overview
End-to-end process for ingesting documents into a vector store through the Flowise Document Store pipeline, from upload through chunking, embedding, and retrieval testing.
Description
This workflow covers the complete document ingestion pipeline in Flowise. A Document Store is a managed container that orchestrates document loading, text splitting (chunking), vector embedding, and upsert into a vector database. The pipeline supports multiple document loader types (PDF, web pages, Word documents, etc.), configurable text splitters, a range of embedding providers (OpenAI, Cohere, HuggingFace, etc.), and multiple vector store backends (Pinecone, Weaviate, Milvus, FAISS, etc.). An optional Record Manager prevents duplicate insertions. The result is a queryable vector store suitable for retrieval-augmented generation (RAG) pipelines.
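The four-stage pipeline described above can be sketched as a chain of functions. The names below (`Chunk`, `ingest`) are illustrative, not Flowise's actual internals:

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)

def ingest(load, split, embed, upsert, source):
    """Toy pipeline: load -> split -> embed -> upsert."""
    docs = load(source)                              # document loader
    chunks = [c for d in docs for c in split(d)]     # text splitter
    vectors = [(c, embed(c.text)) for c in chunks]   # embeddings provider
    return upsert(vectors)                           # vector store upsert
```

Each stage is pluggable, which is exactly why Flowise can mix loaders, splitters, embedding providers, and vector store backends freely.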
Usage
Execute this workflow when you have domain-specific documents (PDFs, web pages, text files, etc.) that need to be made searchable via semantic similarity for use in RAG chatflows. This is the prerequisite step before connecting a vector store retriever node in a chatflow canvas. The output is a populated vector store with embedded document chunks that can be queried by natural language.
Execution Steps
Step 1: Create Document Store
Navigate to the Document Stores page and click "Add New". Provide a name and optional description for the store. The system creates a new Document Store record in the database with an initial status of STALE.
Key considerations:
- Document stores are scoped to the current workspace
- The name is required; description is optional
- The store serves as a container that will hold one or more document loaders
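Creating a store amounts to a single POST with a name and optional description. A minimal sketch of building that request body follows; the endpoint path is an assumption based on Flowise's REST conventions and is not confirmed by this document:

```python
import json

def build_create_store_request(name: str, description: str = ""):
    """Build the request for creating a Document Store.

    The path below is an assumed Flowise API route; verify it
    against your deployment's API reference."""
    if not name:
        raise ValueError("name is required")  # name is mandatory, description optional
    return {
        "method": "POST",
        "path": "/api/v1/document-store/store",  # assumed path
        "body": json.dumps({"name": name, "description": description}),
    }
```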
Step 2: Add Document Loader
From the Document Store detail page, click "Add Document Loader". A dialog presents all available loader types (PDF Loader, Web Crawler, Word Document Loader, CSV Loader, etc.). Select the appropriate loader for your document source. The system navigates to the loader configuration page.
Key considerations:
- Each document store can have multiple document loaders
- Loaders are implemented as Flowise component nodes
- Different loaders require different inputs (file paths, URLs, API credentials)
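Because each loader type demands different inputs, a registry mapping loader type to required fields is a natural way to drive the configuration dialog. The type names and input lists below are simplified examples, not the exact Flowise node schemas:

```python
# Illustrative loader registry; real node schemas carry more fields
# (credentials, splitter bindings, optional parameters).
LOADER_INPUTS = {
    "pdfFile": ["file"],
    "webCrawler": ["url"],
    "docxFile": ["file"],
    "csvFile": ["file"],
}

def required_inputs(loader_type: str) -> list:
    """Look up the mandatory inputs for a loader type."""
    try:
        return LOADER_INPUTS[loader_type]
    except KeyError:
        raise ValueError(f"unknown loader type: {loader_type}")
```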
Step 3: Configure Loader and Text Splitter
On the loader configuration page, fill in the loader-specific parameters (file upload, URL, credentials, etc.). Then select and configure a text splitter to control how documents are chunked. Configure chunk size, chunk overlap, and other splitter-specific parameters.
Key considerations:
- Text splitter options include RecursiveCharacterTextSplitter, TokenTextSplitter, and others
- Chunk size and overlap significantly affect retrieval quality
- Credential inputs connect to the centralized credential store
- All mandatory fields must be completed before proceeding
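To make chunk size and overlap concrete, here is a fixed-size character splitter; it is a simplified stand-in for RecursiveCharacterTextSplitter, which additionally tries to break on separators like paragraphs and sentences:

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int):
    """Fixed-size character splitter with overlap. Each chunk starts
    chunk_size - chunk_overlap characters after the previous one, so
    consecutive chunks share chunk_overlap characters of context."""
    if chunk_overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

For example, `split_text("abcdefghij", 4, 2)` yields overlapping chunks `abcd`, `cdef`, `efgh`, and so on. Larger overlap preserves more cross-chunk context at the cost of redundant storage and embedding calls.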
Step 4: Preview Chunks
Click "Preview Chunks" to see how the documents will be split before committing to processing. The system loads the documents, applies the text splitter, and returns a preview of the first N chunks. Each chunk displays its text content and metadata (source, page number, etc.).
Key considerations:
- Preview runs the full loader and splitter pipeline, but returns only a limited number of chunks
- Users can review chunk quality and adjust splitter settings before processing
- The preview shows total chunk count and individual chunk content
- Chunks can be expanded for detailed inspection
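The preview response boils down to a total count plus the first N chunks. A sketch, with illustrative field names:

```python
def preview_chunks(chunks, n=5):
    """Return the total chunk count plus the first n chunks, mirroring
    the shape of the preview response (field names are illustrative)."""
    return {"totalChunks": len(chunks), "chunks": chunks[:n]}
```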
Step 5: Process and Save Documents
Click "Save & Process" to persist the loader configuration and trigger background document processing. The system saves the loader record, then asynchronously loads all documents, applies the text splitter, and stores every chunk in the database with metadata. The user is redirected immediately while processing continues in the background.
Key considerations:
- Processing runs asynchronously; the user does not need to wait
- Each chunk is stored as a DocumentStoreFileChunk entity
- The document store status transitions from STALE to PROCESSING
- Metadata including source file, page number, and custom fields is preserved
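The fire-and-forget behaviour of Save & Process can be sketched with a worker thread: the caller returns immediately while processing continues. The STALE → PROCESSING transition follows the workflow text; the terminal status name used here (SYNC) is an assumption:

```python
import threading

def process_store(store: dict, load_and_split):
    """Sketch of Save & Process: flip status to PROCESSING, run the
    load/split/persist work on a worker thread, mark the store done
    when the worker finishes. Returns the thread so callers can
    optionally join(); the UI would instead poll the status."""
    store["status"] = "PROCESSING"

    def worker():
        store["chunks"] = load_and_split()  # load, split, persist chunks
        store["status"] = "SYNC"            # assumed terminal status name

    t = threading.Thread(target=worker)
    t.start()
    return t
```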
Step 6: Review and Edit Chunks
Once processing completes, navigate to the chunks view to review stored chunks. Chunks are displayed in a paginated grid (50 per page). Individual chunks can be expanded, edited (modify content or metadata), or deleted.
Key considerations:
- Chunk editing allows manual correction before embedding
- Deleted chunks are permanently removed from the store
- Pagination supports large document sets efficiently
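The 50-per-page grid is straightforward offset pagination; a sketch:

```python
import math

def page_of_chunks(chunks, page, page_size=50):
    """Return one page of chunks plus the total page count.
    Pages are 1-indexed, matching typical grid UIs."""
    total_pages = max(1, math.ceil(len(chunks) / page_size))
    if not 1 <= page <= total_pages:
        raise ValueError("page out of range")
    start = (page - 1) * page_size
    return chunks[start:start + page_size], total_pages
```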
Step 7: Configure Vector Store and Embeddings
Navigate to the vector store configuration page. This is a three-step stepper interface. First, select and configure an embeddings provider (model, API key). Second, select and configure a vector store backend (connection details, index name, similarity metric). Third, optionally configure a Record Manager to track which chunks have been upserted and prevent duplicates.
Key considerations:
- Embeddings provider converts text to numerical vectors
- Vector store backend stores and indexes the embeddings
- Record Manager is optional but recommended for incremental updates
- All three configurations require valid credentials
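A validation pass over the three stepper sections might look like the following; the field names (`embeddings`, `vectorStore`, `recordManager`, `credential`) are illustrative, not Flowise's actual config schema:

```python
def validate_vector_store_config(cfg: dict):
    """Check the three stepper sections: embeddings and vectorStore
    are mandatory and need credentials; recordManager is optional
    but needs a credential if present. Returns a list of errors."""
    errors = []
    for section in ("embeddings", "vectorStore"):
        part = cfg.get(section)
        if not part:
            errors.append(f"{section} is required")
        elif not part.get("credential"):
            errors.append(f"{section} needs a credential")
    rm = cfg.get("recordManager")
    if rm is not None and not rm.get("credential"):
        errors.append("recordManager needs a credential")
    return errors
```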
Step 8: Upsert Chunks to Vector Store
Click "Upsert All Chunks" to embed and insert all document chunks into the vector store. For each chunk, the system generates an embedding vector using the configured provider, then upserts the chunk with its embedding into the vector store. The Record Manager (if configured) tracks inserted chunks to prevent duplicates on subsequent upserts.
Key considerations:
- Upsert tracks statistics: numAdded, numUpdated, numSkipped, numDeleted
- An UpsertHistory record is created for audit purposes
- The document store status transitions to UPSERTED on completion
- Large document sets may require significant processing time
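The Record Manager's duplicate prevention can be sketched with content hashing: unchanged chunks are skipped, changed ones updated, new ones added. The stat names mirror the workflow text; the hashing scheme itself is an illustrative assumption:

```python
import hashlib

def upsert_with_record_manager(chunks, record_manager: dict):
    """Hash-based dedup sketch. record_manager maps chunk id to a
    content hash from previous upserts; comparing hashes decides
    whether a chunk is new, changed, or unchanged."""
    stats = {"numAdded": 0, "numUpdated": 0, "numSkipped": 0, "numDeleted": 0}
    for chunk_id, text in chunks:
        digest = hashlib.sha256(text.encode()).hexdigest()
        seen = record_manager.get(chunk_id)
        if seen is None:
            stats["numAdded"] += 1      # never upserted before
        elif seen == digest:
            stats["numSkipped"] += 1    # unchanged: skip embedding cost
        else:
            stats["numUpdated"] += 1    # content changed: re-embed
        record_manager[chunk_id] = digest
    return stats
```

Skipping unchanged chunks is what makes re-running the upsert after editing a few chunks cheap: only the changed ones incur embedding calls.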
Step 9: Test Retrieval with Query
Once the upsert completes, navigate to the Query tab. Enter a natural language query to test semantic retrieval. The system generates an embedding for the query, searches the vector store for similar chunks, and returns matching results with similarity scores and metadata.
Key considerations:
- Query testing validates the end-to-end pipeline before use in chatflows
- Results include chunk content, similarity score, and metadata
- Response time is displayed for performance assessment
- Vector store parameters can be tuned from the query interface
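At its core, the query step is nearest-neighbour search over embeddings. A brute-force cosine-similarity sketch follows; real backends (Pinecone, FAISS, etc.) use approximate indexes for speed, but the scoring idea is the same:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def query_store(query_vec, store, top_k=4):
    """Brute-force retrieval: score every (chunk, vector) pair against
    the query embedding and return the top_k highest-scoring chunks."""
    scored = [(cosine(query_vec, vec), chunk) for chunk, vec in store]
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:top_k]
```

A query whose embedding points in the same direction as a chunk's embedding scores 1.0; orthogonal content scores 0. This is why query testing is a quick sanity check on both the embeddings and the chunking choices made earlier.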