Heuristic: FlowiseAI Flowise Document Loader Bypass Optimization
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Document_Processing |
| Last Updated | 2026-02-12 07:30 GMT |
Overview
Performance optimization that skips redundant document loading when rebuilding chatflows that have already upserted their documents to a vector store.
Description
When a chatflow graph is rebuilt (e.g., during a prediction request), Flowise normally initializes all nodes in the graph, including document loaders. However, if a document loader is connected to a vector store and the documents have already been upserted, re-loading and re-processing the documents is wasteful and slow. This optimization detects this pattern and bypasses the document loader initialization. The exception is in-memory vector stores (like MemoryVectorStore), which lose their data between rebuilds and must re-load documents every time.
Usage
This heuristic applies automatically during chatflow prediction requests. Understanding it is important when debugging unexpected document loading behavior, troubleshooting missing documents in memory vector stores, or building custom vector store nodes.
The Insight (Rule of Thumb)
- Action: Document loaders connected to persistent vector stores are automatically skipped during flow rebuilds
- Value: Saves time proportional to document size and processing complexity on every prediction request
- Trade-off: If documents change after initial upsert, they will NOT be re-loaded automatically during predictions. Users must manually re-upsert to pick up changes.
- Exception: Memory vector stores (non-persistent) always trigger document re-loading because their data is ephemeral
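The rule above can be sketched as a small predicate. This is a simplified illustration, not Flowise's actual implementation: the `FlowNode`/`FlowEdge` interfaces and the `MEMORY_VECTOR_STORES` list are assumptions made for the example.

```typescript
// Minimal sketch of the bypass rule. The interfaces and the
// MEMORY_VECTOR_STORES list below are illustrative assumptions,
// not Flowise's real types.
interface FlowNode {
    id: string
    category: string // e.g. 'Document Loaders', 'Vector Stores'
    name: string // e.g. 'pinecone', 'memoryVectorStore'
}

interface FlowEdge {
    source: string // id of the upstream node
    target: string // id of the downstream node
}

// Non-persistent stores that must re-load documents on every rebuild.
const MEMORY_VECTOR_STORES = new Set(['memoryVectorStore'])

function shouldBypassDocLoader(loader: FlowNode, nodes: FlowNode[], edges: FlowEdge[]): boolean {
    // Follow each outgoing edge from the loader...
    return edges
        .filter((e) => e.source === loader.id)
        .some((e) => {
            const target = nodes.find((n) => n.id === e.target)
            // ...and bypass only when the consumer is a persistent
            // (non-memory) vector store.
            return (
                target !== undefined &&
                target.category === 'Vector Stores' &&
                !MEMORY_VECTOR_STORES.has(target.name)
            )
        })
}
```

Under this sketch, a loader feeding Pinecone is bypassed on rebuild, while the same loader feeding a memory vector store is re-run every time.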
Reasoning
Document loading can be expensive: reading files from disk, fetching URLs, parsing PDFs, splitting text into chunks, and computing embeddings. For persistent vector stores (Pinecone, Qdrant, Chroma, FAISS on disk, etc.), this work is done once during the upsert step. Re-doing it on every prediction would add significant latency without benefit. The bypass check follows the graph edges from the document loader output to determine if the downstream node is a vector store. If it is (and it is not a memory vector store), the loader is marked as "should be ignored" during flow construction.
This is documented as temporary technical debt with a TODO comment indicating it should be removed when document loader nodes are decoupled from the main canvas.
Code Evidence
Bypass logic from `packages/server/src/utils/index.ts:439-464`:
```typescript
/**
 * Check if doc loader should be bypassed, ONLY if doc loader is connected to a vector store
 * Reason being we dont want to load the doc loader again whenever we are building the flow,
 * because it was already done during upserting
 * EXCEPT if the vector store is a memory vector store
 * TODO: Remove this logic when we remove doc loader nodes from the canvas
 */
const checkIfDocLoaderShouldBeIgnored = (
    reactFlowNode: IReactFlowNode,
    reactFlowNodes: IReactFlowNode[],
    reactFlowEdges: IReactFlowEdge[]
): boolean => {
    let outputId = ''
    if (reactFlowNode.data.outputAnchors.length) {
        if (Object.keys(reactFlowNode.data.outputs || {}).length) {
            const output = reactFlowNode.data.outputs?.output
            const node = reactFlowNode.data.outputAnchors[0].options?.find(
                (anchor) => anchor.name === output
            )
            if (node) outputId = (node as ICommonObject).id
        } else {
            outputId = (reactFlowNode.data.outputAnchors[0] as ICommonObject).id
        }
    }
    // ... follows edge to check if downstream is a vector store
}
```
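The edge-following step elided above (`// ...`) is not reproduced from the source. Purely as a rough illustration of a check of that shape, it might resolve the loader's output anchor id to an edge and inspect the downstream node; the `SimpleEdge`/`SimpleNode` interfaces and the `memoryVectorStore` name check below are assumptions, not Flowise's actual types or logic.

```typescript
// Hypothetical sketch of an edge-following check like the one elided
// above; NOT the actual Flowise implementation.
interface SimpleEdge {
    sourceHandle: string // contains the output anchor id of the source node
    target: string // id of the downstream node
}

interface SimpleNode {
    id: string
    data: { category: string; name: string }
}

function downstreamIsPersistentVectorStore(
    outputId: string,
    nodes: SimpleNode[],
    edges: SimpleEdge[]
): boolean {
    // Find the edge that starts at the loader's resolved output anchor.
    const edge = edges.find((e) => e.sourceHandle.includes(outputId))
    if (!edge) return false
    const target = nodes.find((n) => n.id === edge.target)
    // Bypass only for persistent vector stores; memory stores are
    // ephemeral and must re-load documents.
    return (
        target !== undefined &&
        target.data.category === 'Vector Stores' &&
        target.data.name !== 'memoryVectorStore'
    )
}
```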