

Heuristic:FlowiseAI Flowise Document Loader Bypass Optimization

From Leeroopedia
Knowledge Sources
Domains Optimization, Document_Processing
Last Updated 2026-02-12 07:30 GMT

Overview

Performance optimization that skips redundant document loading when rebuilding chatflows that have already upserted their documents to a vector store.

Description

When a chatflow graph is rebuilt (e.g., during a prediction request), Flowise normally initializes all nodes in the graph, including document loaders. However, if a document loader is connected to a vector store and the documents have already been upserted, re-loading and re-processing the documents is wasteful and slow. This optimization detects this pattern and bypasses the document loader initialization. The exception is in-memory vector stores (like MemoryVectorStore), which lose their data between rebuilds and must re-load documents every time.

Usage

This heuristic applies automatically during chatflow prediction requests. Understanding it is important when debugging unexpected document loading behavior, troubleshooting missing documents in memory vector stores, or building custom vector store nodes.

The Insight (Rule of Thumb)

  • Action: Document loaders connected to persistent vector stores are automatically skipped during flow rebuilds
  • Value: Saves time proportional to document size and processing complexity on every prediction request
  • Trade-off: If documents change after initial upsert, they will NOT be re-loaded automatically during predictions. Users must manually re-upsert to pick up changes.
  • Exception: Memory vector stores (non-persistent) always trigger document re-loading because their data is ephemeral
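The rule of thumb above can be expressed as a small predicate. This is an illustrative sketch, not Flowise's actual API; the `VectorStoreKind` type and function name are invented for the example.

```typescript
// Illustrative sketch (not the actual Flowise API): the decision rule
// from the bullets above, expressed as a predicate.
type VectorStoreKind = 'pinecone' | 'qdrant' | 'chroma' | 'faiss' | 'memory'

// A doc loader is re-run on rebuild only when it is not feeding a vector
// store at all, or when the downstream store is ephemeral (in-memory).
function shouldReloadDocuments(downstream: VectorStoreKind | null): boolean {
    if (downstream === null) return true // not feeding a vector store: always load
    return downstream === 'memory'       // ephemeral store: must re-load every rebuild
}

console.log(shouldReloadDocuments('pinecone')) // false: bypassed, work already done at upsert
console.log(shouldReloadDocuments('memory'))   // true: data is lost between rebuilds
```

Persistent stores return `false` (bypass the loader); only the memory store and the "no vector store" case force a reload, matching the exception noted above.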

Reasoning

Document loading can be expensive: reading files from disk, fetching URLs, parsing PDFs, splitting text into chunks, and computing embeddings. For persistent vector stores (Pinecone, Qdrant, Chroma, FAISS on disk, etc.), this work is done once during the upsert step. Re-doing it on every prediction would add significant latency without benefit. The bypass check follows the graph edges from the document loader output to determine if the downstream node is a vector store. If it is (and it is not a memory vector store), the loader is marked as "should be ignored" during flow construction.

This is documented as temporary technical debt with a TODO comment indicating it should be removed when document loader nodes are decoupled from the main canvas.

Code Evidence

Bypass logic from `packages/server/src/utils/index.ts:439-464`:

/**
 * Check if doc loader should be bypassed, ONLY if doc loader is connected to a vector store
 * Reason being we dont want to load the doc loader again whenever we are building the flow,
 * because it was already done during upserting
 * EXCEPT if the vector store is a memory vector store
 * TODO: Remove this logic when we remove doc loader nodes from the canvas
 */
const checkIfDocLoaderShouldBeIgnored = (
    reactFlowNode: IReactFlowNode,
    reactFlowNodes: IReactFlowNode[],
    reactFlowEdges: IReactFlowEdge[]
): boolean => {
    let outputId = ''
    if (reactFlowNode.data.outputAnchors.length) {
        if (Object.keys(reactFlowNode.data.outputs || {}).length) {
            const output = reactFlowNode.data.outputs?.output
            const node = reactFlowNode.data.outputAnchors[0].options?.find(
                (anchor) => anchor.name === output
            )
            if (node) outputId = (node as ICommonObject).id
        } else {
            outputId = (reactFlowNode.data.outputAnchors[0] as ICommonObject).id
        }
    }
    // ... follows edge to check if downstream is a vector store
}
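The edge-following step elided at the end of the excerpt might look roughly like the sketch below. This is an assumption-based reconstruction, not the actual Flowise source: the interface shapes and the field names (`sourceHandle`, `target`, `category`, `name`) are guesses modeled on typical React Flow graph data.

```typescript
// Hypothetical sketch of the elided edge-following step (NOT the real
// Flowise implementation). All types and field names are assumptions.
interface FlowEdge { sourceHandle: string; target: string }
interface FlowNode { id: string; data: { category: string; name: string } }

function isBypassableVectorStoreTarget(
    outputId: string,
    nodes: FlowNode[],
    edges: FlowEdge[]
): boolean {
    // Follow the edge leaving the doc loader's resolved output anchor
    const edge = edges.find((e) => e.sourceHandle === outputId)
    if (!edge) return false

    const target = nodes.find((n) => n.id === edge.target)
    if (!target) return false

    // Bypass only for persistent vector stores; memory stores must re-load
    const isVectorStore = target.data.category === 'Vector Stores'
    const isMemoryStore = target.data.name.toLowerCase().includes('memory')
    return isVectorStore && !isMemoryStore
}
```

Under these assumptions, a loader wired to a Pinecone node would return `true` (bypass), while one wired to a MemoryVectorStore node, or to no vector store at all, would return `false` and be initialized normally.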
