Principle:FlowiseAI Flowise Document Loader Selection

From Leeroopedia
Sources: packages/ui/src/api/documentstore.js
Domains: Document_Store_Ingestion
Last Updated: 2026-02-12 14:00 GMT

Overview

Document_Loader_Selection is a technique for selecting and configuring document loader components that extract text from various file formats and sources. Loaders are the entry point for content ingestion in the RAG pipeline, converting raw documents into structured text representations that can be chunked, embedded, and stored.

Description

Document loaders are components that read documents from different sources (files, URLs, APIs) and different formats (PDF, CSV, JSON, text). The loader selection step fetches all available loader component definitions from the server and presents them for user selection.

The loader selection process involves:

  • Discovering available loaders -- The system queries the server for all registered loader components, each of which defines its own capabilities and configuration schema.
  • Evaluating input requirements -- Each loader specifies its input parameters (e.g., file path, URL, API key), credential requirements, and supported file formats.
  • Configuring the selected loader -- After choosing a loader, the user fills in its specific parameters based on the component's input schema.
  • Associating with a document store -- The configured loader is linked to a specific document store for processing.
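The four steps above can be sketched against a small in-memory registry. This is a minimal illustration, not Flowise's implementation: the loader names (`exampleFileLoader`, `exampleWebLoader`), their parameters, the `optional` flag, and the `storeId` field are all assumptions. In a real system, step 1 would be a server request.

```javascript
// Minimal sketch of the four selection steps against an in-memory registry.
// Loader names, parameters, the `optional` flag, and `storeId` are
// illustrative assumptions, not a real server response.
const registry = [
    {
        name: 'exampleFileLoader', // hypothetical loader
        label: 'Example File Loader',
        inputParams: [
            { name: 'filePath', type: 'string' },
            { name: 'encoding', type: 'string', optional: true }
        ]
    },
    {
        name: 'exampleWebLoader', // hypothetical loader
        label: 'Example Web Loader',
        inputParams: [{ name: 'url', type: 'string' }]
    }
]

// Step 1: discover available loaders (in practice, a request to the server).
function discoverLoaders() {
    return registry
}

// Step 2: evaluate input requirements (params not marked optional are required).
function requiredParams(loader) {
    return loader.inputParams.filter((p) => !p.optional).map((p) => p.name)
}

// Step 3: configure the loader, rejecting configs that omit required values.
function configureLoader(loader, values) {
    const missing = requiredParams(loader).filter((name) => values[name] === undefined)
    if (missing.length > 0) {
        throw new Error(`Missing required parameters: ${missing.join(', ')}`)
    }
    return { loaderName: loader.name, config: { ...values } }
}

// Step 4: associate the configured loader with a document store.
function associateWithStore(configuredLoader, storeId) {
    return { ...configuredLoader, storeId }
}

const selected = discoverLoaders().find((l) => l.name === 'exampleWebLoader')
const entry = associateWithStore(configureLoader(selected, { url: 'https://example.com/docs' }), 'store-1')
console.log(entry)
```

Validating at configuration time surfaces a missing parameter before any document processing starts, rather than failing partway through ingestion.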

Common loader types include:

  • File-based loaders -- PDF Loader, CSV Loader, DOCX Loader, Text File Loader
  • Web-based loaders -- Cheerio Web Scraper, Puppeteer Web Scraper, Sitemap Loader
  • API-based loaders -- Notion Loader, Confluence Loader, GitHub Loader
  • Structured data loaders -- JSON Loader, API Loader
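A selection interface typically has to narrow a long list like this one. The sketch below shows one way to do it: a case-insensitive search over each loader's label and description, followed by grouping into the file/web/API categories listed above. The sample descriptions and the `sourceKind` field are made-up illustrations, not fields the server is known to return.

```javascript
// Sketch: searching and grouping a loader list. The sample entries and the
// `sourceKind` field are hypothetical illustrations.
const loaders = [
    { label: 'PDF Loader', description: 'Load text from PDF files', sourceKind: 'file' },
    { label: 'CSV Loader', description: 'Load rows from CSV files', sourceKind: 'file' },
    { label: 'Cheerio Web Scraper', description: 'Scrape static web pages', sourceKind: 'web' },
    { label: 'Notion Loader', description: 'Load pages from Notion', sourceKind: 'api' }
]

// Case-insensitive match against both the label and the description.
function searchLoaders(list, term) {
    const needle = term.trim().toLowerCase()
    return list.filter(
        (l) => l.label.toLowerCase().includes(needle) || l.description.toLowerCase().includes(needle)
    )
}

// Group loader labels by source kind, e.g. { file: [...], api: [...] }.
function groupBySourceKind(list) {
    const groups = {}
    for (const l of list) {
        if (!groups[l.sourceKind]) groups[l.sourceKind] = []
        groups[l.sourceKind].push(l.label)
    }
    return groups
}

console.log(groupBySourceKind(searchLoaders(loaders, 'load')))
```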

Usage

Use document loader selection when adding a new document source to a document store, so that text can be extracted from it. Typical scenarios include:

  • File ingestion -- Uploading and processing PDF reports, CSV data files, or text documents.
  • Web scraping -- Configuring a web scraper to extract content from documentation sites or knowledge bases.
  • API integration -- Connecting to external services (Notion, Confluence) to pull in content for RAG retrieval.
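
import documentStoreApi from '@/api/documentstore'

// Fetch the definitions of all registered document loader components from the server
const response = await documentStoreApi.getDocumentLoaders()
const loaders = response.data
// Each loader has: name, label, icon, description, inputParams, inputAnchors
loaders.forEach((loader) => {
    // Print a one-line summary for each available loader
    console.log(loader.label, '-', loader.description)
})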
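For the file-ingestion scenario, a client might pre-select a file-based loader from the uploaded file's extension. The mapping below is an illustrative assumption: the function name `suggestLoader` is hypothetical, and a real system would rely on the formats each loader declares rather than a hard-coded table.

```javascript
// Sketch: suggesting a file-based loader from a file extension.
// The extension-to-loader table is a hypothetical illustration.
const loaderByExtension = {
    pdf: 'PDF Loader',
    csv: 'CSV Loader',
    docx: 'DOCX Loader',
    txt: 'Text File Loader',
    json: 'JSON Loader'
}

function suggestLoader(fileName) {
    const dot = fileName.lastIndexOf('.')
    // No extension, a leading-dot name, or a trailing dot: nothing to match.
    if (dot <= 0 || dot === fileName.length - 1) return null
    const ext = fileName.slice(dot + 1).toLowerCase()
    return loaderByExtension[ext] ?? null
}

for (const file of ['Q3-report.PDF', 'customers.csv', 'notes', 'archive.zip']) {
    console.log(file, '->', suggestLoader(file))
}
```

Returning `null` for unknown formats leaves the choice to the user instead of guessing a loader that cannot parse the file.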

Theoretical Basis

Document loader selection follows a component registry pattern where available loader implementations are fetched from the server. This architecture provides several design advantages:

  • Extensibility -- New loader types can be added to the server without modifying the frontend. The UI dynamically renders configuration forms based on the component's declared input parameters.
  • Self-describing components -- Each loader defines its own configuration schema via inputParams (form fields) and inputAnchors (connections to other components like text splitters). This eliminates the need for hardcoded forms per loader type.
  • Credential abstraction -- Loaders that require authentication (API keys, OAuth tokens) declare credential requirements in their schema, enabling the system to present appropriate credential selection UI.
  • Uniform interface -- Despite different underlying implementations (file readers, HTTP clients, API SDKs), all loaders present a uniform interface to the document processing pipeline: they accept configuration and produce document objects with pageContent and metadata.
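The self-describing and credential-abstraction points can be illustrated by deriving a generic form from a loader's declared parameters. This is a sketch under assumptions: the parameter shape (`label`, `name`, `type`, `optional`, `credentialNames`), the widget names, and `exampleLoader` itself are hypothetical, chosen only to show that no per-loader form code is needed.

```javascript
// Sketch: schema-driven form generation. The param shape, widget names,
// and the example loader are hypothetical illustrations.
const exampleLoader = {
    label: 'Example API Loader',
    inputParams: [
        { label: 'Connect Credential', name: 'credential', type: 'credential', credentialNames: ['exampleApi'] },
        { label: 'Page ID', name: 'pageId', type: 'string' },
        { label: 'Max Pages', name: 'maxPages', type: 'number', optional: true }
    ]
}

// Map each declared parameter type to a generic widget.
const widgetForType = {
    string: 'text-input',
    number: 'number-input',
    boolean: 'checkbox',
    credential: 'credential-picker'
}

// Build form field descriptors directly from the declared schema.
function buildForm(loader) {
    return loader.inputParams.map((p) => ({
        field: p.name,
        label: p.label,
        widget: widgetForType[p.type] ?? 'text-input', // fall back for unknown types
        required: !p.optional
    }))
}

// Credential abstraction: collect the credential types the loader declares.
function credentialRequirements(loader) {
    return loader.inputParams
        .filter((p) => p.type === 'credential')
        .flatMap((p) => p.credentialNames ?? [])
}

console.log(buildForm(exampleLoader))
console.log(credentialRequirements(exampleLoader))
```

Adding a new loader then means adding a new schema entry on the server; `buildForm` needs no change unless a new parameter type appears.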

This pattern is a specialization of the Strategy pattern -- the loader selection determines which text extraction strategy is used, while the downstream pipeline (splitting, embedding, storage) remains agnostic to the source format.
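A compact Strategy-pattern sketch makes the agnostic downstream step concrete. Each hypothetical strategy (`text`, `csv`) turns its own input format into documents with the same `{ pageContent, metadata }` shape, and the splitter consumes them without knowing which strategy ran. The fixed-size character splitter is a deliberate simplification, not how any particular text splitter works.

```javascript
// Sketch of the Strategy pattern: interchangeable loader strategies that all
// emit { pageContent, metadata }. Strategy names and inputs are hypothetical.
const loaderStrategies = {
    text: (input) => [{ pageContent: input.text, metadata: { source: input.source } }],
    csv: (input) =>
        input.text
            .trim()
            .split('\n')
            .slice(1) // skip the header row
            .map((row, i) => ({ pageContent: row, metadata: { source: input.source, row: i + 1 } }))
}

function loadDocuments(strategyName, input) {
    const strategy = loaderStrategies[strategyName]
    if (!strategy) throw new Error(`Unknown loader: ${strategyName}`)
    return strategy(input)
}

// Source-agnostic downstream step: split each document into fixed-size chunks.
function splitDocuments(docs, chunkSize) {
    const chunks = []
    for (const doc of docs) {
        for (let start = 0; start < doc.pageContent.length; start += chunkSize) {
            chunks.push({
                pageContent: doc.pageContent.slice(start, start + chunkSize),
                metadata: { ...doc.metadata }
            })
        }
    }
    return chunks
}

const fromText = splitDocuments(loadDocuments('text', { text: 'abcdefghij', source: 'notes.txt' }), 4)
const fromCsv = splitDocuments(
    loadDocuments('csv', { text: 'name,role\nAda,admin\nLin,user', source: 'users.csv' }),
    4
)
console.log(fromText.length, fromCsv.length)
```

Supporting a new source format only requires registering another entry in `loaderStrategies`; `splitDocuments` stays unchanged.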
