Principle:FlowiseAI Flowise Document Loader Selection
| Attribute | Value |
|---|---|
| Sources | packages/ui/src/api/documentstore.js |
| Domains | Document_Store_Ingestion |
| Last Updated | 2026-02-12 14:00 GMT |
Overview
Document loader selection is the step of choosing and configuring the document loader component that extracts text from a given file format or source. Loaders are the entry point for content ingestion in the RAG pipeline: they convert raw documents into structured text representations that can be chunked, embedded, and stored.
Description
Document loaders are components that read documents from different sources (files, URLs, APIs) and different formats (PDF, CSV, JSON, text). The loader selection step fetches all available loader component definitions from the server and presents them for user selection.
The loader selection process involves:
- Discovering available loaders -- The system queries the server for all registered loader components, each of which defines its own capabilities and configuration schema.
- Evaluating input requirements -- Each loader specifies its input parameters (e.g., file path, URL, API key), credential requirements, and supported file formats.
- Configuring the selected loader -- After choosing a loader, the user fills in its specific parameters based on the component's input schema.
- Associating with a document store -- The configured loader is linked to a specific document store for processing.
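The four steps above can be sketched as follows. This is a minimal illustration, not the Flowise implementation: the loader definitions are hard-coded sample data standing in for the server response, `selectLoader`, `defaultConfigFor`, and `attachToStore` are hypothetical helpers, and the shape of the returned request object is an assumption. It also assumes each `inputParams` entry carries a `name` and an optional `default`.

```javascript
// Sample loader definitions standing in for the server response (assumed shape).
const availableLoaders = [
    {
        name: 'pdfFile',
        label: 'PDF Loader',
        inputParams: [
            { name: 'pdfFile', type: 'file' },
            { name: 'usage', type: 'options', default: 'perPage' }
        ]
    },
    {
        name: 'cheerioWebScraper',
        label: 'Cheerio Web Scraper',
        inputParams: [{ name: 'url', type: 'string' }]
    }
]

// Steps 1 and 2: find a loader by name so its declared inputs can be inspected.
function selectLoader(loaders, name) {
    const loader = loaders.find((l) => l.name === name)
    if (!loader) throw new Error(`Unknown loader: ${name}`)
    return loader
}

// Step 3: start from the declared defaults, then apply user-supplied values.
function defaultConfigFor(loader, userValues = {}) {
    const config = {}
    for (const param of loader.inputParams) {
        if (param.default !== undefined) config[param.name] = param.default
    }
    return { ...config, ...userValues }
}

// Step 4: link the configured loader to a document store (hypothetical request shape).
function attachToStore(storeId, loader, config) {
    return { storeId, loaderId: loader.name, loaderName: loader.label, loaderConfig: config }
}

const loader = selectLoader(availableLoaders, 'pdfFile')
const config = defaultConfigFor(loader, { pdfFile: 'report.pdf' })
const request = attachToStore('store-123', loader, config)
console.log(request)
```

Keeping the defaults in the loader definition rather than in the caller means a new loader needs no extra configuration code on the client side.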
Common loader types include:
- File-based loaders -- PDF Loader, CSV Loader, DOCX Loader, Text File Loader
- Web-based loaders -- Cheerio Web Scraper, Puppeteer Web Scraper, Sitemap Loader
- API-based loaders -- Notion Loader, Confluence Loader, GitHub Loader
- Structured data loaders -- JSON Loader, API Loader
Usage
Use document loader selection when adding a document source to a document store for text extraction. Typical scenarios include:
- File ingestion -- Uploading and processing PDF reports, CSV data files, or text documents.
- Web scraping -- Configuring a web scraper to extract content from documentation sites or knowledge bases.
- API integration -- Connecting to external services (Notion, Confluence) to pull in content for RAG retrieval.
```javascript
// Fetch all available document loaders.
// The import path is an assumption based on the source file listed above.
import documentStoreApi from '@/api/documentstore'

const response = await documentStoreApi.getDocumentLoaders()
const loaders = response.data

// Each loader has: name, label, icon, description, inputParams, inputAnchors
loaders.forEach((loader) => {
    console.log(loader.label, '-', loader.description)
})
```
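Once the list has been fetched, a selection dialog would typically narrow it down by a search term. The sketch below shows one way to do that; `filterLoaders` is a hypothetical helper and the sample entries are invented for illustration, using only the `label` and `description` fields mentioned above.

```javascript
// Case-insensitive match against label and description (hypothetical helper).
function filterLoaders(loaders, query) {
    const needle = query.trim().toLowerCase()
    if (needle === '') return loaders
    return loaders.filter((loader) =>
        [loader.label, loader.description].some((text) => (text || '').toLowerCase().includes(needle))
    )
}

// Invented sample data in the same shape as the fetched loader list.
const sampleLoaders = [
    { name: 'pdfFile', label: 'PDF Loader', description: 'Load data from PDF files' },
    { name: 'csvFile', label: 'CSV Loader', description: 'Load data from CSV files' },
    { name: 'sitemap', label: 'Sitemap Loader', description: 'Load pages listed in a sitemap' }
]

const matches = filterLoaders(sampleLoaders, 'files')
console.log(matches.map((loader) => loader.label))
```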
Theoretical Basis
Document loader selection follows a component registry pattern where available loader implementations are fetched from the server. This architecture provides several design advantages:
- Extensibility -- New loader types can be added to the server without modifying the frontend. The UI dynamically renders configuration forms based on the component's declared input parameters.
- Self-describing components -- Each loader defines its own configuration schema via `inputParams` (form fields) and `inputAnchors` (connections to other components like text splitters). This eliminates the need for hardcoded forms per loader type.
- Credential abstraction -- Loaders that require authentication (API keys, OAuth tokens) declare credential requirements in their schema, enabling the system to present appropriate credential selection UI.
- Uniform interface -- Despite different underlying implementations (file readers, HTTP clients, API SDKs), all loaders present a uniform interface to the document processing pipeline: they accept configuration and produce document objects with `pageContent` and `metadata`.
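To illustrate how a self-describing schema removes the need for per-loader forms, the sketch below derives a list of form fields from a loader definition. Everything here is an assumption made for illustration: the `WIDGET_BY_TYPE` mapping, the `buildFormFields` helper, the `label`, `type`, and `optional` properties on each parameter, and the idea that a loader signals its credential requirement through a `credential` property.

```javascript
// Map a declared parameter type to a generic widget kind (hypothetical mapping).
const WIDGET_BY_TYPE = {
    string: 'text',
    number: 'number',
    boolean: 'switch',
    file: 'upload',
    options: 'dropdown'
}

// Derive form fields from the loader's own schema instead of hardcoding a form.
function buildFormFields(loader) {
    const fields = (loader.inputParams || []).map((param) => ({
        key: param.name,
        label: param.label || param.name,
        widget: WIDGET_BY_TYPE[param.type] || 'text', // unknown types fall back to a text input
        required: !param.optional
    }))
    // Assumption: a `credential` property marks loaders that need authentication.
    if (loader.credential) {
        fields.unshift({
            key: 'credential',
            label: loader.credential.label || 'Credential',
            widget: 'credential-picker',
            required: true
        })
    }
    return fields
}

// Invented loader definition used only to exercise the helper.
const exampleApiLoader = {
    name: 'exampleApiLoader',
    credential: { label: 'Connect Credential' },
    inputParams: [
        { name: 'pageId', label: 'Page ID', type: 'string' },
        { name: 'metadata', label: 'Additional Metadata', type: 'json', optional: true }
    ]
}

const fields = buildFormFields(exampleApiLoader)
console.log(fields.map((f) => `${f.key}:${f.widget}`).join(', '))
// prints "credential:credential-picker, pageId:text, metadata:text"
```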
This pattern is a specialization of the Strategy pattern -- the loader selection determines which text extraction strategy is used, while the downstream pipeline (splitting, embedding, storage) remains agnostic to the source format.
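The Strategy relationship can be sketched with two interchangeable extraction strategies feeding one downstream step. The strategies, `splitDocuments`, and `ingest` are illustrative stand-ins rather than Flowise loaders or splitters; the only detail taken from the description above is that each strategy produces objects with `pageContent` and `metadata`.

```javascript
// Two interchangeable extraction strategies (illustrative stand-ins).
const textStrategy = {
    // The whole input becomes a single document.
    load: (source) => [{ pageContent: source.text, metadata: { source: source.name } }]
}

const csvStrategy = {
    // One document per non-empty row.
    load: (source) =>
        source.text
            .split('\n')
            .filter((row) => row.trim() !== '')
            .map((row, i) => ({ pageContent: row, metadata: { source: source.name, row: i } }))
}

// Downstream step: fixed-size chunking that relies only on pageContent and metadata.
function splitDocuments(docs, chunkSize) {
    const chunks = []
    for (const doc of docs) {
        for (let start = 0; start < doc.pageContent.length; start += chunkSize) {
            chunks.push({
                pageContent: doc.pageContent.slice(start, start + chunkSize),
                metadata: doc.metadata
            })
        }
    }
    return chunks
}

// The pipeline accepts any strategy that exposes load(); it never inspects the source format.
function ingest(strategy, source, chunkSize = 10) {
    return splitDocuments(strategy.load(source), chunkSize)
}

const fromText = ingest(textStrategy, { name: 'note.txt', text: 'hello world, hello loaders' })
const fromCsv = ingest(csvStrategy, { name: 'data.csv', text: 'a,b\n1,2\n' })
console.log(fromText.length, fromCsv.length)
```

Swapping `textStrategy` for `csvStrategy` changes how text is extracted but leaves `ingest` and `splitDocuments` untouched, which is the property the Strategy pattern is meant to provide.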
Related Pages
- Implementation:FlowiseAI_Flowise_GetDocumentLoaders
- Principle:FlowiseAI_Flowise_Document_Store_Creation -- Prerequisite: creating the store to add loaders to
- Principle:FlowiseAI_Flowise_Text_Splitter_Configuration -- Next step: configuring text splitting for the loaded documents
- Principle:FlowiseAI_Flowise_Chunk_Preview -- Previewing results of loader and splitter configuration