Principle:Langgenius Dify Datasource Node Configuration

From Leeroopedia
Knowledge Sources: Dify
Domains: RAG, Pipeline, Frontend
Last Updated: 2026-02-12 00:00 GMT

Overview

Description

Datasource Node Configuration is the principle governing how RAG pipelines in Dify connect to and configure their data ingestion sources. The datasource node is the entry point of every RAG pipeline -- it defines where documents originate before they flow through processing, embedding, and indexing stages.

Dify supports multiple datasource types through a plugin-based architecture:

  • Local File (local_file) -- Documents uploaded directly from the user's machine.
  • Online Document (online_document) -- Content fetched from cloud-based document platforms (e.g., Notion pages).
  • Website Crawl (website_crawl) -- Content extracted from web pages via crawling.
  • Online Drive (online_drive) -- Files accessed from cloud storage services (e.g., S3 buckets, Google Drive).

Each datasource type is provided by a datasource plugin that must be installed and configured with appropriate credentials before it can be used in a pipeline.
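
The four type identifiers listed above can be modelled as a closed set of string literals. The following is a minimal sketch, not Dify's actual type definitions: the names DATASOURCE_TYPES, DatasourceType, isDatasourceType, DATASOURCE_LABELS, and describeDatasource are hypothetical and introduced only for illustration.

```typescript
// The four identifiers come from the list above; every other name is hypothetical.
const DATASOURCE_TYPES = [
  "local_file",
  "online_document",
  "website_crawl",
  "online_drive",
] as const;

type DatasourceType = (typeof DATASOURCE_TYPES)[number];

// Type guard: narrows an arbitrary string to one of the known identifiers.
function isDatasourceType(value: string): value is DatasourceType {
  return DATASOURCE_TYPES.some((known) => known === value);
}

// Display labels taken from the list above.
const DATASOURCE_LABELS: Record<DatasourceType, string> = {
  local_file: "Local File",
  online_document: "Online Document",
  website_crawl: "Website Crawl",
  online_drive: "Online Drive",
};

// Returns a readable label, or a fallback message for unrecognised identifiers.
function describeDatasource(value: string): string {
  return isDatasourceType(value)
    ? DATASOURCE_LABELS[value]
    : `Unknown datasource type: ${value}`;
}
```

Because the label map is typed as Record<DatasourceType, string>, the compiler reports a missing entry if a fifth identifier is ever added to the array.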

Usage

Datasource Node Configuration is relevant whenever a user:

  • Configures the first node of a RAG pipeline to specify the document ingestion source.
  • Browses available datasource plugins to understand which integrations are installed.
  • Provides authentication credentials for a datasource plugin (e.g., API keys, OAuth tokens).
  • Updates or rotates credentials for an existing datasource connection.
  • Previews content from an online document source before committing to a full pipeline run.

The configuration must be completed before the pipeline can be executed, as the datasource node feeds raw document data to all downstream processing nodes.
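
The "configure before execute" requirement can be pictured as a readiness check run before the pipeline starts. This sketch assumes a node references its plugin and credential by ID; DatasourceNodeConfig, InstalledPlugin, and findConfigProblems are hypothetical names, and the real Dify validation logic may differ.

```typescript
// Hypothetical shapes for illustration; the real Dify types are not shown in this article.
interface DatasourceNodeConfig {
  pluginId: string;
  credentialId?: string;
}

interface InstalledPlugin {
  pluginId: string;
  credentialIds: string[];
}

// Returns a list of human-readable problems; an empty list means the node is ready.
function findConfigProblems(
  node: DatasourceNodeConfig,
  installed: InstalledPlugin[],
): string[] {
  const problems: string[] = [];
  const plugin = installed.filter((p) => p.pluginId === node.pluginId)[0];

  if (!plugin) {
    // Without the plugin there is nothing further to check.
    problems.push(`Plugin "${node.pluginId}" is not installed.`);
    return problems;
  }
  if (!node.credentialId) {
    problems.push("No credential selected for the datasource node.");
  } else if (plugin.credentialIds.indexOf(node.credentialId) === -1) {
    problems.push(`Credential "${node.credentialId}" is not configured for this plugin.`);
  }
  return problems;
}
```

A caller would refuse to start the run while the returned list is non-empty and show the messages to the user instead.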

Theoretical Basis

The datasource node configuration follows the Plugin Architecture pattern, where each data source is an independently installable and configurable extension. This design provides several advantages:

  • Extensibility -- New datasource types can be added without modifying core pipeline logic.
  • Credential Isolation -- Each plugin manages its own authentication independently, following the principle of least privilege.
  • Loose Coupling -- The pipeline graph treats all datasource nodes uniformly through a common interface (DataSourceItem), regardless of the underlying source type.
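
The three advantages above can be illustrated with a small registry. This is a sketch of the general Plugin Architecture pattern only. The article does not show the fields of DataSourceItem, so the DatasourceProvider interface, the DatasourceRegistry class, and the ingest function below are hypothetical stand-ins rather than Dify code.

```typescript
// Hypothetical common interface; it stands in for the role DataSourceItem plays.
interface DatasourceProvider {
  readonly type: string;
  listDocuments(): string[];
}

// Providers are registered by type, so adding a source never touches pipeline code.
class DatasourceRegistry {
  private providers: Record<string, DatasourceProvider> = {};

  register(provider: DatasourceProvider): void {
    this.providers[provider.type] = provider;
  }

  get(type: string): DatasourceProvider | undefined {
    return Object.prototype.hasOwnProperty.call(this.providers, type)
      ? this.providers[type]
      : undefined;
  }
}

// Pipeline-side code depends only on the interface, never on a concrete source.
function ingest(registry: DatasourceRegistry, type: string): string[] {
  const provider = registry.get(type);
  if (!provider) {
    throw new Error(`No datasource plugin registered for type "${type}"`);
  }
  return provider.listDocuments();
}
```

Here ingest contains no branch per source type, which is the loose coupling the list describes: a new provider only needs a register call.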

The credential management layer uses a dedicated authentication endpoint (/auth/plugin/datasource) separate from the pipeline API, enforcing a clean separation between what data to fetch (pipeline configuration) and how to authenticate (credential management). This separation aligns with the Separation of Concerns principle in software architecture.
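
One way to picture this separation is a pair of request builders, where secrets travel only to the authentication endpoint and the pipeline configuration carries just a reference. Only the /auth/plugin/datasource base path comes from the text above. The pipeline base path, the path suffixes, the payload shapes, and all function names are assumptions made for this sketch.

```typescript
const DATASOURCE_AUTH_BASE = "/auth/plugin/datasource"; // from the article
const PIPELINE_BASE = "/rag/pipelines"; // hypothetical placeholder path

interface RequestDescriptor {
  method: "GET" | "POST";
  path: string;
  body?: unknown;
}

// "How to authenticate": the secret values appear only in this request.
// The "/<pluginId>" suffix is an assumption, not a documented route.
function buildCredentialRequest(
  pluginId: string,
  credentials: Record<string, string>,
): RequestDescriptor {
  return {
    method: "POST",
    path: `${DATASOURCE_AUTH_BASE}/${encodeURIComponent(pluginId)}`,
    body: { credentials },
  };
}

// "What data to fetch": the node refers to a stored credential by ID only.
function buildNodeConfigRequest(
  pipelineId: string,
  node: { pluginId: string; credentialId: string },
): RequestDescriptor {
  return {
    method: "POST",
    path: `${PIPELINE_BASE}/${encodeURIComponent(pipelineId)}/datasource`,
    body: { pluginId: node.pluginId, credentialId: node.credentialId },
  };
}
```

Under this split, rotating a credential changes only what is stored behind the authentication endpoint, while the pipeline configuration keeps pointing at the same credential ID.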

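The staleTime: 0 setting on datasource queries marks cached results as stale as soon as they arrive, so the list of available plugins is refetched rather than served from an outdated cache. This matters because plugin installations and credential states can change at any time.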
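
The effect of staleTime: 0 can be shown with a simplified staleness check. This sketch assumes the option has the meaning it carries in query-caching libraries such as TanStack Query, namely the number of milliseconds a cached result counts as fresh. The isStale function is a simplified model rather than any library's implementation, and the query key shown is hypothetical.

```typescript
// Simplified model: data is stale once `staleTime` milliseconds have passed
// since it was last updated. All timestamps are in milliseconds.
function isStale(updatedAt: number, staleTime: number, now: number): boolean {
  return now - updatedAt >= staleTime;
}

// Options object in the style of a query-caching library.
// The query key is a hypothetical placeholder.
const datasourceQueryOptions = {
  queryKey: ["datasource-plugins"],
  staleTime: 0, // cached plugin list is never treated as fresh
};

// With staleTime 0 the result is stale at the very moment it is stored,
// so the next consumer of the query triggers a refetch.
const staleImmediately = isStale(1000, datasourceQueryOptions.staleTime, 1000);

// For contrast, a 5-second staleTime keeps the same data fresh one second later.
const staleAfterOneSecond = isStale(1000, 5000, 2000);
```

The trade-off is extra network requests in exchange for never showing a plugin that was just uninstalled or a credential that was just revoked.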
