
Workflow:Apache DolphinScheduler Datasource Plugin Development

From Leeroopedia


Knowledge Sources
Domains Data_Engineering, Plugin_Architecture, Database_Integration
Last Updated 2026-02-10 10:00 GMT

Overview

End-to-end process for developing a new datasource plugin for Apache DolphinScheduler, using its SPI-based plugin architecture to add support for an additional database type.

Description

This workflow outlines the standard procedure for extending DolphinScheduler's datasource plugin system to support a new database type. DolphinScheduler uses a Java SPI (Service Provider Interface) architecture where each database type is a self-contained plugin module. The process covers creating the required classes (Processor, Channel, ChannelFactory, Clients, Connection Parameters, and DTO), implementing validation and connection logic, and registering the plugin for automatic discovery. The existing codebase supports 28+ database types, all following this same pattern.

Usage

Execute this workflow when you need to add support for a new database system (e.g., a proprietary database, a new cloud data warehouse, or a recently released open-source database) to DolphinScheduler's datasource management layer, enabling tasks to read from and write to that database.

Execution Steps

Step 1: Create Plugin Module Structure

Create a new Maven module under the datasource plugin directory following the established naming convention. The module should mirror the directory structure of existing plugins, with source packages organized by functionality (main classes in the root package, parameters in a param sub-package, and tests in the corresponding test directory).

Key considerations:

  • Module name follows the pattern: dolphinscheduler-datasource-{dbname}
  • Package namespace follows: org.apache.dolphinscheduler.plugin.datasource.{dbname}
  • Maven dependencies include the datasource API module and the database JDBC driver
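A typical layout, sketched here for a hypothetical database called "acmedb" (the module, package, and class names are illustrative; the exact placement of the processor and parameter classes under a param sub-package mirrors existing plugins):

```
dolphinscheduler-datasource-plugin/
└── dolphinscheduler-datasource-acmedb/
    ├── pom.xml
    └── src/
        ├── main/java/org/apache/dolphinscheduler/plugin/datasource/acmedb/
        │   ├── AcmeDbDataSourceChannel.java
        │   ├── AcmeDbDataSourceChannelFactory.java
        │   ├── AcmeDbAdHocDataSourceClient.java
        │   ├── AcmeDbPooledDataSourceClient.java
        │   └── param/
        │       ├── AcmeDbDataSourceProcessor.java
        │       ├── AcmeDbConnectionParam.java
        │       └── AcmeDbDataSourceParamDTO.java
        └── test/java/org/apache/dolphinscheduler/plugin/datasource/acmedb/
```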

Step 2: Implement DataSourceProcessor

Create the core processor class that implements the DataSourceProcessor interface (or extends AbstractDataSourceProcessor). This class handles parameter validation, JDBC URL construction, connection creation, and SQL parsing for the specific database type. Annotate it with @AutoService for SPI discovery.

Key considerations:

  • Security filtering of dangerous connection properties (e.g., autoDeserialize, allowLoadLocalInfile)
  • Proper JDBC URL format construction specific to the database
  • Password encryption and decryption using PasswordUtils
  • Override checkDatasourceParam for database-specific validation rules
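Two of these responsibilities, JDBC URL construction and security filtering of connection properties, can be sketched in isolation. The class below is a self-contained illustration for the hypothetical "acmedb" database, not the real `DataSourceProcessor` interface; the URL scheme and the exact set of dangerous keys are assumptions modeled on existing plugins.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative sketch of two DataSourceProcessor responsibilities for a
// hypothetical "acmedb" database: building the JDBC URL and stripping
// connection properties that are known deserialization/file-read vectors.
public class AcmeDbProcessorSketch {

    // Properties commonly rejected for security reasons (illustrative list).
    private static final Set<String> DANGEROUS_KEYS =
            Set.of("autoDeserialize", "allowLoadLocalInfile", "allowUrlInLocalInfile");

    // Build a JDBC URL from host, port, database, and filtered extra properties.
    public static String buildJdbcUrl(String host, int port, String database,
                                      Map<String, String> other) {
        String base = String.format("jdbc:acmedb://%s:%d/%s", host, port, database);
        if (other == null || other.isEmpty()) {
            return base;
        }
        String params = other.entrySet().stream()
                .filter(e -> !DANGEROUS_KEYS.contains(e.getKey()))
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("&"));
        return params.isEmpty() ? base : base + "?" + params;
    }
}
```

In the real plugin this logic lives in `getJdbcUrl` and `checkDatasourceParam` overrides on the processor, and password handling goes through `PasswordUtils` rather than plain strings.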

Step 3: Define Connection Parameters and DTO

Create two data classes: a ConnectionParam class holding runtime connection details (JDBC URL, credentials, driver class) and a DataSourceParamDTO class holding the user-facing configuration (host, port, database name, additional properties). These classes bridge the gap between user input and the actual JDBC connection.

Key considerations:

  • ConnectionParam extends BaseConnectionParam with database-specific fields
  • DataSourceParamDTO extends BaseDataSourceParamDTO with appropriate defaults (port, driver class name)
  • JSON serialization compatibility for parameter storage
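The relationship between the two classes can be sketched as a pair of plain POJOs plus the conversion that `createConnectionParams` performs. The base classes from the API module are elided here, and the default port and driver class name are invented for the hypothetical "acmedb" plugin.

```java
// Illustrative sketch of the DTO/ConnectionParam pair for a hypothetical
// "acmedb" plugin; the real classes extend BaseDataSourceParamDTO and
// BaseConnectionParam from the datasource API module.
public class AcmeDbParamsSketch {

    // User-facing DTO: what the UI form collects, with sensible defaults.
    public static class AcmeDbDataSourceParamDTO {
        public String host;
        public int port = 8123;                 // assumed default port
        public String database;
        public String userName;
        public String password;
    }

    // Runtime connection details derived from the DTO.
    public static class AcmeDbConnectionParam {
        public String jdbcUrl;
        public String driverClassName = "com.acme.jdbc.Driver"; // illustrative
        public String user;
        public String password;                 // stored encrypted in practice
    }

    // Bridge: DTO -> ConnectionParam, mirroring the processor's
    // createConnectionParams method.
    public static AcmeDbConnectionParam toConnectionParam(AcmeDbDataSourceParamDTO dto) {
        AcmeDbConnectionParam param = new AcmeDbConnectionParam();
        param.jdbcUrl = String.format("jdbc:acmedb://%s:%d/%s",
                dto.host, dto.port, dto.database);
        param.user = dto.userName;
        param.password = dto.password;
        return param;
    }
}
```

Both classes must serialize cleanly to JSON, since DolphinScheduler persists connection parameters as JSON in its metadata store.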

Step 4: Implement DataSource Channel and Channel Factory

Create the DataSourceChannel implementation that serves as a factory for creating AdHoc and Pooled datasource clients. Also create the DataSourceChannelFactory that registers this channel with the SPI plugin loader by providing the channel name (matching the DbType enum).

Key considerations:

  • Channel creates both AdHocDataSourceClient and PooledDataSourceClient instances
  • ChannelFactory getName() must return the exact string matching the DbType enum value
  • Both AdHoc and Pooled clients extend the corresponding base classes from the API module
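The shape of the channel/factory pair can be sketched with stand-in types. Here `DataSourceChannel`, `DataSourceClient`, and `DbType` are local stand-ins for the real API-module types, and the "acmedb" name is hypothetical; what matters is that the factory's `getName()` returns exactly the string the `DbType` enum uses.

```java
// Sketch of the channel/factory pair for a hypothetical "acmedb" plugin.
// The nested interfaces and enum stand in for the real API-module types.
public class AcmeDbChannelSketch {

    public enum DbType { ACMEDB }  // stand-in for the real DbType enum

    public interface DataSourceClient { }

    public interface DataSourceChannel {
        DataSourceClient createAdHocDataSourceClient();
        DataSourceClient createPooledDataSourceClient();
    }

    // Channel: factory for both client flavors.
    public static class AcmeDbChannel implements DataSourceChannel {
        public DataSourceClient createAdHocDataSourceClient() {
            return new DataSourceClient() { };   // real impl builds an AdHoc client
        }
        public DataSourceClient createPooledDataSourceClient() {
            return new DataSourceClient() { };   // real impl builds a Pooled client
        }
    }

    // Factory registered via SPI; getName() must match the DbType string.
    public static class AcmeDbChannelFactory {
        public String getName() {
            return DbType.ACMEDB.name().toLowerCase();
        }
        public DataSourceChannel create() {
            return new AcmeDbChannel();
        }
    }
}
```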

Step 5: Create AdHoc and Pooled Client Implementations

Implement two client classes: an AdHocDataSourceClient for one-time query execution and a PooledDataSourceClient for connection-pooled operations. Both clients extend the base classes provided by the API module and configure database-specific connection pooling settings.

Key considerations:

  • AdHoc client is used for connection testing and ad-hoc queries
  • Pooled client manages a HikariCP connection pool for production workloads
  • Some databases may require custom pooled client configuration (e.g., AzureSQL authentication modes)
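A minimal sketch of the pooled client's configuration concern, assuming HikariCP-style settings: the real client extends the API module's pooled base class and feeds values like these into a `HikariConfig`, but here they are assembled into a plain `Properties` object so the sketch stays self-contained. The sizing and test-query values are illustrative defaults, not DolphinScheduler's.

```java
import java.util.Properties;

// Sketch of how a pooled client for a hypothetical "acmedb" plugin might
// assemble HikariCP-style pool settings before handing them to the pool.
public class AcmeDbPooledClientSketch {

    public static Properties poolProperties(String jdbcUrl, String user, String password) {
        Properties props = new Properties();
        props.setProperty("jdbcUrl", jdbcUrl);
        props.setProperty("username", user);
        props.setProperty("password", password);
        // Illustrative pool sizing; tune per database and workload.
        props.setProperty("maximumPoolSize", "10");
        props.setProperty("connectionTimeout", "30000");
        // Validation query for databases without JDBC4 isValid() support.
        props.setProperty("connectionTestQuery", "SELECT 1");
        return props;
    }
}
```

The AdHoc client, by contrast, opens a single unpooled connection per request, which is why it is the one used for connection testing.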

Step 6: Register Plugin via SPI and Test

Register the plugin by adding the DataSourceChannelFactory to the META-INF/services file for automatic SPI discovery. Write comprehensive unit tests covering parameter validation, JDBC URL generation, connection parameter creation, and the channel factory instantiation to ensure the plugin integrates correctly.

Key considerations:

  • SPI registration file: META-INF/services/org.apache.dolphinscheduler.spi.datasource.DataSourceChannelFactory
  • Test coverage should include boundary conditions for host/port validation
  • Test malicious parameter injection prevention
  • Verify round-trip conversion between DTO and ConnectionParam
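The registration file itself is a one-line text resource whose name is the fully qualified interface and whose content is the implementing factory class. For the hypothetical "acmedb" plugin (class name assumed), it would look like this; if the factory is annotated with `@AutoService`, the annotation processor generates this file automatically at build time:

```
# src/main/resources/META-INF/services/org.apache.dolphinscheduler.spi.datasource.DataSourceChannelFactory
org.apache.dolphinscheduler.plugin.datasource.acmedb.AcmeDbDataSourceChannelFactory
```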

Execution Diagram

GitHub URL

Workflow Repository