
Workflow:Apache Dolphinscheduler Datasource Connection Management

From Leeroopedia


Knowledge Sources
Domains Data_Engineering, Database_Integration, Connection_Pooling
Last Updated 2026-02-10 10:00 GMT

Overview

End-to-end process for configuring, validating, pooling, and using database connections in Apache DolphinScheduler's multi-datasource environment.

Description

This workflow describes how DolphinScheduler manages database connections across its 28+ supported datasource types. The system provides a unified API for creating, validating, caching, and using connections regardless of the underlying database technology. It covers the lifecycle from user-provided connection parameters through validation, pooling with HikariCP, cached access via a Guava cache, and secure credential management. The architecture separates ad-hoc connections (for testing and one-time queries) from pooled connections (for production workloads).

Usage

Execute this workflow when you need to configure a new database connection in DolphinScheduler, troubleshoot connection issues, optimize connection pool settings, or understand how tasks interact with external databases during execution.

Execution Steps

Step 1: Configure Datasource Parameters

The user provides datasource configuration through the API, specifying the database type, host, port, database name, credentials, and any additional connection properties. The system maps this input to a BaseDataSourceParamDTO subclass specific to the chosen database type, applying default values where appropriate (e.g., default ports per database).

Key considerations:

  • Each database type has a dedicated ParamDTO with type-specific defaults
  • Additional connection properties are stored as key-value pairs in the datasource's other field
  • Kerberos authentication is supported for Hadoop-ecosystem databases via BaseHDFSDataSourceParamDTO
  • The API validates required fields before proceeding
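The parameter-DTO pattern above can be sketched in a few lines. This is a simplified stand-in modeled loosely on DolphinScheduler's BaseDataSourceParamDTO hierarchy; the field and method names here are illustrative, not the project's exact API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch of a per-database ParamDTO with type-specific defaults.
abstract class BaseDataSourceParamDTO {
    String host;
    Integer port;
    String database;
    String userName;
    String password;
    // Additional connection properties, stored as key-value pairs.
    Map<String, String> other = new LinkedHashMap<>();

    // Each subclass supplies its database-specific default port.
    abstract int defaultPort();

    // Apply defaults where the caller left fields unset.
    void applyDefaults() {
        if (port == null) {
            port = defaultPort();
        }
    }
}

class MySQLDataSourceParamDTO extends BaseDataSourceParamDTO {
    @Override
    int defaultPort() {
        return 3306; // MySQL's standard port
    }
}
```

Each new datasource type then only needs to subclass the base DTO and declare its defaults; the shared API layer can call applyDefaults() uniformly.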

Step 2: Validate and Transform Parameters

The DataSourceProcessor for the selected database type validates the parameters against security rules and database-specific constraints. Validation includes checking host format (no illegal characters), database name validity, and filtering dangerous connection properties. Valid parameters are transformed into a ConnectionParam object containing the constructed JDBC URL, encoded credentials, and driver class.

Key considerations:

  • AbstractDataSourceProcessor enforces a blocklist of dangerous parameter keys
  • Host and database name are validated against regex patterns
  • Password is encrypted using PasswordUtils before storage
  • JDBC URL is constructed using database-specific format rules
  • A unique datasource ID is generated for caching purposes
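The validation and URL-construction steps above can be sketched as follows. The blocklist entries, regex, and MySQL-style URL format are illustrative stand-ins for the checks AbstractDataSourceProcessor performs; the real project's patterns and key lists differ in detail:

```java
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

// Sketch of the validate-and-transform step (illustrative, not the real API).
class ParamValidator {
    // Example connection properties that could subvert the JDBC driver.
    private static final Set<String> DANGEROUS_KEYS =
            Set.of("autoDeserialize", "allowLoadLocalInfile", "allowUrlInLocalInfile");

    // Host: letters, digits, dots, and hyphens only (no '/', '?', '&', ...).
    private static final Pattern HOST_PATTERN =
            Pattern.compile("^[A-Za-z0-9][A-Za-z0-9.\\-]*$");

    static void validate(String host, String database, Map<String, String> other) {
        if (host == null || !HOST_PATTERN.matcher(host).matches()) {
            throw new IllegalArgumentException("illegal host: " + host);
        }
        if (database == null || database.isBlank() || database.contains("/")) {
            throw new IllegalArgumentException("illegal database name: " + database);
        }
        for (String key : other.keySet()) {
            if (DANGEROUS_KEYS.contains(key)) {
                throw new IllegalArgumentException("blocked connection property: " + key);
            }
        }
    }

    // Construct a MySQL-style JDBC URL from validated parameters.
    static String buildJdbcUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database;
    }
}
```

Rejecting dangerous property keys before URL construction matters because several JDBC drivers interpret properties such as autoDeserialize in ways that can lead to remote code execution.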

Step 3: Test Connection

Before saving the datasource configuration, the system tests the connection by creating an ad-hoc client. The AdHocDataSourceClient establishes a direct JDBC connection using the validated ConnectionParam, executes a basic connectivity check, and immediately closes the connection. This confirms that the credentials and network path are valid.

Key considerations:

  • Ad-hoc connections bypass the connection pool
  • Connection test respects configured timeouts
  • Failed tests return descriptive error messages for troubleshooting
  • Special handling for non-JDBC datasources (e.g., SSH, K8s, SageMaker)
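A minimal sketch of the ad-hoc connectivity check: open a direct JDBC connection with no pool, verify it, and close it immediately via try-with-resources. The class and method names are illustrative; DolphinScheduler's AdHocDataSourceClient wraps the same idea with per-datasource drivers and configured timeouts:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Optional;

// Illustrative ad-hoc connection tester (not the project's actual class).
class AdHocConnectionTester {

    // Returns an empty Optional on success, or a descriptive error message.
    static Optional<String> testConnection(String jdbcUrl, String user,
                                           String password, int timeoutSeconds) {
        DriverManager.setLoginTimeout(timeoutSeconds);
        try (Connection conn = DriverManager.getConnection(jdbcUrl, user, password)) {
            // isValid() issues a lightweight round-trip to confirm liveness.
            if (conn.isValid(timeoutSeconds)) {
                return Optional.empty();
            }
            return Optional.of("connection established but failed validity check");
        } catch (SQLException e) {
            // Surface a descriptive message for troubleshooting.
            return Optional.of("connection test failed: " + e.getMessage());
        }
    }
}
```

Because the connection is opened and closed inside one try-with-resources block, a successful test leaves no resources behind and never touches the production pool.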

Step 4: Establish Connection Pool

For production use, the DataSourceClientProvider creates a PooledDataSourceClient backed by HikariCP. The pooled client manages a pool of reusable connections with configurable sizing, idle timeout, and validation settings. The provider uses a Guava cache (24-hour TTL, max 100 entries) to store pooled clients, keyed by the datasource unique ID.

Key considerations:

  • HikariCP provides high-performance connection pooling
  • Cache eviction triggers proper cleanup of pooled connections
  • Multiple tasks sharing the same datasource share the same pool
  • Pool configuration adapts to database-specific requirements
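The cache semantics described above (TTL, bounded size, cleanup on eviction) can be sketched with a stdlib stand-in. The real implementation uses a Guava cache holding HikariCP-backed clients; this simplified version only mimics the two eviction rules and shows that eviction must close the evicted pool:

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Stdlib stand-in for the provider's Guava client cache (illustrative only).
class PooledClientCache<C extends AutoCloseable> {
    private final long ttlMillis;
    private final int maxEntries;
    // Insertion-ordered map: the oldest entry is first, so size-based
    // eviction drops the oldest client.
    private final LinkedHashMap<String, Entry<C>> cache = new LinkedHashMap<>();

    private record Entry<T>(T client, long createdAt) {}

    PooledClientCache(long ttlMillis, int maxEntries) {
        this.ttlMillis = ttlMillis;
        this.maxEntries = maxEntries;
    }

    // Tasks sharing the same datasource ID receive the same pooled client.
    synchronized C get(String datasourceId, Supplier<C> factory) {
        evictExpired();
        Entry<C> e = cache.get(datasourceId);
        if (e == null) {
            if (cache.size() >= maxEntries) {
                evictOldest();
            }
            e = new Entry<>(factory.get(), System.currentTimeMillis());
            cache.put(datasourceId, e);
        }
        return e.client();
    }

    private void evictExpired() {
        long now = System.currentTimeMillis();
        Iterator<Entry<C>> it = cache.values().iterator();
        while (it.hasNext()) {
            Entry<C> e = it.next();
            if (now - e.createdAt() >= ttlMillis) {
                close(e.client());
                it.remove();
            }
        }
    }

    private void evictOldest() {
        Iterator<Map.Entry<String, Entry<C>>> it = cache.entrySet().iterator();
        if (it.hasNext()) {
            close(it.next().getValue().client());
            it.remove();
        }
    }

    private void close(C client) {
        try {
            client.close(); // eviction triggers cleanup of the pooled connections
        } catch (Exception ignored) {
        }
    }

    synchronized int size() {
        return cache.size();
    }
}
```

In the real system the cache is configured with a 24-hour TTL and a 100-entry maximum, and the removal listener closes the underlying HikariCP pool so evicted datasources do not leak connections.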

Step 5: Execute Queries Through Connection

When a task needs to interact with a database, it obtains a connection from the pool through DataSourceClientProvider. The client executes SQL statements, handles result sets, and returns the connection to the pool upon completion. SQL statements are preprocessed to remove comments and split into individual statements for safe execution.

Key considerations:

  • SQL parsing uses the Druid SQL parser for correctness
  • Comment removal prevents injection through SQL comments
  • Statement splitting handles multi-statement execution
  • Connection is returned to pool even on execution failure (try-with-resources)
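The preprocessing steps above can be illustrated with a simplified scanner that strips comments and splits on semicolons outside string literals. DolphinScheduler relies on the Druid SQL parser for this, which handles far more edge cases (dialect syntax, escaped quotes, nested constructs); this sketch only demonstrates the two steps:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified SQL preprocessing sketch (the real code uses the Druid parser).
class SqlPreprocessor {

    static List<String> split(String sql) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inSingleQuote = false;
        int i = 0;
        while (i < sql.length()) {
            char c = sql.charAt(i);
            if (inSingleQuote) {
                current.append(c);
                if (c == '\'') inSingleQuote = false;
                i++;
            } else if (c == '\'') {
                inSingleQuote = true;
                current.append(c);
                i++;
            } else if (c == '-' && i + 1 < sql.length() && sql.charAt(i + 1) == '-') {
                // Line comment: skip to end of line.
                while (i < sql.length() && sql.charAt(i) != '\n') i++;
            } else if (c == '/' && i + 1 < sql.length() && sql.charAt(i + 1) == '*') {
                // Block comment: skip past the closing */.
                int end = sql.indexOf("*/", i + 2);
                i = (end < 0) ? sql.length() : end + 2;
            } else if (c == ';') {
                // Statement boundary outside any string literal.
                addIfNonEmpty(statements, current);
                current.setLength(0);
                i++;
            } else {
                current.append(c);
                i++;
            }
        }
        addIfNonEmpty(statements, current);
        return statements;
    }

    private static void addIfNonEmpty(List<String> out, StringBuilder sb) {
        String s = sb.toString().trim();
        if (!s.isEmpty()) out.add(s);
    }
}
```

Note that semicolons inside quoted literals do not split a statement, which is exactly why a real parser rather than a plain String.split(";") is required.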

Step 6: Monitor and Manage Connection Lifecycle

The system monitors connection health through pool statistics and periodic validation. Stale connections are evicted from the pool, and the Guava cache removes unused datasource clients after the TTL expires. Administrators can update datasource configurations, which triggers pool recreation with the new settings.

Key considerations:

  • Pool statistics are exposed for monitoring (active connections, idle connections, wait time)
  • Configuration changes require pool recreation, not hot-reload
  • Credential rotation is supported by updating the datasource and recreating the pool
  • Proper shutdown drains all pools and closes all connections
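The update-and-shutdown behavior above can be sketched with a small registry. The class and method names are illustrative, not DolphinScheduler's API: updating a datasource (including credential rotation) invalidates its cached client and closes the old pool so the next access builds a pool with the new settings, and shutdown drains every remaining pool:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative lifecycle registry for pooled datasource clients.
class DataSourceLifecycleManager<C extends AutoCloseable> {
    private final Map<String, C> clients = new ConcurrentHashMap<>();

    void register(String datasourceId, C client) {
        clients.put(datasourceId, client);
    }

    // Called when an administrator updates configuration or rotates
    // credentials: close the old pool; a new one is created on next use.
    void invalidate(String datasourceId) {
        closeQuietly(clients.remove(datasourceId));
    }

    // Proper shutdown drains all pools and closes all connections.
    void shutdown() {
        clients.values().forEach(this::closeQuietly);
        clients.clear();
    }

    private void closeQuietly(C client) {
        if (client == null) return;
        try {
            client.close();
        } catch (Exception ignored) {
        }
    }

    int activeClientCount() {
        return clients.size();
    }
}
```

Recreating the pool rather than hot-reloading keeps the design simple: in-flight connections finish against the old pool while new requests see the updated configuration.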

Execution Diagram
