Principle:Apache Dolphinscheduler Workflow DAG Definition
| Knowledge Sources | |
|---|---|
| Domains | Workflow_Orchestration, Data_Modeling |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
The Workflow DAG Definition is a data model that represents each workflow as a directed acyclic graph (DAG): a set of task definitions connected by dependency edges. This representation enables visual orchestration and dependency-based execution ordering.
Description
The Workflow DAG Definition principle defines how DolphinScheduler models workflows as directed acyclic graphs persisted in a relational database. A workflow is composed of three entity types: WorkflowDefinition (workflow metadata, including name, version, and project code), TaskDefinition (individual task nodes with type-specific parameters), and WorkflowTaskRelation (edges defining dependencies between tasks). This three-entity model separates workflow structure from task logic, so tasks can be reused across workflows and versioned independently of them.
The DAG structure ensures that tasks execute in topological order, respecting their dependency relationships. This approach supports complex orchestration patterns including parallel execution, conditional branching, sub-workflows, and task groups.
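The ordering described above can be illustrated with a minimal sketch, assuming relations are given as plain (preTaskCode, postTaskCode) pairs. It uses Kahn's algorithm to group tasks into levels, where tasks in the same level do not depend on each other and could run in parallel. The function name `execution_levels` and the task codes are hypothetical, and Python is used only for brevity; this is not DolphinScheduler's scheduler code.

```python
from collections import defaultdict


def execution_levels(task_codes, relations):
    """Group tasks into dependency levels using Kahn's algorithm.

    relations is a list of (pre_task_code, post_task_code) pairs.
    A pre_task_code of 0 marks post_task_code as a root task.
    """
    downstream = defaultdict(list)
    indegree = {code: 0 for code in task_codes}
    for pre, post in relations:
        if pre == 0:  # root marker, not a real edge
            continue
        downstream[pre].append(post)
        indegree[post] += 1

    # Start with every task that has no upstream dependency.
    level = sorted(code for code, deg in indegree.items() if deg == 0)
    levels = []
    while level:
        levels.append(level)
        next_level = []
        for code in level:
            for child in downstream[code]:
                indegree[child] -= 1
                if indegree[child] == 0:  # all upstream tasks are done
                    next_level.append(child)
        level = sorted(next_level)

    # Any task left unplaced sits on a cycle.
    if sum(len(lvl) for lvl in levels) != len(indegree):
        raise ValueError("cycle detected: relations do not form a DAG")
    return levels


# Hypothetical task codes: 1 fans out to 2 and 3, which both feed 4.
relations = [(0, 1), (1, 2), (1, 3), (2, 4), (3, 4)]
levels = execution_levels([1, 2, 3, 4], relations)
print(levels)  # [[1], [2, 3], [4]]
```

Flattening the levels in order gives one valid topological order. The sketch ignores conditional branching and sub-workflows, which need extra state beyond the plain graph.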
Usage
Use this principle when defining or modifying workflows through the DolphinScheduler API or UI. Every workflow consists of a WorkflowDefinition and one or more TaskDefinition entities, connected by WorkflowTaskRelation edges that define the execution order.
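Before submitting a definition, it can help to check these structural requirements locally. The sketch below is one possible check, not the validation DolphinScheduler itself performs. It assumes relations are (preTaskCode, postTaskCode) pairs, that a preTaskCode of 0 marks a root, and that every task appears as the downstream end of at least one relation. The function name `validate_workflow` is hypothetical.

```python
def validate_workflow(task_codes, relations):
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    known = set(task_codes)
    if not known:
        problems.append("workflow has no tasks")

    children = {code: [] for code in known}
    placed = set()  # tasks that appear as the downstream end of a relation
    for pre, post in relations:
        if post not in known:
            problems.append(f"relation points to unknown task {post}")
            continue
        placed.add(post)
        if pre == 0:  # root marker, not a real edge
            continue
        if pre not in known:
            problems.append(f"relation starts at unknown task {pre}")
            continue
        children[pre].append(post)

    for code in sorted(known - placed):
        problems.append(f"task {code} has no relation")

    # Depth-first search with three colours to detect a cycle.
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {code: WHITE for code in known}

    def has_cycle(code):
        colour[code] = GREY
        for child in children[code]:
            if colour[child] == GREY:  # back edge to a task still in progress
                return True
            if colour[child] == WHITE and has_cycle(child):
                return True
        colour[code] = BLACK
        return False

    if any(colour[c] == WHITE and has_cycle(c) for c in sorted(known)):
        problems.append("relations contain a cycle")
    return problems


print(validate_workflow([1, 2], [(0, 1), (1, 2)]))  # []
print(validate_workflow([1, 2], [(1, 2), (2, 1)]))  # ['relations contain a cycle']
```

The recursive search is fine for workflows of ordinary size; a very deep chain of tasks would need an iterative version to stay under Python's recursion limit.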
Theoretical Basis
The DAG model applies graph theory to workflow orchestration:
- Vertices: TaskDefinition entities represent computation units
- Edges: WorkflowTaskRelation entities define "must run before" dependencies
- Topological Sort: The execution engine processes tasks in topological order
- Versioning: Both workflows and tasks are versioned, enabling rollback and audit
// DAG structure (abstract)
WorkflowDefinition:
    code: Long          // globally unique identifier
    name: String        // human-readable name
    version: Integer    // for versioning/rollback

TaskDefinition:
    code: Long          // globally unique identifier
    name: String        // task name
    taskType: String    // SHELL, SQL, PYTHON, SUB_PROCESS, etc.
    taskParams: String  // JSON-encoded type-specific parameters

WorkflowTaskRelation:
    preTaskCode: Long   // upstream task (0 = root)
    postTaskCode: Long  // downstream task
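
As a concrete illustration of the abstract structure above, the following sketch models the three entities as Python dataclasses and builds a two-task workflow. Only the field names come from the structure above; the codes, names, `taskParams` keys, and the `root_tasks` helper are hypothetical, and the real entities carry more fields than shown here.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowDefinition:
    code: int     # globally unique identifier
    name: str     # human-readable name
    version: int  # for versioning/rollback


@dataclass(frozen=True)
class TaskDefinition:
    code: int
    name: str
    task_type: str    # e.g. SHELL, SQL, PYTHON
    task_params: str  # JSON-encoded, type-specific parameters


@dataclass(frozen=True)
class WorkflowTaskRelation:
    pre_task_code: int   # upstream task (0 = root)
    post_task_code: int  # downstream task


def root_tasks(tasks, relations):
    """Return the tasks whose incoming relation has pre_task_code == 0."""
    roots = {r.post_task_code for r in relations if r.pre_task_code == 0}
    return [t for t in tasks if t.code in roots]


# Illustrative values only; the parameter keys are assumptions.
workflow = WorkflowDefinition(code=1001, name="daily_report", version=1)
tasks = [
    TaskDefinition(1, "extract", "SHELL", json.dumps({"rawScript": "echo extract"})),
    TaskDefinition(2, "load", "SQL", json.dumps({"sql": "SELECT 1"})),
]
relations = [WorkflowTaskRelation(0, 1), WorkflowTaskRelation(1, 2)]

print([t.name for t in root_tasks(tasks, relations)])  # ['extract']
```

Keeping `task_params` as an opaque JSON string mirrors the structure above: the graph model does not need to understand type-specific parameters, so new task types can be added without changing the relation model.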