Principle: Apache Airflow DAG Definition
| Knowledge Sources | |
|---|---|
| Domains | Workflow_Orchestration, Data_Engineering |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
A declarative pattern for defining directed acyclic graphs (DAGs) of tasks that represent data pipeline workflows.
Description
DAG Definition is the foundational concept in Apache Airflow where users declare the structure of their workflows. A DAG is a collection of tasks organized with dependencies that define execution order. Airflow provides two primary mechanisms for DAG definition: the DAG class constructor for explicit instantiation and the @task decorator for a functional TaskFlow API approach. The DAG object encapsulates scheduling configuration, default arguments, callback handlers, and the complete task dependency graph.
Usage
Use this principle whenever creating a new data pipeline, ETL process, or any automated workflow in Airflow. The DAG class is always required as the container for tasks. Choose the @task decorator (TaskFlow API) for Python-native tasks, and traditional operators for integration with external systems.
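A minimal sketch of both definition styles side by side, assuming Airflow 2.x (`airflow.DAG`, `airflow.decorators.task`, `airflow.operators.bash.BashOperator`); the DAG id, schedule, and task names are illustrative.

```python
from datetime import datetime

from airflow import DAG
from airflow.decorators import task
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="example_etl",                # unique identifier for this DAG
    start_date=datetime(2026, 1, 1),
    schedule="@daily",                   # cron preset; custom timetables also work
    catchup=False,                       # do not backfill missed runs
    default_args={"retries": 2},         # applied to every task in the DAG
) as dag:

    extract = BashOperator(              # traditional operator, suited to
        task_id="extract",               # external-system integration
        bash_command="echo extracting",
    )

    @task                                # TaskFlow API for Python-native work
    def transform() -> str:
        return "transformed"

    extract >> transform()               # upstream >> downstream dependency
```

The `with DAG(...)` context manager registers each task with the enclosing DAG automatically, so neither the operator nor the decorated function needs an explicit `dag=` argument.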
Theoretical Basis
A DAG (Directed Acyclic Graph) provides:
Graph Properties:
- Directed: Each edge has a direction (upstream → downstream)
- Acyclic: No circular dependencies allowed
- Not necessarily connected: a DAG may contain several independent task chains; Airflow does not require every task to be reachable from a single root
Scheduling Model:
- Each DAG has a timetable that determines when runs are created
- A data interval defines the period of data each run processes
- Catchup controls whether missed runs are backfilled
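The scheduling model above can be sketched in plain Python (no Airflow dependency): a daily timetable slices time into data intervals, and catchup decides whether every missed interval gets a run or only the latest. The helper name `missed_intervals` is illustrative, not Airflow API.

```python
from datetime import datetime, timedelta

def missed_intervals(start, now, interval=timedelta(days=1)):
    """Yield (interval_start, interval_end) for each complete data
    interval between `start` and `now`."""
    cursor = start
    while cursor + interval <= now:
        yield (cursor, cursor + interval)
        cursor += interval

# Three full days have elapsed since start_date.
runs = list(missed_intervals(datetime(2026, 1, 1), datetime(2026, 1, 4)))

# catchup=True schedules a run for every interval in `runs`;
# catchup=False schedules only the most recent interval.
latest_only = runs[-1:]
```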
Task Resolution:
# Pseudo-code for DAG task resolution
for task in dag.topological_sort():
    if all(upstream.state == SUCCESS for upstream in task.upstream_tasks):
        schedule(task)
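The pseudo-code above can be made runnable with Kahn's algorithm in plain Python (no Airflow dependency): tasks become ready once every upstream dependency has completed, and a cycle is detected when no valid order covers all tasks. The task names and the `deps` mapping are illustrative.

```python
from collections import deque

def topological_order(deps):
    """deps maps task -> set of upstream tasks; returns tasks in a
    dependency-respecting order (Kahn's algorithm)."""
    indegree = {t: len(ups) for t, ups in deps.items()}
    downstream = {t: [] for t in deps}
    for t, ups in deps.items():
        for u in ups:
            downstream[u].append(t)
    ready = deque(sorted(t for t, d in indegree.items() if d == 0))
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for d in downstream[t]:       # completing t unblocks its downstreams
            indegree[d] -= 1
            if indegree[d] == 0:
                ready.append(d)
    if len(order) != len(deps):       # leftover tasks imply a cycle
        raise ValueError("cycle detected: not a valid DAG")
    return order

# extract -> transform -> load, plus an independent cleanup task
deps = {"extract": set(), "transform": {"extract"},
        "load": {"transform"}, "cleanup": set()}
order = topological_order(deps)
```

The acyclicity check mirrors Airflow's own DAG validation: a circular dependency leaves some task with a nonzero in-degree, so the produced order never covers the whole graph.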