Principle:Apache Airflow DAG Definition

From Leeroopedia


Knowledge Sources
Domains: Workflow_Orchestration, Data_Engineering
Last Updated: 2026-02-08 00:00 GMT

Overview

A declarative pattern for defining directed acyclic graphs (DAGs) of tasks that represent data pipeline workflows.

Description

DAG Definition is the foundational concept in Apache Airflow: it is where users declare the structure of their workflows. A DAG is a collection of tasks whose dependencies define the execution order. Airflow provides two primary styles of definition: explicit instantiation of the DAG class (typically used as a context manager) with tasks created from operators, and the functional TaskFlow API, in which the @dag decorator turns a function into a DAG and the @task decorator turns Python functions into tasks. In both styles the resulting DAG object encapsulates the scheduling configuration, default arguments, callback handlers, and the complete task dependency graph.
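As a minimal sketch of the explicit DAG constructor combined with @task functions, the following assumes Airflow 2.4 or later (where the `schedule` argument is available). The DAG id, task names, and values are hypothetical, and Airflow is imported inside the factory function only so that the module can be loaded where Airflow is not installed.

```python
from datetime import datetime


def build_example_dag():
    """Build a two-task DAG; Airflow is imported lazily inside the factory."""
    from airflow import DAG
    from airflow.decorators import task

    with DAG(
        dag_id="example_etl",             # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule="@daily",                # one run per daily data interval
        catchup=False,                    # do not create runs for past intervals
        default_args={"retries": 1},      # applied to every task in this DAG
    ) as dag:

        @task
        def extract():
            # Placeholder data standing in for a real source system.
            return [1, 2, 3]

        @task
        def total(values):
            return sum(values)

        # Passing the output of extract() into total() declares the
        # dependency extract -> total.
        total(extract())

    return dag
```

In a real DAG file the result would be assigned at module level, for example `example_dag = build_example_dag()`, so that the DAG object is visible when Airflow parses the file.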

Usage

Use this principle whenever creating a new data pipeline, ETL process, or any other automated workflow in Airflow. Every task must belong to a DAG object, whether that object is created with the DAG class directly or through the @dag decorator. Choose the @task decorator (TaskFlow API) for tasks written as Python functions, and traditional operators for integration with external systems.

Theoretical Basis

A DAG (Directed Acyclic Graph) provides:

Graph Properties:

  • Directed: Each edge has a direction (upstream → downstream)
  • Acyclic: No circular dependencies allowed
  • Connectivity: Not required; a DAG may have several root tasks and independent branches
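The directed and acyclic properties above can be checked with a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9+) rather than Airflow itself; the task names and the `execution_order` helper are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical task names; each key maps a task to the set of its upstream tasks.
pipeline = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}


def execution_order(upstream):
    """Return one valid execution order, or raise ValueError if a cycle exists."""
    try:
        return list(TopologicalSorter(upstream).static_order())
    except CycleError as exc:
        # The second element of args holds the nodes that form the cycle.
        raise ValueError(f"not a DAG, cycle found: {exc.args[1]}") from exc


order = execution_order(pipeline)

# Making extract depend on load closes a loop, so the graph is no longer acyclic.
cyclic = {**pipeline, "extract": {"load"}}
try:
    execution_order(cyclic)
    has_cycle = False
except ValueError:
    has_cycle = True
```

Because upstream tasks always appear before their downstream tasks in the returned order, a valid order exists only when the graph has no cycle.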

Scheduling Model:

  • Each DAG has a timetable that determines when runs are created
  • A data interval defines the period of data each run processes
  • Catchup controls whether the scheduler creates runs for past data intervals that have not yet been run
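The interaction between data intervals and catchup can be illustrated with a simplified model. This is a sketch that assumes a fixed-length interval and is not Airflow's timetable implementation; the `data_intervals` function and the dates are hypothetical.

```python
from datetime import datetime, timedelta


def data_intervals(start_date, now, interval=timedelta(days=1), catchup=True):
    """Return the (start, end) data intervals whose runs are due by `now`.

    A run covers [start, end) and only becomes due once `end` has passed.
    """
    intervals = []
    start = start_date
    while start + interval <= now:
        intervals.append((start, start + interval))
        start += interval
    if not catchup:
        # Without catchup, only the most recent completed interval gets a run.
        intervals = intervals[-1:]
    return intervals


start = datetime(2024, 1, 1)
now = datetime(2024, 1, 4, 12)

with_catchup = data_intervals(start, now, catchup=True)
without_catchup = data_intervals(start, now, catchup=False)
```

With these dates, three daily intervals have completed (January 1, 2, and 3), so the catchup case yields three runs and the non-catchup case yields only the run for January 3.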

Task Resolution:

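# Pseudo-code for DAG task resolution, assuming the default all_success trigger rule.
# The scheduler repeats this check as task states change; it is not a single pass.
for task in dag.topological_sort():
    # A task is eligible only when every direct upstream task has succeeded
    if all(upstream.state == SUCCESS for upstream in task.upstream_tasks):
        schedule(task)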
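The resolution rule can be simulated in plain Python. The sketch below is a toy model under the all-upstream-succeeded rule, not Airflow's scheduler; the `resolve` function, the task names, and the simulated failure are hypothetical.

```python
def resolve(upstream, run):
    """Run tasks whose upstream tasks all succeeded, repeating until no progress.

    `upstream` maps each task to the set of its upstream tasks.
    `run` is a callable that returns True on success and False on failure.
    """
    state = {task: None for task in upstream}  # None means "not yet run"
    order = []
    progressed = True
    while progressed:
        progressed = False
        for task, deps in upstream.items():
            ready = all(state[dep] == "success" for dep in deps)
            if state[task] is None and ready:
                state[task] = "success" if run(task) else "failed"
                order.append(task)
                progressed = True
    return state, order


pipeline = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}

# Simulate a failure in "load"; every other task succeeds.
state, order = resolve(pipeline, run=lambda task: task != "load")
```

Here "report" is never run because its upstream task "load" did not succeed, which mirrors how a failure blocks downstream tasks under this rule.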

