Workflow: Apache Airflow Scheduler Operation and Task Execution
| Knowledge Sources | |
|---|---|
| Domains | Workflow_Orchestration, Task_Scheduling, Distributed_Systems |
| Last Updated | 2026-02-08 19:00 GMT |
Overview
End-to-end process for how the Apache Airflow scheduler discovers DAGs, schedules DAG runs, dispatches tasks to executors, and manages the complete task execution lifecycle through the Task Execution Interface.
Description
This workflow documents the internal operation of the Airflow scheduling and execution pipeline. It covers how the DagFileProcessorManager discovers and parses DAG files, how the SchedulerJob creates DagRun instances based on timetables, how tasks transition through states (scheduled, queued, running, success/failed), how executors (Local, Celery, Kubernetes) dispatch work, and how the Task Execution Interface (TEI) in Airflow 3.x decouples task runtime from the scheduler. It also covers the triggerer component for handling async/deferrable operators.
Usage
Understand this workflow when you need to operate, debug, or extend the Airflow scheduling infrastructure. This is relevant for platform engineers configuring Airflow for production, developers debugging task execution issues, or contributors extending the scheduler, executor, or triggerer components.
Execution Steps
Step 1: DAG File Discovery and Parsing
The DagFileProcessorManager continuously scans the configured dags_folder to discover Python files containing DAG definitions. Each file is processed by a DagFileProcessorProcess worker, which imports the module, extracts DAG objects via the DagBag, and stores serialized DAG structures in the metadata database. The manager coordinates parallel processing, respects .airflowignore exclusions, and tracks file modification times to re-parse changed files.
Key considerations:
- The min_file_process_interval configuration controls how frequently files are re-parsed
- DAG serialization decouples the webserver from needing direct filesystem access
- DagBag validates DAG integrity (no duplicate task IDs, valid cron expressions, no cycles)
- DAG versioning tracks structural changes across parses
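The discovery loop described above can be sketched with the standard library alone. This is a toy analogue, not the actual DagFileProcessorManager: the function names are invented, and real `.airflowignore` matching supports regex as well as glob syntax, but the core idea of combining ignore patterns, modification times, and a minimum re-parse interval is the same.

```python
import time
from fnmatch import fnmatch
from pathlib import Path


def load_ignore_patterns(dags_folder: Path) -> list[str]:
    """Read glob patterns from .airflowignore, skipping blanks and comments."""
    ignore_file = dags_folder / ".airflowignore"
    if not ignore_file.exists():
        return []
    lines = ignore_file.read_text().splitlines()
    return [ln.strip() for ln in lines if ln.strip() and not ln.startswith("#")]


def files_to_parse(dags_folder: Path, last_parsed: dict[Path, float],
                   min_interval: float = 30.0) -> list[Path]:
    """Return .py files that are new, modified, or stale past min_interval.

    last_parsed maps each file to the wall-clock time it was last parsed,
    mirroring how the manager tracks per-file parse state.
    """
    patterns = load_ignore_patterns(dags_folder)
    now = time.time()
    candidates = []
    for path in sorted(dags_folder.rglob("*.py")):
        rel = str(path.relative_to(dags_folder))
        if any(fnmatch(rel, pat) for pat in patterns):
            continue  # excluded by .airflowignore
        parsed_at = last_parsed.get(path)
        if (parsed_at is None
                or path.stat().st_mtime > parsed_at
                or now - parsed_at >= min_interval):
            candidates.append(path)
    return candidates
```

Each returned path would then be handed to a worker process that imports the module and serializes any DAG objects it finds.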
Step 2: DagRun Creation and Scheduling
The SchedulerJob examines each active DAG's timetable to determine when the next DagRun should be created. For time-based schedules (cron, interval), it calculates data intervals and creates DagRun records with the appropriate logical_date (the field formerly called execution_date). For asset-triggered DAGs, the scheduler watches for upstream asset events and creates runs when trigger conditions are satisfied. Each DagRun transitions through states: queued, running, success, or failed.
Key considerations:
- Multiple timetable implementations support diverse scheduling patterns (cron, delta, continuous, event-based, workday-aware)
- max_active_runs_per_dag limits concurrent runs of the same DAG
- Backfill operations create DagRuns for historical date ranges, either forward or reverse
- Asset evaluation uses boolean logic (AND/OR) to combine trigger conditions from multiple upstream assets
Step 3: Task Instance State Management
For each DagRun, the scheduler creates TaskInstance records and manages their state transitions. Tasks start in the none state, move to scheduled when all upstream dependencies are met, then to queued when sent to an executor. The executor reports back running when execution begins, and finally success or failed upon completion. The scheduler handles retries, upstream failures, and dependency resolution including trigger rules (all_success, one_success, all_done, etc.).
Key considerations:
- Task state transitions are persisted in the metadata database for durability
- Pool slots limit concurrent execution of tasks sharing constrained resources
- Priority weight determines task ordering within the executor queue
- Mapped (dynamic) tasks are expanded at scheduling time based on upstream XCom outputs
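The state machine and trigger-rule evaluation above can be modeled compactly. This is a deliberately reduced sketch (the real scheduler tracks more states, such as deferred, upstream_failed, and skipped, and more trigger rules), but it shows the shape of both checks:

```python
from enum import Enum


class TIState(str, Enum):
    NONE = "none"
    SCHEDULED = "scheduled"
    QUEUED = "queued"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"
    UP_FOR_RETRY = "up_for_retry"


# Allowed forward transitions in this simplified model.
ALLOWED = {
    TIState.NONE: {TIState.SCHEDULED},
    TIState.SCHEDULED: {TIState.QUEUED},
    TIState.QUEUED: {TIState.RUNNING},
    TIState.RUNNING: {TIState.SUCCESS, TIState.FAILED, TIState.UP_FOR_RETRY},
    TIState.UP_FOR_RETRY: {TIState.SCHEDULED},  # retry re-enters scheduling
}


def can_transition(current: TIState, target: TIState) -> bool:
    return target in ALLOWED.get(current, set())


def trigger_rule_met(rule: str, upstream: list[TIState]) -> bool:
    """Decide whether a task may be scheduled given its upstream states."""
    terminal = {TIState.SUCCESS, TIState.FAILED}
    if rule == "all_success":
        return all(s == TIState.SUCCESS for s in upstream)
    if rule == "one_success":
        return any(s == TIState.SUCCESS for s in upstream)
    if rule == "all_done":
        return all(s in terminal for s in upstream)
    raise ValueError(f"unknown trigger rule: {rule}")
```

Persisting each transition to the metadata database is what makes the pipeline durable across scheduler restarts.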
Step 4: Executor Dispatch
The scheduler dispatches queued tasks to the configured executor. The LocalExecutor runs tasks as subprocesses on the scheduler machine. The CeleryExecutor distributes tasks to worker nodes via a message broker. The KubernetesExecutor launches each task as an individual Kubernetes pod. The ExecutorLoader dynamically loads the appropriate executor class based on configuration.
Key considerations:
- The executor interface (BaseExecutor) provides a standard API for task lifecycle management
- Worker health is monitored through heartbeat mechanisms
- The LocalExecutor manages a pool of worker processes with configurable parallelism
- Multiple executor types can be configured simultaneously in Airflow 3.x
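The executor contract is essentially "accept queued work, run it with bounded parallelism, report terminal states." The toy class below imitates a LocalExecutor with a thread pool and subprocesses; it is not the BaseExecutor API (the class and method names here are invented), but it illustrates the queue/sync lifecycle the scheduler drives.

```python
import subprocess
from concurrent.futures import Future, ThreadPoolExecutor


class MiniLocalExecutor:
    """Toy local executor: each queued command runs as a subprocess,
    with concurrency bounded by a fixed parallelism."""

    def __init__(self, parallelism: int = 4):
        self._pool = ThreadPoolExecutor(max_workers=parallelism)
        self._futures: dict[str, Future] = {}

    def queue_command(self, key: str, argv: list[str]) -> None:
        """Submit a task command; key identifies the task instance."""
        self._futures[key] = self._pool.submit(
            lambda: subprocess.run(argv, capture_output=True).returncode)

    def sync(self) -> dict[str, str]:
        """Collect terminal states for finished tasks, as the scheduler
        does each loop iteration via the executor's event buffer."""
        results = {}
        for key, fut in list(self._futures.items()):
            if fut.done():
                results[key] = "success" if fut.result() == 0 else "failed"
                del self._futures[key]
        return results

    def end(self) -> None:
        self._pool.shutdown(wait=True)
```

Celery and Kubernetes executors implement the same contract but dispatch over a broker or the Kubernetes API instead of local subprocesses.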
Step 5: Task Execution via Task Execution Interface
In Airflow 3.x, tasks execute through the Task Execution Interface (TEI), which decouples the task runtime from the scheduler. The Task SDK provides a lightweight environment for task execution, separate from airflow-core. Tasks communicate with the API server rather than directly accessing the metadata database. This architecture enables multi-language task execution (Python Task SDK, Go Task SDK) and improved isolation.
Key considerations:
- The Task SDK is a separate package (apache-airflow-task-sdk) with minimal dependencies
- Tasks communicate via the TEI protocol, not direct database access
- The Go SDK implements the TEI for running tasks natively in Go
- Serialization/deserialization of task parameters is handled by the SDK layer
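The key architectural shift is that a task runtime reports state by calling the API server rather than writing rows itself. The sketch below only builds a request payload (no network call) to show the shape of that decoupling; the endpoint path and field names are hypothetical, not the actual TEI wire format.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class TIStateUpdate:
    """State report a task runtime might send to the API server instead of
    touching the metadata database directly (fields are illustrative)."""
    dag_id: str
    task_id: str
    run_id: str
    state: str
    timestamp: str


def build_state_update(dag_id: str, task_id: str, run_id: str,
                       state: str) -> tuple[str, bytes]:
    """Return a (hypothetical endpoint path, JSON body) pair."""
    update = TIStateUpdate(
        dag_id=dag_id, task_id=task_id, run_id=run_id, state=state,
        timestamp=datetime.now(timezone.utc).isoformat())
    path = f"/execution/task-instances/{dag_id}/{run_id}/{task_id}/state"
    return path, json.dumps(asdict(update)).encode()
```

Because the contract is an HTTP API plus a serialization scheme rather than a Python ORM, a Go task runtime can implement it just as well as the Python Task SDK.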
Step 6: Trigger and Deferrable Operator Handling
The TriggererJob manages asynchronous trigger instances for deferrable operators. When a task defers, it creates a Trigger record and the task enters the deferred state. The triggerer runs trigger classes as async coroutines, monitoring for trigger events. When a trigger fires, the triggerer creates a TriggerEvent, and the scheduler resumes the deferred task from where it left off.
Key considerations:
- Deferrable operators free up executor slots while waiting for external conditions
- Triggers run as async Python coroutines within the triggerer process
- The triggerer supports health monitoring and graceful shutdown
- Multiple triggerer instances can run for high availability
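The defer/resume cycle hinges on triggers being async generators that eventually yield an event. The sketch below, a stdlib-only analogue of a datetime trigger plus a minimal triggerer loop (class and function names are illustrative), shows why one process can await thousands of conditions concurrently:

```python
import asyncio
from datetime import datetime, timezone


class DateTimeTrigger:
    """Toy trigger: fires once a target moment has passed, analogous to a
    deferrable sensor waiting on a timestamp."""

    def __init__(self, moment: datetime):
        self.moment = moment

    async def run(self):
        # Real triggers await I/O here (sockets, API polls) without
        # occupying an executor slot.
        while datetime.now(timezone.utc) < self.moment:
            await asyncio.sleep(0.01)
        yield {"payload": self.moment.isoformat()}  # the trigger event


async def triggerer_loop(triggers):
    """Run all triggers concurrently; collect the first event from each.
    The real triggerer would hand each event back to the scheduler, which
    resumes the deferred task."""
    async def first_event(trig):
        async for event in trig.run():
            return event
    return await asyncio.gather(*(first_event(t) for t in triggers))
```

While a task is deferred its executor slot is released; only the tiny coroutine lives on inside the triggerer until the event fires.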
Step 7: Callback Execution and Event Notification
Upon task and DAG state transitions, Airflow executes configured callbacks and notifies registered listeners. Callbacks (on_success_callback, on_failure_callback, on_retry_callback) run in the context of the worker. The listener framework (pluggy-based) provides hooks for component lifecycle events, task instance state changes, DAG run transitions, and asset events, enabling observability integrations and custom automation.
Key considerations:
- Listeners implement hook specifications defined in the shared listeners package
- The ListenerManager coordinates notification delivery to all registered listeners
- Slow listeners, and listeners that raise exceptions, are handled gracefully so they cannot block the scheduling loop
- OpenTelemetry traces span the full execution lifecycle from scheduling to completion
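The failure-isolation property of the listener framework can be shown in a few lines. This is a simplified stand-in for the pluggy-based ListenerManager (the class and hook names below are illustrative): each registered listener is invoked in turn, and an exception in one listener is logged and swallowed rather than propagated into the scheduling loop.

```python
import logging


class MiniListenerManager:
    """Notify registered listeners of task-instance state changes,
    isolating failures so one bad listener cannot break the others."""

    def __init__(self):
        self._listeners = []
        self._log = logging.getLogger("listener-manager")

    def register(self, listener) -> None:
        self._listeners.append(listener)

    def on_task_instance_state_change(self, ti_key: str, state: str) -> None:
        for listener in self._listeners:
            hook = getattr(listener, "on_task_instance_state_change", None)
            if hook is None:
                continue  # listener does not implement this hook
            try:
                hook(ti_key, state)
            except Exception:
                # Log and continue: a throwing listener must not block
                # delivery to the remaining listeners.
                self._log.exception("listener %r failed; continuing", listener)
```

An observability integration would register such a listener to forward state changes to a metrics or tracing backend.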