Principle:Apache Airflow DAG File Discovery


Knowledge Sources
Domains: DAG_Processing, Scheduling
Last Updated: 2026-02-08 00:00 GMT

Overview

A continuous file discovery and parsing process that transforms DAG source files into scheduler-consumable representations.

Description

DAG File Discovery describes, from the scheduler's side, how DAG files are found and loaded. DAG Deployment focuses on the dag-processor component writing to the database. This principle instead covers how the scheduler discovers available DAGs, how file modifications are tracked, how parsing parallelism is managed, and how stale DAGs are cleaned up. The DagFileProcessorManager coordinates this process with configurable parallelism, timeouts, and re-parse intervals.
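To make the interplay of modification tracking and re-parse intervals concrete, here is a minimal sketch of a "is this file due for parsing?" decision. The FileStat class, the is_due_for_parse function, and the min_interval parameter are hypothetical names chosen for illustration and are not Airflow's actual data structures. The rule that a modified file is re-parsed immediately is likewise an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileStat:
    """Hypothetical per-file bookkeeping (not Airflow's real structure)."""
    last_finish_time: Optional[float] = None  # epoch seconds of the last completed parse
    last_seen_mtime: Optional[float] = None   # file mtime recorded at that parse


def is_due_for_parse(stat: FileStat, current_mtime: float,
                     now: float, min_interval: float) -> bool:
    """Return True if the file should be queued for parsing in this cycle."""
    if stat.last_finish_time is None:
        return True  # never parsed before
    if stat.last_seen_mtime is None or current_mtime > stat.last_seen_mtime:
        return True  # assumption: a changed file skips the interval wait
    # Unchanged file: re-parse only once the interval has elapsed.
    return now - stat.last_finish_time >= min_interval
```

Under these assumptions, an unchanged file parsed 10 seconds ago with a 30 second interval is skipped, while the same file with a newer modification time is queued right away.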

Usage

This principle applies to scheduler operation. The scheduler relies on the dag-processor to continuously discover and parse DAG files, which makes them available for scheduling decisions. Understanding this process is essential for troubleshooting DAG visibility issues and parsing delays.
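One common visibility issue is a file that the safe-mode filter skips before it is ever parsed. The sketch below approximates that heuristic for plain .py files: with safe mode on, a file is considered only if its content contains both "dag" and "airflow", compared case-insensitively. The might_contain_dag function here is an illustrative stand-in rather than Airflow's own helper, and it deliberately ignores zip archives and .airflowignore rules.

```python
from pathlib import Path


def might_contain_dag(path: Path, safe_mode: bool = True) -> bool:
    """Approximate the safe-mode check for a single plain Python file."""
    if path.suffix != ".py":
        return False  # this sketch only considers .py files
    if not safe_mode:
        return True  # with safe mode off, every .py file is a candidate
    content = path.read_bytes().lower()
    # Both markers must appear somewhere in the file content.
    return b"dag" in content and b"airflow" in content
```

Running this against a file that does not show up in the UI is a quick way to check whether the heuristic, rather than a parse error, is hiding it.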

Theoretical Basis

Discovery Loop:

  1. Bundle Scan: Enumerate files in all configured DAG bundles
  2. Filter: Apply safe_mode filtering and .airflowignore rules
  3. Priority Queue: Order files by last parse time and modification status
  4. Parallel Dispatch: Send files to worker processes (bounded by _parallelism)
  5. Result Collection: Gather parsed DAGs and import errors via I/O multiplexing
  6. Stale Cleanup: Remove DAGs not seen within stale_dag_threshold
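
Steps 3 to 6 of the loop above can be sketched as follows, assuming the file list has already been scanned and filtered. This is a simplified illustration, not the DagFileProcessorManager implementation: a thread pool stands in for the separate worker processes and I/O multiplexing, and order_files, run_cycle, find_stale, and all of their parameters are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List


def order_files(files: List[str], last_parsed: Dict[str, float],
                mtimes: Dict[str, float]) -> List[str]:
    """Step 3: new or modified files first, then the least recently parsed."""
    def key(path: str):
        parsed = last_parsed.get(path, float("-inf"))  # never parsed sorts first
        modified = mtimes.get(path, 0.0) > parsed
        return (not modified, parsed)
    return sorted(files, key=key)


def run_cycle(files: List[str], parse: Callable[[str], List[str]],
              parallelism: int, last_parsed: Dict[str, float],
              mtimes: Dict[str, float], last_seen: Dict[str, float],
              now: float) -> Dict[str, str]:
    """Steps 4 and 5: parse with bounded concurrency and collect results."""
    import_errors: Dict[str, str] = {}
    queue = order_files(files, last_parsed, mtimes)
    # max_workers plays the role of the parallelism bound.
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        futures = {pool.submit(parse, path): path for path in queue}
        for future in as_completed(futures):
            path = futures[future]
            last_parsed[path] = now  # record the attempt, success or not
            try:
                for dag_id in future.result():
                    last_seen[dag_id] = now
            except Exception as exc:  # a failing file becomes an import error
                import_errors[path] = str(exc)
    return import_errors


def find_stale(last_seen: Dict[str, float], now: float,
               threshold: float) -> List[str]:
    """Step 6: DAG ids not seen within the staleness threshold."""
    return sorted(d for d, seen in last_seen.items() if now - seen > threshold)
```

In this sketch a file that raises during parsing does not stop the cycle. Its error is recorded per file, which mirrors the idea of gathering import errors alongside parsed DAGs.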
