Principle:Apache Airflow DAG Distribution Strategy
| Knowledge Sources | |
|---|---|
| Domains | Deployment, Kubernetes |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
A set of strategies for making DAG files available to the Airflow components that run in Kubernetes pods.
Description
DAG Distribution Strategy addresses how DAG Python files are made available to all Airflow pods (scheduler, workers, dag-processor) in a Kubernetes cluster. Three primary strategies exist: git-sync (sidecar container that syncs from a Git repository), PVC (shared Persistent Volume Claim), and baked-in (DAGs embedded in the Docker image). Each strategy has different trade-offs for update speed, reliability, and operational complexity.
Usage
Choose git-sync for dynamic DAG updates from version control. Use PVC when DAGs are managed outside Git. Use baked-in images for air-gapped environments or when DAG immutability is required.
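The selection guidance above can be written down as a small decision function. This is only an illustrative sketch: `DagStrategy`, `choose_strategy`, and the three boolean inputs are hypothetical names introduced here, and the precedence (air-gapped or immutability requirements win over everything else) is an assumption about how conflicting requirements would be resolved.

```python
from enum import Enum


class DagStrategy(Enum):
    """The three distribution strategies described on this page."""
    GIT_SYNC = "git-sync"
    PVC = "pvc"
    BAKED_IN = "baked-in"


def choose_strategy(*, air_gapped: bool, immutable_dags: bool,
                    dags_in_git: bool) -> DagStrategy:
    """Map the usage guidance onto a single recommendation (hypothetical helper)."""
    # Assumption: hard constraints (no network access to Git, or a requirement
    # that DAGs never change under a running image) take priority.
    if air_gapped or immutable_dags:
        return DagStrategy.BAKED_IN
    # DAGs live in version control and should update dynamically.
    if dags_in_git:
        return DagStrategy.GIT_SYNC
    # DAGs are managed outside Git, so a shared volume is the remaining option.
    return DagStrategy.PVC


if __name__ == "__main__":
    print(choose_strategy(air_gapped=False, immutable_dags=False, dags_in_git=True))
```

Real deployments usually weigh more factors (storage class availability, image build times, team workflow), so the function is a starting point for discussion rather than a rule.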
Theoretical Basis
Strategy Comparison:
| Strategy | Update Speed | Complexity | Best For |
|---|---|---|---|
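| Git-sync | Fast (seconds, bounded by the poll interval) | Medium | CI/CD workflows |
| PVC | Medium (depends on how files are written to the volume) | Low | Shared filesystem setups |
| Baked-in | Slow (image rebuild and pod rollout) | Low | Air-gapped, immutable |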
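To make the comparison concrete, the sketch below builds a values override for each strategy as a Python dictionary. It assumes the deployment uses the official Apache Airflow Helm chart, which this page does not state; the key names (`dags.gitSync`, `dags.persistence`, `images.airflow`) follow that chart and should be checked against the chart version in use. The function name `helm_values` and its parameters are hypothetical.

```python
from typing import Any, Dict


def helm_values(strategy: str, *, repo: str = "", branch: str = "main",
                claim: str = "", image: str = "", tag: str = "") -> Dict[str, Any]:
    """Return a values override for one strategy (hypothetical helper).

    Assumption: key names match the official Apache Airflow Helm chart.
    """
    if strategy == "git-sync":
        # Sidecar pulls DAGs from a Git repository.
        return {"dags": {"gitSync": {"enabled": True, "repo": repo, "branch": branch}}}
    if strategy == "pvc":
        # All pods mount the same pre-existing Persistent Volume Claim.
        return {"dags": {"persistence": {"enabled": True, "existingClaim": claim}}}
    if strategy == "baked-in":
        # DAGs ship inside a custom image, so both dynamic mechanisms are off.
        return {
            "images": {"airflow": {"repository": image, "tag": tag}},
            "dags": {"gitSync": {"enabled": False}, "persistence": {"enabled": False}},
        }
    raise ValueError(f"unknown strategy: {strategy!r}")


if __name__ == "__main__":
    import json
    # The repository URL is a placeholder, not a real location.
    print(json.dumps(helm_values("git-sync", repo="https://example.com/dags.git"), indent=2))
```

The resulting dictionary could be dumped to YAML and passed to `helm upgrade` with `-f`, though most teams would simply maintain the values file by hand.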
Git-sync Architecture:
- A sidecar container in each pod polls the Git repository for new commits
- The synced DAGs are exposed to the Airflow container through a shared volume mounted at /opt/airflow/dags
- Authentication is supported via SSH keys or access tokens stored as secrets
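The polling behaviour of the sidecar can be modelled in a few lines. This is a deliberately simplified model, not the actual git-sync implementation: `poll_once` and `run_sidecar` are hypothetical names, the Git operations are injected as plain callables so the loop can be exercised without a repository, and the 60-second default period is an arbitrary placeholder.

```python
import time
from typing import Callable, Optional


def poll_once(remote_rev: Callable[[], str],
              checkout: Callable[[str], None],
              current: Optional[str]) -> str:
    """Check the remote revision once and update the shared volume if it changed."""
    latest = remote_rev()
    if latest != current:
        # Only touch the DAG volume when the repository actually moved.
        checkout(latest)
    return latest


def run_sidecar(remote_rev: Callable[[], str],
                checkout: Callable[[str], None],
                period_s: float = 60.0,
                max_polls: Optional[int] = None,
                sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Poll repeatedly; max_polls=None loops forever, as a sidecar would."""
    current: Optional[str] = None
    polls = 0
    while max_polls is None or polls < max_polls:
        current = poll_once(remote_rev, checkout, current)
        polls += 1
        sleep(period_s)
    return current


if __name__ == "__main__":
    # Simulated remote history: the branch moves from "a" to "b" on the third poll.
    revisions = iter(["a", "a", "b"])
    run_sidecar(lambda: next(revisions),
                lambda rev: print(f"checking out {rev}"),
                max_polls=3, sleep=lambda _: None)
```

The model makes the "Fast" update speed in the comparison table easier to reason about: a new commit becomes visible after at most one poll period, plus however long Airflow takes to re-parse the changed files.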