Workflow: Apache Airflow Kubernetes Deployment via Helm
| Knowledge Sources | Details |
|---|---|
| Domains | Kubernetes, Infrastructure, Container_Orchestration, DevOps |
| Last Updated | 2026-02-08 19:00 GMT |
Overview
End-to-end process for deploying Apache Airflow on Kubernetes using the official Helm chart, including Docker image preparation, values configuration, and production hardening.
Description
This workflow covers deploying a production-grade Apache Airflow installation on a Kubernetes cluster using the official Helm chart. It encompasses building or customizing Docker images, configuring the comprehensive values.yaml (with its 13,840-line JSON schema for validation), setting up executors (LocalExecutor, CeleryExecutor, or KubernetesExecutor), configuring persistent storage, database backends, and monitoring. The Helm chart manages all Airflow components as Kubernetes resources: scheduler, webserver (API server), triggerer, dag-processor, workers, and supporting infrastructure like Redis and PostgreSQL.
Usage
Execute this workflow when deploying Apache Airflow to a Kubernetes cluster for production or staging environments. This is appropriate when you need scalable, fault-tolerant orchestration with auto-scaling workers, rolling upgrades, and integration with Kubernetes-native tooling. Typical use cases include cloud-native deployments on EKS, GKE, or AKS where Airflow needs to scale dynamically with workload demands.
Execution Steps
Step 1: Docker Image Preparation
Prepare the Airflow Docker image that will run in Kubernetes pods. Either use the official apache/airflow image directly, extend it with additional dependencies (provider packages, Python libraries), or build a fully custom image. The official image supports multiple build arguments for customization and follows OpenShift compatibility guidelines (arbitrary UID with GID=0).
Key considerations:
- The production image uses Debian Bookworm as the base OS
- Images support arbitrary user IDs for OpenShift compatibility
- The entrypoint script handles database connection waiting, migration execution, and user creation
- Use multi-stage builds to minimize image size when adding custom dependencies
- Set umask 0002 when extending the image to maintain proper group write permissions
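The "extend the official image" option can be sketched as a short Dockerfile. The base tag, apt package, and requirements file shown here are illustrative placeholders, not requirements of the chart:

```dockerfile
# Illustrative extension of the official image; pin the tag you actually deploy.
FROM apache/airflow:3.0.2

# System packages must be installed as root; then drop back to the airflow user.
USER root
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential \
    && rm -rf /var/lib/apt/lists/*
USER airflow

# Python dependencies (provider packages, libraries) install as the airflow user.
# umask 0002 keeps group write permissions intact for arbitrary-UID execution.
COPY requirements.txt /requirements.txt
RUN umask 0002 && pip install --no-cache-dir -r /requirements.txt
```

Build and push this image to your registry, then reference it from the chart's image values in Step 2.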
Step 2: Helm Chart Values Configuration
Configure the Helm chart by customizing values.yaml, which controls all aspects of the deployment. Key configuration areas include executor type, image references, resource requests/limits, database connection strings, authentication, and component scaling. The chart includes a comprehensive JSON schema (values.schema.json) that validates configuration before deployment.
Key considerations:
- The values.schema.json provides 13,840 lines of validation rules for all configuration options
- Choose the appropriate executor: LocalExecutor for simple deployments, CeleryExecutor for distributed workers, KubernetesExecutor for pod-per-task isolation
- Configure airflowHome (default: /opt/airflow) and persistence settings for logs and DAGs
- Security contexts, pod security standards, and network policies should be configured for production
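A minimal values.yaml covering these options might look like the following sketch. The key names follow the official chart's schema, but the registry, tag, and resource figures are placeholder assumptions:

```yaml
# Executor selection: LocalExecutor, CeleryExecutor, or KubernetesExecutor.
executor: "CeleryExecutor"

# Custom image built in Step 1 (registry and tag are placeholders).
images:
  airflow:
    repository: registry.example.com/my-airflow
    tag: "3.0.2-custom"
    pullPolicy: IfNotPresent

# Airflow home directory inside the containers (chart default shown).
airflowHome: /opt/airflow

# Per-component resource requests/limits (illustrative sizes).
scheduler:
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      memory: 2Gi
```

Running helm lint or helm template against the chart validates these values against values.schema.json before anything reaches the cluster.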
Step 3: Database and Backend Configuration
Configure the metadata database backend (PostgreSQL or MySQL) and message broker (Redis for CeleryExecutor). Set up database connection strings, connection pooling parameters, and migration strategies. The Helm chart can deploy PostgreSQL and Redis as sub-charts or connect to external managed services.
Key considerations:
- PostgreSQL is the recommended production database; SQLite is only for development
- Database migrations run automatically via init containers using airflow db migrate
- Configure the migration wait timeout to account for large schema upgrades
- Fernet key configuration is critical for encrypting connection passwords stored in the database
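For an external managed database, the relevant values might be sketched as follows; the hostname, credentials, and secret name are illustrative, and in production credentials belong in a Secret rather than inline in values.yaml:

```yaml
# Disable the bundled PostgreSQL sub-chart when using a managed database.
postgresql:
  enabled: false

# External metadata database connection (all values are placeholders;
# prefer data.metadataSecretName over inline credentials in production).
data:
  metadataConnection:
    user: airflow
    pass: change-me
    protocol: postgresql
    host: db.example.com
    port: 5432
    db: airflow

# Reference a pre-created Secret holding the Fernet key rather than inlining it.
fernetKeySecretName: airflow-fernet-key
```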
Step 4: DAG Distribution Strategy
Configure how DAG files are distributed to all Airflow components. Options include git-sync sidecars (continuously pulling from a Git repository), persistent volume claims backed by a shared filesystem such as NFS, or baking DAGs directly into the Docker image. Each approach has trade-offs in deployment speed, consistency, and operational complexity.
Key considerations:
- Git-sync provides continuous deployment of DAG changes without pod restarts
- Persistent volumes require a shared filesystem accessible by all pods
- Embedding DAGs in the Docker image provides the most consistent deployments but requires image rebuilds for DAG changes
- The KubernetesExecutor pod template can specify a separate DAG distribution mechanism for task pods
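The git-sync option can be sketched in values.yaml as below; the repository URL, branch, and credentials Secret name are illustrative assumptions:

```yaml
# Git-sync sidecars continuously pull DAGs from a repository into each pod.
dags:
  gitSync:
    enabled: true
    repo: https://github.com/example/airflow-dags.git
    branch: main
    subPath: "dags"
    # For private repositories, reference a pre-created Secret with credentials.
    credentialsSecret: git-credentials
  # Disable the shared-volume alternative when git-sync is in use.
  persistence:
    enabled: false
```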
Step 5: Component Deployment and Scaling
Deploy all Airflow components as Kubernetes resources: Deployment for the scheduler, webserver (API server), triggerer, and dag-processor; StatefulSet or Deployment for Celery workers; and Jobs for database initialization. Configure horizontal pod autoscaling, replica counts, resource requests/limits, and pod disruption budgets for each component.
Key considerations:
- The scheduler should typically run as a single replica (or with HA configuration)
- Workers can scale horizontally based on queue depth (KEDA integration available)
- The triggerer handles async operations and should be sized based on deferrable operator usage
- Log groomer sidecars manage log rotation within pods
- Rolling update strategies ensure zero-downtime upgrades
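The replica and autoscaling settings above can be sketched as follows; the counts are illustrative starting points, not recommendations for any particular workload:

```yaml
# Single scheduler replica unless running an HA configuration.
scheduler:
  replicas: 1

# Celery workers: fixed replicas, or KEDA-driven autoscaling on queue depth.
workers:
  replicas: 3
  keda:
    enabled: true
    minReplicaCount: 1
    maxReplicaCount: 10

# Triggerer sized for deferrable-operator load.
triggerer:
  replicas: 1
```

With KEDA enabled, the fixed workers.replicas value is superseded by the scaler, which grows and shrinks the worker set between the min and max bounds.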
Step 6: Monitoring and Observability Setup
Configure monitoring and observability for the Airflow deployment. Set up metrics export to Prometheus (via StatsD exporter or OpenTelemetry), configure log aggregation, and enable distributed tracing. The Helm chart supports ServiceMonitor resources for Prometheus Operator integration and can configure health check endpoints for liveness and readiness probes.
Key considerations:
- Airflow supports StatsD, DataDog, and OpenTelemetry metrics backends
- OpenTelemetry tracing provides end-to-end visibility across DAG runs and task executions
- Kubernetes liveness and readiness probes ensure unhealthy pods are automatically restarted
- The Airflow API server can run behind gunicorn with zero-downtime reload capability
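A monitoring sketch in values.yaml might look like the following; the OpenTelemetry collector host and port are illustrative assumptions about your cluster:

```yaml
# Enable the bundled StatsD exporter so Prometheus can scrape Airflow metrics.
statsd:
  enabled: true

# Airflow settings rendered into airflow.cfg via the chart's config mapping.
config:
  traces:
    # OpenTelemetry tracing (collector endpoint is a placeholder).
    otel_on: "True"
    otel_host: otel-collector.monitoring.svc
    otel_port: 4318
```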
Step 7: Production Hardening and Maintenance
Apply production hardening including network policies, secret management (external secrets backends), RBAC configuration, TLS termination, and backup strategies. Establish maintenance procedures for Airflow upgrades (chart version bumps), database migrations, and disaster recovery.
Key considerations:
- Use Kubernetes Secrets or external secrets backends (Vault, AWS Secrets Manager) for sensitive configuration
- The Helm chart supports reproducible builds for audit compliance
- Database backup and restore procedures should be documented and tested
- Chart releases follow SemVer independently of Airflow version releases
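The network-policy and external-secrets hardening described above can be sketched as below; the Vault URL and paths are illustrative, and the backend class comes from the HashiCorp provider package (which must be present in the image):

```yaml
# Restrict pod-to-pod traffic with chart-managed NetworkPolicies.
networkPolicies:
  enabled: true

# Pull connections and variables from an external secrets backend instead of
# storing them in the metadata database (URL and paths are placeholders).
config:
  secrets:
    backend: airflow.providers.hashicorp.secrets.vault.VaultBackend
    backend_kwargs: '{"url": "https://vault.example.com", "connections_path": "airflow/connections"}'
```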