Workflow: Apache Airflow Kubernetes Deployment via Helm
| Knowledge Sources | Details |
|---|---|
| Domains | Kubernetes, Infrastructure, Container_Orchestration, DevOps |
| Last Updated | 2026-02-08 19:00 GMT |
Overview
End-to-end process for deploying Apache Airflow on Kubernetes using the official Helm chart, including Docker image preparation, values configuration, and production hardening.
Description
This workflow covers deploying a production-grade Apache Airflow installation on a Kubernetes cluster using the official Helm chart. It encompasses building or customizing Docker images, configuring the comprehensive values.yaml (with its 13,840-line JSON schema for validation), setting up executors (LocalExecutor, CeleryExecutor, or KubernetesExecutor), configuring persistent storage, database backends, and monitoring. The Helm chart manages all Airflow components as Kubernetes resources: scheduler, webserver (API server), triggerer, dag-processor, workers, and supporting infrastructure like Redis and PostgreSQL.
Usage
Execute this workflow when deploying Apache Airflow to a Kubernetes cluster for production or staging environments. This is appropriate when you need scalable, fault-tolerant orchestration with auto-scaling workers, rolling upgrades, and integration with Kubernetes-native tooling. Typical use cases include cloud-native deployments on EKS, GKE, or AKS where Airflow needs to scale dynamically with workload demands.
Execution Steps
Step 1: Docker Image Preparation
Prepare the Airflow Docker image that will run in Kubernetes pods. Either use the official apache/airflow image directly, extend it with additional dependencies (provider packages, Python libraries), or build a fully custom image. The official image supports multiple build arguments for customization and follows OpenShift compatibility guidelines (arbitrary UID with GID=0).
Key considerations:
- The production image uses Debian Bookworm as the base OS
- Images support arbitrary user IDs for OpenShift compatibility
- The entrypoint script handles database connection waiting, migration execution, and user creation
- Use multi-stage builds to minimize image size when adding custom dependencies
- Set umask 0002 when extending the image to maintain proper group write permissions
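The "extend the official image" option can be sketched as a short Dockerfile. The base tag, apt package, and requirements file shown here are illustrative placeholders, not requirements of the chart:

```dockerfile
# Illustrative extension of the official image; pin the tag you actually deploy.
FROM apache/airflow:3.0.2

# System packages must be installed as root; then drop back to the airflow user.
USER root
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential \
    && rm -rf /var/lib/apt/lists/*
USER airflow

# Python dependencies (provider packages, libraries) install as the airflow user.
# umask 0002 keeps group write permissions intact for arbitrary-UID execution.
COPY requirements.txt /requirements.txt
RUN umask 0002 && pip install --no-cache-dir -r /requirements.txt
```

Build and push this image to your registry, then reference it from the chart's image values in Step 2.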
Step 2: Helm Chart Values Configuration
Configure the Helm chart by customizing values.yaml, which controls all aspects of the deployment. Key configuration areas include executor type, image references, resource requests/limits, database connection strings, authentication, and component scaling. The chart includes a comprehensive JSON schema (values.schema.json) that validates configuration before deployment.
Key considerations:
- The values.schema.json provides 13,840 lines of validation rules for all configuration options
- Choose the appropriate executor: LocalExecutor for simple deployments, CeleryExecutor for distributed workers, KubernetesExecutor for pod-per-task isolation
- Configure airflowHome (default: /opt/airflow) and persistence settings for logs and DAGs
- Security contexts, pod security standards, and network policies should be configured for production
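A minimal values.yaml covering these options might look like the following sketch. The key names follow the official chart's schema, but the registry, tag, and resource figures are placeholder assumptions:

```yaml
# Executor selection: LocalExecutor, CeleryExecutor, or KubernetesExecutor.
executor: "CeleryExecutor"

# Custom image built in Step 1 (registry and tag are placeholders).
images:
  airflow:
    repository: registry.example.com/my-airflow
    tag: "3.0.2-custom"
    pullPolicy: IfNotPresent

# Airflow home directory inside the containers (chart default shown).
airflowHome: /opt/airflow

# Per-component resource requests/limits (illustrative sizes).
scheduler:
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      memory: 2Gi
```

Running helm lint or helm template against the chart validates these values against values.schema.json before anything reaches the cluster.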
Step 3: Database and Backend Configuration
Configure the metadata database backend (PostgreSQL or MySQL) and message broker (Redis for CeleryExecutor). Set up database connection strings, connection pooling parameters, and migration strategies. The Helm chart can deploy PostgreSQL and Redis as sub-charts or connect to external managed services.
Key considerations:
- PostgreSQL is the recommended production database; SQLite is only for development
- Database migrations run automatically via init containers using airflow db migrate
- Configure the migration wait timeout to account for large schema upgrades
- Fernet key configuration is critical for encrypting connection passwords stored in the database
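For an external managed database, the relevant values might be sketched as follows; the hostname, credentials, and secret name are illustrative, and in production credentials belong in a Secret rather than inline in values.yaml:

```yaml
# Disable the bundled PostgreSQL sub-chart when using a managed database.
postgresql:
  enabled: false

# External metadata database connection (all values are placeholders;
# prefer data.metadataSecretName over inline credentials in production).
data:
  metadataConnection:
    user: airflow
    pass: change-me
    protocol: postgresql
    host: db.example.com
    port: 5432
    db: airflow

# Reference a pre-created Secret holding the Fernet key rather than inlining it.
fernetKeySecretName: airflow-fernet-key
```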
Step 4: DAG Distribution Strategy
Configure how DAG files are distributed to all Airflow components. Options include git-sync sidecars (continuously pulling from a Git repository), persistent volume claims backed by a shared filesystem such as NFS, or baking DAGs directly into the Docker image. Each approach has trade-offs in deployment speed, consistency, and operational complexity.
Key considerations:
- Git-sync provides continuous deployment of DAG changes without pod restarts
- Persistent volumes require a shared filesystem accessible by all pods
- Embedding DAGs in the Docker image provides the most consistent deployments but requires image rebuilds for DAG changes
- The KubernetesExecutor pod template can specify a separate DAG distribution mechanism for task pods
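The git-sync option can be sketched in values.yaml as below; the repository URL, branch, and credentials Secret name are illustrative assumptions:

```yaml
# Git-sync sidecars continuously pull DAGs from a repository into each pod.
dags:
  gitSync:
    enabled: true
    repo: https://github.com/example/airflow-dags.git
    branch: main
    subPath: "dags"
    # For private repositories, reference a pre-created Secret with credentials.
    credentialsSecret: git-credentials
  # Disable the shared-volume alternative when git-sync is in use.
  persistence:
    enabled: false
```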
Step 5: Component Deployment and Scaling
Deploy all Airflow components as Kubernetes resources: Deployment for the scheduler, webserver (API server), triggerer, and dag-processor; StatefulSet or Deployment for Celery workers; and Jobs for database initialization. Configure horizontal pod autoscaling, replica counts, resource requests/limits, and pod disruption budgets for each component.
Key considerations:
- The scheduler should typically run as a single replica (or with HA configuration)
- Workers can scale horizontally based on queue depth (KEDA integration available)
- The triggerer handles async operations and should be sized based on deferrable operator usage
- Log groomer sidecars manage log rotation within pods
- Rolling update strategies ensure zero-downtime upgrades
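The replica and autoscaling settings above can be sketched as follows; the counts are illustrative starting points, not recommendations for any particular workload:

```yaml
# Single scheduler replica unless running an HA configuration.
scheduler:
  replicas: 1

# Celery workers: fixed replicas, or KEDA-driven autoscaling on queue depth.
workers:
  replicas: 3
  keda:
    enabled: true
    minReplicaCount: 1
    maxReplicaCount: 10

# Triggerer sized for deferrable-operator load.
triggerer:
  replicas: 1
```

With KEDA enabled, the fixed workers.replicas value is superseded by the scaler, which grows and shrinks the worker set between the min and max bounds.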
Step 6: Monitoring and Observability Setup
Configure monitoring and observability for the Airflow deployment. Set up metrics export to Prometheus (via StatsD exporter or OpenTelemetry), configure log aggregation, and enable distributed tracing. The Helm chart supports ServiceMonitor resources for Prometheus Operator integration and can configure health check endpoints for liveness and readiness probes.
Key considerations:
- Airflow supports StatsD, DataDog, and OpenTelemetry metrics backends
- OpenTelemetry tracing provides end-to-end visibility across DAG runs and task executions
- Kubernetes liveness and readiness probes ensure unhealthy pods are automatically restarted
- The Airflow API server can run behind gunicorn with zero-downtime reload capability
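A monitoring sketch in values.yaml might look like the following; the OpenTelemetry collector host and port are illustrative assumptions about your cluster:

```yaml
# Enable the bundled StatsD exporter so Prometheus can scrape Airflow metrics.
statsd:
  enabled: true

# Airflow settings rendered into airflow.cfg via the chart's config mapping.
config:
  traces:
    # OpenTelemetry tracing (collector endpoint is a placeholder).
    otel_on: "True"
    otel_host: otel-collector.monitoring.svc
    otel_port: 4318
```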
Step 7: Production Hardening and Maintenance
Apply production hardening including network policies, secret management (external secrets backends), RBAC configuration, TLS termination, and backup strategies. Establish maintenance procedures for Airflow upgrades (chart version bumps), database migrations, and disaster recovery.
Key considerations:
- Use Kubernetes Secrets or external secrets backends (Vault, AWS Secrets Manager) for sensitive configuration
- The Helm chart supports reproducible builds for audit compliance
- Database backup and restore procedures should be documented and tested
- Chart releases follow SemVer independently of Airflow version releases
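The network-policy and external-secrets hardening described above can be sketched as below; the Vault URL and paths are illustrative, and the backend class comes from the HashiCorp provider package (which must be present in the image):

```yaml
# Restrict pod-to-pod traffic with chart-managed NetworkPolicies.
networkPolicies:
  enabled: true

# Pull connections and variables from an external secrets backend instead of
# storing them in the metadata database (URL and paths are placeholders).
config:
  secrets:
    backend: airflow.providers.hashicorp.secrets.vault.VaultBackend
    backend_kwargs: '{"url": "https://vault.example.com", "connections_path": "airflow/connections"}'
```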