Principle:Apache Airflow Database Backend Configuration
| Knowledge Sources | |
|---|---|
| Domains | Database, Infrastructure |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
A configuration pattern for setting up Airflow's metadata database and Celery result backend connections in Kubernetes deployments.
Description
Database Backend Configuration covers how Airflow connects to its metadata database (PostgreSQL) and to an optional Celery result backend (Redis or PostgreSQL) in a Kubernetes environment. The Helm chart supports both an embedded database (via the Bitnami PostgreSQL subchart) and an external database. PgBouncer can be deployed alongside Airflow as a separate connection-pooling Deployment that sits between the Airflow components and the database. Database migrations are handled by a dedicated migration Job, and the Airflow components wait for it to finish before they start.
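As a rough illustration of the two connections described above, the following sketch builds a metadata database URI and a matching Celery result backend URI. The host name, credentials, and the `postgres_uri` helper are hypothetical. The environment variable names follow Airflow's `AIRFLOW__{SECTION}__{KEY}` convention, and the `db+` prefix is the form Celery uses for SQLAlchemy result backends.

```python
from urllib.parse import quote


def postgres_uri(user, password, host, port, db, scheme="postgresql"):
    """Build a SQLAlchemy-style URI, percent-encoding the credentials."""
    user_enc = quote(user, safe="")
    pass_enc = quote(password, safe="")
    return f"{scheme}://{user_enc}:{pass_enc}@{host}:{port}/{db}"


# Hypothetical values, for illustration only.
metadata_uri = postgres_uri("airflow", "p@ss/word", "airflow-postgresql", 5432, "airflow")

# Celery's SQLAlchemy result backend expects a "db+" prefix on the URI.
result_backend_uri = "db+" + metadata_uri

env = {
    "AIRFLOW__DATABASE__SQL_ALCHEMY_CONN": metadata_uri,
    "AIRFLOW__CELERY__RESULT_BACKEND": result_backend_uri,
}
```

Percent-encoding the password matters because characters such as `@` and `/` would otherwise be parsed as URI delimiters.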
Usage
Configure database connections when deploying Airflow on Kubernetes. Use the embedded PostgreSQL for development and an external managed database (for example RDS or Cloud SQL) for production. Enable PgBouncer when the combined connections opened by schedulers, webservers, and workers could approach the database's connection limit.
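The choice between embedded and external databases can be sketched as a function that returns a Helm values override. The `database_values` helper is hypothetical, and the key names (`postgresql.enabled`, `pgbouncer.enabled`, `data.metadataConnection`) are assumed to match the official Airflow chart; verify them against the chart version in use.

```python
def database_values(external, host=None, user=None, password=None,
                    db="airflow", port=5432, use_pgbouncer=False):
    """Return a Helm values override dict for the chosen database mode."""
    values = {
        # The embedded PostgreSQL subchart is only enabled in non-external mode.
        "postgresql": {"enabled": not external},
        "pgbouncer": {"enabled": use_pgbouncer},
    }
    if external:
        if not (host and user and password):
            raise ValueError("external mode needs host, user and password")
        values["data"] = {
            "metadataConnection": {
                "protocol": "postgresql",
                "host": host,
                "port": port,
                "db": db,
                "user": user,
                "pass": password,
            }
        }
    return values


# Development: embedded PostgreSQL, no pooler.
dev_values = database_values(external=False)

# Production: hypothetical managed database behind PgBouncer.
prod_values = database_values(
    external=True,
    host="airflow-db.example.internal",
    user="airflow",
    password="change-me",
    use_pgbouncer=True,
)
```

In a real deployment the password would typically come from a Kubernetes Secret rather than being inlined in the values.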
Theoretical Basis
Connection Architecture:
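- Metadata DB: Stores all Airflow state (serialized DAGs, DAG runs, task instances, connections, variables)
- Result Backend: Stores Celery task results (only needed with Celery-based executors such as CeleryExecutor)
- Connection Pooling: PgBouncer multiplexes many client connections onto a smaller set of database connections, reducing connection overhead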
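To see why pooling matters, the sketch below estimates a worst-case connection count for long-lived Airflow processes, with and without a pooler. The process counts and the `server_pool_size` cap are hypothetical. The per-process defaults of 5 and 10 correspond to Airflow's `sql_alchemy_pool_size` and `sql_alchemy_max_overflow` settings. This is a back-of-the-envelope model that ignores short-lived task processes.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    processes: int  # OS processes that each hold their own SQLAlchemy pool


def max_direct_connections(components, pool_size=5, max_overflow=10):
    """Upper bound on connections when every process talks to the DB directly."""
    per_process = pool_size + max_overflow
    return sum(c.processes for c in components) * per_process


def max_pooled_connections(components, server_pool_size, **pool_kwargs):
    """With a pooler in front, the DB sees at most server_pool_size connections."""
    return min(max_direct_connections(components, **pool_kwargs), server_pool_size)


# Hypothetical deployment sizes.
components = [
    Component("scheduler", 2),
    Component("webserver", 4),
    Component("workers", 16),
]

direct = max_direct_connections(components)                       # 22 processes * 15
pooled = max_pooled_connections(components, server_pool_size=20)  # capped by the pooler
```

Under these assumptions the database could see up to 330 direct connections, while a pooler capped at 20 server connections keeps the count at 20.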
Migration Strategy:
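- A migration Job runs `airflow db migrate`; component pods use an init container to wait until the migrations are applied before starting
- Migrations are idempotent and safe to re-run: a database already at the latest revision is left unchanged
- The schema is versioned through an Alembic revision chain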
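The idempotency of a revision-chain migration can be illustrated with a toy model. This is not Alembic's API. The revision names, the `REVISIONS` mapping, and the `migrate` function are hypothetical stand-ins that only mimic the idea of each revision pointing at its parent.

```python
# Each revision names its parent ("down revision"); None marks the empty base.
REVISIONS = {
    "rev_a": None,
    "rev_b": "rev_a",
    "rev_c": "rev_b",
}
HEAD = "rev_c"


def pending(current, head=HEAD):
    """Walk from head back to current; return revisions to apply, oldest first."""
    chain = []
    rev = head
    while rev != current:
        if rev is None:
            raise ValueError(f"{current!r} is not an ancestor of {head!r}")
        chain.append(rev)
        rev = REVISIONS[rev]
    return list(reversed(chain))


def migrate(db):
    """Apply pending revisions; return the list applied (empty if at head)."""
    applied = pending(db["version"])
    for rev in applied:
        db["version"] = rev  # a real migration would also alter the schema here
    return applied


db = {"version": None}   # fresh database with no revision recorded
first = migrate(db)      # applies every revision in order
second = migrate(db)     # already at head, so nothing is applied
```

Because the stored version decides which revisions are pending, running the migration a second time finds nothing to apply, which is the property that makes re-runs safe.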