# Principle: Astronomer Cosmos Kubernetes Operator Execution
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Execution, Kubernetes, Containerization |
| Last Updated | 2026-02-07 00:00 GMT |
## Overview
An execution principle for running dbt commands in isolated Kubernetes pods, providing complete environment separation from the Airflow scheduler and worker nodes.
## Description
Kubernetes execution mode runs each dbt command in a dedicated Kubernetes pod. This provides complete environment isolation -- dbt and all its adapter dependencies exist only in the Docker image, not on Airflow workers. This solves the dependency conflict problem that arises when dbt packages require different versions of shared libraries than Airflow itself.
The architecture requires a dual-path configuration:
- Local path (DAG parsing time) -- Cosmos needs access to the dbt project files on the Airflow scheduler/worker to run `dbt ls` or parse `manifest.json` for graph construction. This path is specified via `ProjectConfig.dbt_project_path` or by providing a pre-built manifest.
- Container path (runtime execution) -- The actual dbt execution happens inside the pod, where the project files are baked into the Docker image. This path is specified via `project_dir` on the operator and typically points to a location inside the container (e.g., `/usr/app/dbt/jaffle_shop`).
The operator inherits from Airflow's KubernetesPodOperator, which manages the full pod lifecycle: creation, execution, log streaming, and cleanup. Credentials are injected via Kubernetes Secrets rather than Airflow connections at runtime.
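To make the dual-path split concrete, the sketch below wires both paths into a Cosmos DAG. The image tag, filesystem paths, profile names, and DAG id are illustrative assumptions, not values from this document; treat it as a sketch of the configuration shape rather than a definitive setup.

```python
from cosmos import DbtDag, ExecutionConfig, ExecutionMode, ProfileConfig, ProjectConfig

dag = DbtDag(
    # Local path: project files on the scheduler/worker, used only at DAG
    # parsing time to build the task graph (hypothetical location).
    project_config=ProjectConfig(dbt_project_path="/opt/airflow/dags/dbt/jaffle_shop"),
    profile_config=ProfileConfig(
        profile_name="jaffle_shop",
        target_name="prod",
        # Assumes profiles.yml is baked into the image next to the project.
        profiles_yml_filepath="/usr/app/dbt/profiles/profiles.yml",
    ),
    execution_config=ExecutionConfig(execution_mode=ExecutionMode.KUBERNETES),
    operator_args={
        # Container path: where the project lives inside the Docker image.
        "image": "my-registry/dbt-jaffle-shop:1.0.0",
        "project_dir": "/usr/app/dbt/jaffle_shop",
        "get_logs": True,
        "is_delete_operator_pod": True,
    },
    dag_id="jaffle_shop_k8s",
)
```

Note how the local path feeds graph construction only, while every pod resolves the container path at runtime.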
## Usage
Use Kubernetes execution mode when:
- Environment isolation is required between Airflow and dbt (different Python versions, conflicting dependencies)
- KubernetesPodOperator-compatible infrastructure is available (Kubernetes cluster accessible from Airflow)
- dbt dependencies conflict with Airflow dependencies
- Resource isolation is needed (dedicated CPU/memory per dbt task via pod resource requests)
- Security requirements mandate that database credentials are injected via Kubernetes Secrets rather than Airflow connections
```python
from cosmos import ExecutionConfig, ExecutionMode

execution_config = ExecutionConfig(
    execution_mode=ExecutionMode.KUBERNETES,
)
```
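As one way to satisfy the Secrets requirement above, credentials can be passed to the pod with the `Secret` helper from Airflow's Kubernetes provider and forwarded through the operator arguments. The secret name, key, and environment variable below are hypothetical.

```python
from airflow.providers.cncf.kubernetes.secret import Secret

# Expose the 'password' key of the 'postgres-secrets' Kubernetes Secret
# (hypothetical names) as the POSTGRES_PASSWORD environment variable
# inside the dbt pod.
postgres_password_secret = Secret(
    deploy_type="env",
    deploy_target="POSTGRES_PASSWORD",
    secret="postgres-secrets",
    key="password",
)

# Forwarded to each pod via operator_args={"secrets": [postgres_password_secret]}
```

The database credential therefore never touches an Airflow connection; it exists only in the cluster's Secret store and the pod's environment.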
## Theoretical Basis
Each dbt node in the rendered Airflow DAG spawns an ephemeral Kubernetes pod. The pod runs the dbt CLI with the appropriate command (`run`, `test`, `seed`, `snapshot`) and the `--select` flag targeting the specific node:
```bash
# Inside the Kubernetes pod
dbt run --select stg_customers --project-dir /usr/app/dbt/jaffle_shop --profiles-dir /usr/app/dbt/profiles
```
The execution flow for each task is:
- Pod creation -- `KubernetesPodOperator` creates a pod spec with the configured Docker image, environment variables (converted to `V1EnvVar` objects), Kubernetes Secrets, resource limits, and volume mounts
- Command execution -- The pod entrypoint runs the dbt CLI command
- Log streaming -- When `get_logs=True`, pod stdout/stderr is streamed to the Airflow task log in real time
- Result detection -- The pod exit code determines task success or failure
- Warning handling -- For test operators, `DbtTestWarningHandler` (a `KubernetesPodOperatorCallback`) scrapes pod logs using regex patterns to detect dbt test warnings and source freshness warnings, then invokes the `on_warning_callback`
- Pod cleanup -- When `is_delete_operator_pod=True`, the pod is deleted after execution completes
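The warning-handling step can be sketched in miniature: scan streamed pod logs for dbt warning markers and collect the affected test names. The regex patterns and log lines below are simplified illustrations, not Cosmos's actual patterns.

```python
import re

# Illustrative patterns, not Cosmos's real regexes: one matches a per-test
# warning line, the other pulls the WARN count from dbt's summary line.
WARNING_PATTERN = re.compile(r"Warning in test (\w+)")
SUMMARY_PATTERN = re.compile(r"Done\. PASS=\d+ WARN=(\d+)")

def extract_test_warnings(log_text: str):
    """Return (warning_count, warned_test_names) parsed from dbt log output."""
    tests = WARNING_PATTERN.findall(log_text)
    match = SUMMARY_PATTERN.search(log_text)
    count = int(match.group(1)) if match else len(tests)
    return count, tests

logs = """
12:00:01  Warning in test not_null_stg_customers_id (models/staging/schema.yml)
12:00:02  Done. PASS=4 WARN=1 ERROR=0 SKIP=0 TOTAL=5
"""
print(extract_test_warnings(logs))  # (1, ['not_null_stg_customers_id'])
```

A handler like this would fire an `on_warning_callback` whenever the warning count is nonzero, even though the pod itself exited successfully.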
This model provides per-task isolation at the cost of pod startup latency (typically 10-30 seconds per pod). The isolation boundary means that each dbt command runs in a clean environment with no state leakage between tasks.