
Principle:Astronomer Cosmos Kubernetes Operator Execution

From Leeroopedia


Knowledge Sources
Domains Data_Engineering, Execution, Kubernetes, Containerization
Last Updated 2026-02-07 00:00 GMT

Overview

An execution principle for running dbt commands in isolated Kubernetes pods, providing complete environment separation from the Airflow scheduler and worker nodes.

Description

Kubernetes execution mode runs each dbt command in a dedicated Kubernetes pod. This provides complete environment isolation -- dbt and all its adapter dependencies exist only in the Docker image, not on Airflow workers. This solves the dependency conflict problem that arises when dbt packages require different versions of shared libraries than Airflow itself.

The architecture requires a dual-path configuration:

  • Local path (DAG parsing time) -- Cosmos needs access to the dbt project files on the Airflow scheduler/worker to run dbt ls or parse manifest.json for graph construction. This path is specified via ProjectConfig.dbt_project_path or by providing a pre-built manifest.
  • Container path (runtime execution) -- The actual dbt execution happens inside the pod, where the project files are baked into the Docker image. This path is specified via project_dir on the operator and typically points to a location inside the container (e.g., /usr/app/dbt/jaffle_shop).
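The dual-path split above can be sketched as a Cosmos configuration. This is a minimal sketch, not a complete DAG: the image name, registry, and local project path are illustrative assumptions, while the container path matches the example in the text.

```python
from cosmos import ExecutionConfig, ExecutionMode, ProjectConfig

# Local path: read on the Airflow scheduler/worker at DAG parsing time
# to run `dbt ls` or parse manifest.json for graph construction.
# The directory layout here is an assumption.
project_config = ProjectConfig(
    dbt_project_path="/opt/airflow/dags/dbt/jaffle_shop",
)

execution_config = ExecutionConfig(
    execution_mode=ExecutionMode.KUBERNETES,
)

# Container path: where the project is baked into the Docker image.
# These arguments are passed through to the rendered operators;
# the image name is hypothetical.
operator_args = {
    "image": "my-registry/dbt-jaffle-shop:latest",
    "project_dir": "/usr/app/dbt/jaffle_shop",
}
```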

The operator inherits from Airflow's KubernetesPodOperator, which manages the full pod lifecycle: creation, execution, log streaming, and cleanup. At runtime, database credentials are injected via Kubernetes Secrets rather than resolved from Airflow connections.
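Credential injection can be expressed with Airflow's Kubernetes `Secret` objects passed through `operator_args`. A sketch, assuming a Secret named `postgres-secrets` with a `password` key already exists in the cluster (both names are hypothetical):

```python
from airflow.providers.cncf.kubernetes.secret import Secret

# Expose the database password to the pod as the DB_PASSWORD
# environment variable, sourced from a Kubernetes Secret.
postgres_password = Secret(
    deploy_type="env",
    deploy_target="DB_PASSWORD",
    secret="postgres-secrets",
    key="password",
)

operator_args = {
    "secrets": [postgres_password],
}
```

The dbt profile inside the image can then reference the variable (e.g. via `env_var('DB_PASSWORD')`), so no credential ever touches an Airflow connection.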

Usage

Use Kubernetes execution mode when:

  • Environment isolation is required between Airflow and dbt (different Python versions, conflicting dependencies)
  • KubernetesPodOperator-compatible infrastructure is available (Kubernetes cluster accessible from Airflow)
  • dbt dependencies conflict with Airflow dependencies
  • Resource isolation is needed (dedicated CPU/memory per dbt task via pod resource requests)
  • Security requirements mandate that database credentials are injected via Kubernetes Secrets rather than Airflow connections

A minimal configuration selecting the Kubernetes execution mode:

from cosmos import ExecutionConfig, ExecutionMode

execution_config = ExecutionConfig(
    execution_mode=ExecutionMode.KUBERNETES,
)

Theoretical Basis

Each dbt node in the rendered Airflow DAG spawns an ephemeral Kubernetes pod. The pod runs the dbt CLI with the appropriate command (run, test, seed, snapshot) and the --select flag targeting the specific node:

# Inside the Kubernetes pod
dbt run --select stg_customers --project-dir /usr/app/dbt/jaffle_shop --profiles-dir /usr/app/dbt/profiles
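Assembling that per-node command can be sketched in Python. This mirrors the shape of the CLI invocation rather than Cosmos's internal command builder:

```python
def build_dbt_command(subcommand, node, project_dir, profiles_dir):
    """Assemble a per-node dbt CLI invocation (illustrative sketch)."""
    return [
        "dbt", subcommand,
        "--select", node,
        "--project-dir", project_dir,
        "--profiles-dir", profiles_dir,
    ]

cmd = build_dbt_command(
    "run", "stg_customers",
    "/usr/app/dbt/jaffle_shop", "/usr/app/dbt/profiles",
)
```

Each rendered Airflow task calls the same shape of command with a different `subcommand` (run, test, seed, snapshot) and `node`.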

The execution flow for each task is:

  1. Pod creation -- KubernetesPodOperator creates a pod spec with the configured Docker image, environment variables (converted to V1EnvVar objects), Kubernetes Secrets, resource limits, and volume mounts
  2. Command execution -- The pod entrypoint runs the dbt CLI command
  3. Log streaming -- When get_logs=True, pod stdout/stderr is streamed to the Airflow task log in real-time
  4. Result detection -- Pod exit code determines task success or failure
  5. Warning handling -- For test operators, DbtTestWarningHandler (a KubernetesPodOperatorCallback) scrapes pod logs using regex patterns to detect dbt test warnings and source freshness warnings, then invokes the on_warning_callback
  6. Pod cleanup -- When is_delete_operator_pod=True, the pod is deleted after execution completes
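Step 5's log scraping can be illustrated with a regex over dbt's run summary line. The pattern below is a simplified stand-in for the handler's actual patterns, not the library's implementation:

```python
import re

# dbt prints a summary line such as:
#   "Done. PASS=5 WARN=1 ERROR=0 SKIP=0 TOTAL=6"
SUMMARY_RE = re.compile(r"Done\. PASS=\d+ WARN=(\d+) ERROR=\d+")

def count_test_warnings(pod_log: str) -> int:
    """Return the WARN count from a dbt summary line, or 0 if absent."""
    match = SUMMARY_RE.search(pod_log)
    return int(match.group(1)) if match else 0

log = "12:00:01  Done. PASS=5 WARN=1 ERROR=0 SKIP=0 TOTAL=6"
```

When the count is nonzero, the handler would invoke the configured `on_warning_callback` with the affected tests.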

This model provides per-task isolation at the cost of pod startup latency (typically 10-30 seconds per pod). The isolation boundary means that each dbt command runs in a clean environment with no state leakage between tasks.
