Workflow: Astronomer Cosmos Kubernetes dbt execution
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, dbt, Airflow, Kubernetes, Orchestration |
| Last Updated | 2026-02-07 17:00 GMT |
Overview
End-to-end process for running dbt models in isolated Kubernetes pods using Cosmos Kubernetes execution mode, providing resource isolation and environment independence from the Airflow worker.
Description
This workflow covers the procedure for executing dbt commands inside Kubernetes pods using ExecutionMode.KUBERNETES. Each dbt node (model, seed, test, snapshot) runs as a separate KubernetesPodOperator-based task, launching a containerized dbt environment. This provides complete isolation between dbt and Airflow environments, allowing different dbt versions, dependencies, and resource allocations per task. The dbt project and its dependencies are packaged into a Docker image, while database credentials are managed through Kubernetes Secrets.
The graph parsing still happens on the Airflow controller using the local dbt project files (via RenderConfig.dbt_project_path), while execution happens in Kubernetes pods using the containerized project path (via ExecutionConfig.dbt_project_path). This dual-path design separates the parsing environment from the execution environment.
Usage
Execute this workflow when dbt and Airflow have conflicting Python dependencies, when dbt tasks need more resources (CPU/memory) than the Airflow worker provides, when you need strict environment isolation between dbt versions, or when running in a Kubernetes-native environment like Astronomer or GKE. This mode is suitable for production deployments where stability and resource control are critical.
Execution Steps
Step 1: Build the dbt Docker image
Create a Docker image containing the dbt project files, dbt executable, and all required database adapters. The image should include the dbt project at a known path (e.g., dags/dbt/jaffle_shop) and have dbt properly installed with the necessary adapter packages. A profiles.yml can be embedded in the image or generated at runtime from environment variables.
Key considerations:
- The Docker image must include both dbt-core and the appropriate database adapter
- Project files in the image should match the structure expected by the dbt commands
- Use multi-stage builds to keep the image size manageable
- Tag images with version numbers for reproducibility
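A minimal multi-stage Dockerfile for Step 1 might look like the sketch below. The base image tags, the jaffle_shop project name, and the /usr/app path are illustrative assumptions, not requirements of Cosmos; the adapter (dbt-postgres here) should match your warehouse.

```Dockerfile
# Build stage: install dbt-core plus the warehouse adapter.
FROM python:3.11-slim AS build
RUN pip install --no-cache-dir dbt-core dbt-postgres

# Runtime stage: copy the installed packages and the dbt project files.
FROM python:3.11-slim
COPY --from=build /usr/local /usr/local
# Project files land at the path ExecutionConfig.dbt_project_path will reference.
COPY dags/dbt/jaffle_shop /usr/app/dbt/jaffle_shop
# profiles.yml can be baked in here, or generated at container start
# from environment variables injected via Kubernetes Secrets.
COPY profiles.yml /usr/app/dbt/jaffle_shop/profiles.yml
WORKDIR /usr/app/dbt/jaffle_shop
```

Tagging the pushed image (e.g., my-registry/dbt-jaffle-shop:1.0) rather than using latest keeps pod runs reproducible.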
Step 2: Configure Kubernetes Secrets
Define Kubernetes Secret objects for sensitive configuration values such as database passwords and host addresses. Secrets are injected into the pod as environment variables at runtime, keeping credentials out of DAG code and Docker images. Each secret maps a Kubernetes secret key to an environment variable name.
Key considerations:
- Secrets must exist in the Kubernetes namespace where pods will run
- Use deploy_type="env" to inject secrets as environment variables
- The deploy_target name must match what the dbt profiles.yml expects
- Multiple secrets can be combined for different credential components
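In DAG code, each credential component becomes a Secret object from the Kubernetes provider. The secret name postgres-secrets, its keys, and the target env var names below are illustrative; they just have to line up with what the Kubernetes Secret contains and what profiles.yml reads.

```python
# Sketch of Secret wiring, assuming the apache-airflow-providers-cncf-kubernetes
# package; the secret/key/env-var names are placeholders.
from airflow.providers.cncf.kubernetes.secret import Secret

postgres_password = Secret(
    deploy_type="env",                  # inject as an environment variable
    deploy_target="POSTGRES_PASSWORD",  # env var name the profiles.yml expects
    secret="postgres-secrets",          # K8s Secret name (must exist in the namespace)
    key="password",                     # key within that Secret
)

postgres_host = Secret(
    deploy_type="env",
    deploy_target="POSTGRES_HOST",
    secret="postgres-secrets",
    key="host",
)
```

The backing Secret object itself is created out-of-band (e.g., with kubectl create secret generic postgres-secrets in the target namespace), so credentials never appear in DAG code or the Docker image.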
Step 3: Configure dual project paths
Set up the RenderConfig with the local Airflow controller path to the dbt project (for DAG parsing) and the ExecutionConfig with the container-internal path (for runtime execution). The render path is used by Cosmos to discover dbt nodes at parse time, while the execution path tells the Kubernetes pod where to find the project files.
Key considerations:
- RenderConfig.dbt_project_path points to the local filesystem path accessible during DAG parsing
- ExecutionConfig.dbt_project_path points to the path inside the Docker container
- ExecutionConfig.execution_mode must be set to ExecutionMode.KUBERNETES
- The project structure must be consistent between both paths
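The dual-path split described above can be sketched as two config objects; both filesystem locations are illustrative (an Astronomer-style layout on the controller, an arbitrary /usr/app prefix inside the image).

```python
# Parse-time vs run-time project paths; paths are assumptions for illustration.
from cosmos import ExecutionConfig, RenderConfig
from cosmos.constants import ExecutionMode

# Path on the Airflow controller's filesystem, used only to discover
# dbt nodes when the DAG is parsed.
render_config = RenderConfig(
    dbt_project_path="/usr/local/airflow/dags/dbt/jaffle_shop",
)

# Path inside the Docker image, where the Kubernetes pod actually runs dbt.
execution_config = ExecutionConfig(
    execution_mode=ExecutionMode.KUBERNETES,
    dbt_project_path="/usr/app/dbt/jaffle_shop",
)
```

If the two copies of the project drift apart (e.g., a model exists locally but not in the image), parsing will succeed while execution fails, so keeping them in sync is essential.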
Step 4: Configure profile for Kubernetes context
Create a ProfileConfig that works in both contexts. For DAG parsing, a profile mapping (e.g., PostgresUserPasswordProfileMapping) resolves the Airflow connection. For Kubernetes execution, the pod uses the profiles.yml baked into the Docker image or one generated from environment variables injected via Secrets.
Key considerations:
- Profile mapping is used for DAG parsing but may not be exposed inside the K8s pod
- The pod relies on environment variables (from Secrets) and the baked-in profiles.yml
- The profile_name and target_name must match in both contexts
- Database connection details flow from Kubernetes Secrets to dbt via environment variables
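A ProfileConfig for this dual-context setup might look like the following; the profile/target names and the postgres_default connection id are assumptions for illustration.

```python
# Sketch of a ProfileConfig that works for parse-time rendering;
# names and conn_id are placeholders.
from cosmos import ProfileConfig
from cosmos.profiles import PostgresUserPasswordProfileMapping

profile_config = ProfileConfig(
    profile_name="jaffle_shop",  # must match the profile in the image's profiles.yml
    target_name="dev",           # must match the target used inside the pod
    # Resolved against the Airflow connection at parse time; inside the K8s pod,
    # dbt instead reads the baked-in profiles.yml plus Secret-injected env vars.
    profile_mapping=PostgresUserPasswordProfileMapping(
        conn_id="postgres_default",
        profile_args={"schema": "public"},
    ),
)
```

Inside the image, the profiles.yml can reference the injected variables with dbt's env_var function (e.g., password: "{{ env_var('POSTGRES_PASSWORD') }}") so the same profile name and target resolve correctly in the pod.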
Step 5: Seed data loading
Use DbtSeedKubernetesOperator to load seed data into the database before running models. This operator runs dbt seed inside a Kubernetes pod with the same image and secrets. Seed loading is typically a separate upstream task that runs before the main model TaskGroup.
Key considerations:
- Seeds must be loaded before dependent models run
- The seed operator uses the same Docker image and Kubernetes secrets as the model operators
- is_delete_operator_pod controls whether pods are cleaned up after execution
- get_logs=True streams pod logs back to the Airflow task log
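Put together, a seed-loading task might be declared as below. This is a sketch: the image tag, secret, and project path are placeholders, and the exact operator arguments can vary across Cosmos versions.

```python
# Sketch of a standalone seed task in Kubernetes execution mode;
# image, secret, conn_id, and paths are illustrative assumptions.
from airflow.providers.cncf.kubernetes.secret import Secret
from cosmos import ProfileConfig
from cosmos.operators import DbtSeedKubernetesOperator
from cosmos.profiles import PostgresUserPasswordProfileMapping

postgres_password = Secret("env", "POSTGRES_PASSWORD", "postgres-secrets", "password")

profile_config = ProfileConfig(
    profile_name="jaffle_shop",
    target_name="dev",
    profile_mapping=PostgresUserPasswordProfileMapping(conn_id="postgres_default"),
)

load_seeds = DbtSeedKubernetesOperator(
    task_id="load_seeds",
    project_dir="/usr/app/dbt/jaffle_shop",   # path inside the Docker image
    image="my-registry/dbt-jaffle-shop:1.0",
    secrets=[postgres_password],
    profile_config=profile_config,
    get_logs=True,                 # stream pod logs into the Airflow task log
    is_delete_operator_pod=True,   # remove finished pods automatically
)
```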
Step 6: Run models via DbtTaskGroup
Create a DbtTaskGroup with ExecutionMode.KUBERNETES to run all dbt models. Each model becomes a separate Kubernetes pod. Pass operator_args including the Docker image name, secrets, environment variables, and pod lifecycle settings. Cosmos maps each dbt node to a DbtRunKubernetesOperator, DbtTestKubernetesOperator, etc.
Key considerations:
- Each dbt node spawns a separate Kubernetes pod
- Pod resource requests and limits can be set via operator_args
- is_delete_operator_pod=False keeps pods around for debugging
- Dependencies between models are preserved as Airflow task dependencies
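A DbtTaskGroup for this step could be configured roughly as follows. Everything named here (image, secret, conn_id, paths) is a placeholder; here the render path comes from ProjectConfig while ExecutionConfig carries the container-internal path, and exact validation rules differ between Cosmos versions.

```python
# Sketch of a Kubernetes-mode DbtTaskGroup; all names are illustrative.
from airflow.providers.cncf.kubernetes.secret import Secret
from cosmos import (
    DbtTaskGroup,
    ExecutionConfig,
    ProfileConfig,
    ProjectConfig,
)
from cosmos.constants import ExecutionMode
from cosmos.profiles import PostgresUserPasswordProfileMapping

postgres_password = Secret("env", "POSTGRES_PASSWORD", "postgres-secrets", "password")

run_models = DbtTaskGroup(
    group_id="run_models",
    # Local path, used for parsing the dbt graph on the controller.
    project_config=ProjectConfig("/usr/local/airflow/dags/dbt/jaffle_shop"),
    profile_config=ProfileConfig(
        profile_name="jaffle_shop",
        target_name="dev",
        profile_mapping=PostgresUserPasswordProfileMapping(conn_id="postgres_default"),
    ),
    execution_config=ExecutionConfig(
        execution_mode=ExecutionMode.KUBERNETES,
        # Container-internal path, used when each pod runs dbt.
        dbt_project_path="/usr/app/dbt/jaffle_shop",
    ),
    # Forwarded to every generated Dbt*KubernetesOperator.
    operator_args={
        "image": "my-registry/dbt-jaffle-shop:1.0",
        "secrets": [postgres_password],
        "get_logs": True,
        "is_delete_operator_pod": False,  # keep finished pods for debugging
    },
)
```

Pod resource requests/limits can also go into operator_args via the provider's container-resources settings when individual models need more CPU or memory.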
Step 7: Wire seed and model tasks
Establish the dependency chain with seed loading running first, followed by the model TaskGroup. This ensures that seed data is available in the database before models that depend on it are executed.
Key considerations:
- Use the >> operator to set load_seeds >> run_models
- Additional pre-processing or post-processing tasks can be added to the chain
- The TaskGroup encapsulates all model-level dependencies internally
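The final wiring is a single dependency between the two steps. In this skeleton, EmptyOperator stands in for the DbtSeedKubernetesOperator and DbtTaskGroup built in Steps 5 and 6, and the DAG id and schedule are illustrative.

```python
# Skeleton of the final DAG wiring; EmptyOperator is a stand-in so the
# dependency structure is visible without the full operator definitions.
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator

with DAG(
    dag_id="jaffle_shop_kubernetes",   # illustrative DAG id
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    load_seeds = EmptyOperator(task_id="load_seeds")  # Step 5 seed task
    run_models = EmptyOperator(task_id="run_models")  # Step 6 TaskGroup

    # Seed data must land before any model pod starts.
    load_seeds >> run_models
```

Additional pre- or post-processing tasks slot into the same chain (e.g., pre_check >> load_seeds >> run_models >> notify), while model-to-model ordering stays inside the TaskGroup.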