Principle:Astronomer_Astronomer_cosmos_Execution_Configuration
Metadata
| Field | Value |
|---|---|
| Page Type | Principle |
| Repository | astronomer-cosmos |
| Domains | Data_Engineering, Configuration, Orchestration |
| Related Implementation | Implementation:Astronomer_Astronomer_cosmos_ExecutionConfig_Init |
| Knowledge Sources | Cosmos Execution Modes, astronomer-cosmos |
Overview
Execution Configuration is a configuration principle for specifying how dbt commands are executed at runtime within an orchestration system. It introduces a critical architectural separation between rendering-time configuration (how the dbt project graph is parsed and converted into tasks) and runtime configuration (where and how dbt commands actually execute).
This separation is essential for modern orchestration architectures where the environment that parses the DAG (e.g., the Airflow scheduler) differs from the environment that runs the tasks (e.g., a Kubernetes pod, Docker container, or isolated virtual environment).
Description
In a traditional dbt workflow, parsing and execution happen in the same environment: a developer runs dbt run from their terminal, and dbt both parses the project and executes the SQL. In an orchestrated environment, however, these two phases often occur in fundamentally different contexts:
- Rendering Phase (scheduler-side): The orchestrator parses the dbt project graph to generate the task DAG. This happens on the Airflow scheduler, which may have limited resources and a different filesystem than the execution environment.
- Execution Phase (worker-side): Individual dbt commands (`dbt run`, `dbt test`) execute in the runtime environment, which may be a local Airflow worker, a Docker container, a Kubernetes pod, or a cloud-managed compute service.
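The two phases can be sketched as separate functions running in separate environments. Everything here (the `DbtTask` dataclass, `render`, `execute`) is an illustrative sketch of the pattern, not the cosmos API:

```python
from dataclasses import dataclass

# Hypothetical sketch: rendering parses the project into a task graph on the
# scheduler; execution later builds/runs each task's dbt command on a worker.
@dataclass
class DbtTask:
    model: str
    command: str  # e.g. "run" or "test"

def render(project_models: list[str]) -> list[DbtTask]:
    """Scheduler-side: turn the parsed dbt graph into orchestrator tasks."""
    tasks = []
    for model in project_models:
        tasks.append(DbtTask(model, "run"))
        tasks.append(DbtTask(model, "test"))
    return tasks

def execute(task: DbtTask) -> list[str]:
    """Worker-side: build the dbt CLI invocation for one task."""
    return ["dbt", task.command, "--select", task.model]

tasks = render(["stg_orders", "orders"])
print(execute(tasks[0]))  # ['dbt', 'run', '--select', 'stg_orders']
```

The point of the split: `render` only needs the project files, while `execute` needs a working dbt installation and credentials, so the two can live on different machines.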
The Execution Configuration principle captures the settings that govern the execution phase:
Execution Mode
The primary decision is where dbt commands run:
- Local: dbt runs directly on the Airflow worker in the same Python process or as a subprocess.
- Virtual Environment: dbt runs in an isolated Python virtual environment on the Airflow worker, enabling version isolation.
- Docker: dbt runs inside a Docker container, providing full environment isolation.
- Kubernetes: dbt runs as a Kubernetes pod, enabling cloud-native scaling and resource management.
- Cloud-Managed: dbt runs on a cloud provider's managed compute service (AWS EKS, AWS ECS, Azure Container Instance, GCP Cloud Run Job).
- Async/Watcher: dbt runs via an asynchronous execution pattern where the Airflow task watches for completion.
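In astronomer-cosmos this choice is expressed through `ExecutionConfig`. A minimal configuration fragment, shown as a hedged sketch (argument names reflect common cosmos usage and may vary across versions, so check the docs for your release):

```python
from cosmos import ExecutionConfig
from cosmos.constants import ExecutionMode

# Run each dbt task as its own Kubernetes pod (sketch).
execution_config = ExecutionConfig(
    execution_mode=ExecutionMode.KUBERNETES,
)
```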
Invocation Mode
Within the chosen execution mode, the invocation mode determines how dbt is called:
- dbt Runner: Uses dbt's Python API (`dbt.cli.main.dbtRunner`) for in-process execution.
- Subprocess: Invokes dbt as an external process via the command line.
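The two invocation styles can be illustrated with a small dispatcher. `dbt.cli.main.dbtRunner` is dbt's real programmatic entry point, but the function names and the injectable `executable` parameter here are a hypothetical sketch so the subprocess path can be exercised without dbt installed:

```python
import subprocess

def invoke_subprocess(args, executable="dbt"):
    """Subprocess style: call the CLI binary and capture its output.

    `executable` is injectable purely so this sketch can be tried
    with a stand-in command when dbt is not installed.
    """
    result = subprocess.run([executable, *args], capture_output=True, text=True)
    return result.returncode, result.stdout

def invoke_runner(args):
    """dbt Runner style: call dbt's Python API in-process (requires dbt-core)."""
    from dbt.cli.main import dbtRunner  # real dbt-core entry point
    res = dbtRunner().invoke(args)
    return 0 if res.success else 1, res.result

# Subprocess style demonstrated with a stand-in executable:
code, out = invoke_subprocess(["hello"], executable="echo")
print(code, out.strip())  # 0 hello
```

In-process invocation avoids process startup overhead but couples the worker's Python environment to dbt's dependencies; the subprocess style keeps them decoupled.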
Runtime Paths
The execution configuration specifies filesystem paths as they exist in the runtime environment, which may differ from the paths used during rendering. For example:
- During rendering, the dbt project might be at `/usr/local/airflow/dags/dbt/my_project`.
- During execution in a Docker container, the same project might be mounted at `/dbt/my_project`.
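The render-vs-runtime path split can be made concrete with a small remapping helper. The helper is illustrative only: cosmos itself takes separate render-time and execution-time path settings rather than remapping one into the other.

```python
from pathlib import PurePosixPath

def remap_project_path(render_path: str, render_root: str, runtime_root: str) -> str:
    """Translate a scheduler-side project path into its runtime location.

    Illustrative sketch: re-roots `render_path` from the scheduler's
    project directory onto the container's mount point.
    """
    rel = PurePosixPath(render_path).relative_to(render_root)
    return str(PurePosixPath(runtime_root) / rel)

print(remap_project_path(
    "/usr/local/airflow/dags/dbt/my_project",
    "/usr/local/airflow/dags/dbt",
    "/dbt",
))  # /dbt/my_project
```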
Test Indirect Selection
Controls how dbt resolves tests when specific models are selected. The eager strategy selects every test that references any selected model, while the cautious strategy selects only tests whose parents are all selected.
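The difference between the two strategies can be shown on a toy graph. The data structures and function below are a sketch of the selection rule, not dbt's implementation:

```python
# Each test maps to the set of models (parents) it references.
TEST_PARENTS = {
    "not_null_orders_id": {"orders"},
    "relationships_orders_customers": {"orders", "customers"},
}

def select_tests(selected_models, strategy="eager"):
    """Sketch of dbt's indirect test selection rule."""
    selected = set(selected_models)
    if strategy == "eager":
        # Any test touching at least one selected model.
        return {t for t, parents in TEST_PARENTS.items() if parents & selected}
    # cautious: only tests whose parents are ALL selected.
    return {t for t, parents in TEST_PARENTS.items() if parents <= selected}

print(select_tests({"orders"}, "eager"))
# both tests: the relationship test touches "orders" even though
# "customers" was not selected
print(select_tests({"orders"}, "cautious"))
# only not_null_orders_id: the relationship test has an unselected parent
```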
Usage
Use execution configuration when configuring the runtime execution environment for dbt tasks, especially when the execution environment differs from the DAG parsing environment. Key scenarios include:
- Local Development: Use `LOCAL` execution mode for simple setups where dbt is installed on the Airflow worker.
- Production Kubernetes: Use `KUBERNETES` execution mode to run each dbt task as an isolated pod with defined resource limits and custom Docker images containing specific dbt versions and adapter packages.
- Hybrid Architectures: Use `LOCAL` rendering with `DOCKER` or `KUBERNETES` execution to parse the project quickly on the scheduler while executing in isolated environments.
- Version Isolation: Use `VIRTUALENV` execution mode when multiple dbt projects require different dbt versions on the same Airflow cluster.
- Cloud-Native Deployments: Use cloud-managed execution modes (`AWS_EKS`, `AWS_ECS`, `AZURE_CONTAINER_INSTANCE`, `GCP_CLOUD_RUN_JOB`) for serverless or managed compute execution.
- Async Execution: Use `AIRFLOW_ASYNC` or `WATCHER` modes for long-running dbt tasks where the Airflow worker should not block waiting for completion.
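A hybrid architecture from the scenarios above pairs scheduler-side rendering with containerized execution. The following is a hedged configuration sketch: class and argument names reflect common cosmos usage but may differ across versions, and required pieces such as `profile_config` are elided.

```python
from cosmos import DbtDag, ExecutionConfig, ProjectConfig, RenderConfig
from cosmos.constants import ExecutionMode, LoadMode

dag = DbtDag(
    dag_id="my_dbt_dag",
    project_config=ProjectConfig("/usr/local/airflow/dags/dbt/my_project"),
    render_config=RenderConfig(load_method=LoadMode.DBT_LS),  # parse on scheduler
    execution_config=ExecutionConfig(
        execution_mode=ExecutionMode.KUBERNETES,  # run each task in a pod
    ),
    # profile_config, schedule, and operator-level pod settings omitted
)
```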
Theoretical Basis
The render-execute separation enables dual-path architectures where graph parsing happens on the Airflow scheduler (local filesystem) while execution happens in isolated environments (containers). This architectural pattern is grounded in several design principles:
Separation of Concerns
Rendering (what to run) and execution (how to run it) are orthogonal concerns. A project's DAG structure does not depend on where dbt ultimately executes. By separating these configurations, changes to the execution environment (e.g., migrating from local to Kubernetes) do not require changes to the rendering configuration.
Environment Heterogeneity
In production Airflow deployments, the scheduler and workers often run in different environments:
| Phase | Environment | Filesystem | Requirements |
|---|---|---|---|
| Rendering | Airflow scheduler | Scheduler filesystem or mounted volume | dbt project files (or manifest), optionally dbt binary |
| Execution | Worker / Container / Pod | Runtime filesystem (may be different) | dbt binary, project files, profiles.yml, database credentials |
The execution configuration bridges this heterogeneity by specifying runtime-specific paths and settings.
Resource Isolation
Different execution modes provide different levels of resource isolation:
| Execution Mode | Isolation Level | Resource Control | Use Case |
|---|---|---|---|
| LOCAL | None (shared process) | None | Development, simple setups |
| VIRTUALENV | Python environment | Limited | Version isolation |
| DOCKER | Container | CPU, memory limits | Full isolation, local orchestrator |
| KUBERNETES | Pod | CPU, memory, GPU, storage | Cloud-native, auto-scaling |
| WATCHER | Pod (async) | Same as Kubernetes | Long-running tasks |
The progression from LOCAL to KUBERNETES represents increasing isolation and control at the cost of increasing operational complexity. The execution configuration principle enables users to select the appropriate tradeoff for their deployment context.
Related Pages
- Implementation:Astronomer_Astronomer_cosmos_ExecutionConfig_Init — The concrete implementation of this principle in the astronomer-cosmos library.
- Principle:Astronomer_Astronomer_cosmos_Render_Configuration — The complementary principle for controlling how the dbt graph is rendered into tasks.
- Principle:Astronomer_Astronomer_cosmos_Project_Path_Configuration — The principle for specifying project paths, which may differ between render and execution environments.
- Principle:Astronomer_Astronomer_cosmos_Profile_Configuration — The principle for profile configuration, which must be available in the execution environment.