
Principle:Astronomer Astronomer cosmos Execution Configuration

From Leeroopedia


Metadata

Page Type: Principle
Repository: astronomer-cosmos
Domains: Data_Engineering, Configuration, Orchestration
Related Implementation: Implementation:Astronomer_Astronomer_cosmos_ExecutionConfig_Init
Knowledge Sources: Cosmos Execution Modes, astronomer-cosmos

Overview

Execution Configuration is a configuration principle for specifying how dbt commands are executed at runtime within an orchestration system. It introduces a critical architectural separation between rendering-time configuration (how the dbt project graph is parsed and converted into tasks) and runtime configuration (where and how dbt commands actually execute).

This separation is essential for modern orchestration architectures where the environment that parses the DAG (e.g., the Airflow scheduler) differs from the environment that runs the tasks (e.g., a Kubernetes pod, Docker container, or isolated virtual environment).

Description

In a traditional dbt workflow, parsing and execution happen in the same environment: a developer runs dbt run from their terminal, and dbt both parses the project and executes the SQL. In an orchestrated environment, however, these two phases often occur in fundamentally different contexts:

  • Rendering Phase (scheduler-side): The orchestrator parses the dbt project graph to generate the task DAG. This happens on the Airflow scheduler, which may have limited resources and a different filesystem than the execution environment.
  • Execution Phase (worker-side): Individual dbt commands (dbt run, dbt test) execute in the runtime environment, which may be a local Airflow worker, a Docker container, a Kubernetes pod, or a cloud-managed compute service.

The Execution Configuration principle captures the settings that govern the execution phase:

Execution Mode

The primary decision is where dbt commands run:

  • Local: dbt runs directly on the Airflow worker in the same Python process or as a subprocess.
  • Virtual Environment: dbt runs in an isolated Python virtual environment on the Airflow worker, enabling version isolation.
  • Docker: dbt runs inside a Docker container, providing full environment isolation.
  • Kubernetes: dbt runs as a Kubernetes pod, enabling cloud-native scaling and resource management.
  • Cloud-Managed: dbt runs on a cloud provider's managed compute service (AWS EKS, AWS ECS, Azure Container Instance, GCP Cloud Run Job).
  • Async/Watcher: dbt runs via an asynchronous execution pattern where the Airflow task watches for completion.
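As a minimal sketch, the mode choice above can be modeled as an enumeration. The member names mirror the modes used later in this page (LOCAL, VIRTUALENV, DOCKER, KUBERNETES, AWS_EKS, AWS_ECS, AZURE_CONTAINER_INSTANCE, GCP_CLOUD_RUN_JOB, AIRFLOW_ASYNC, WATCHER); the string values and the ISOLATED_MODES grouping are illustrative assumptions, not cosmos's actual definitions:

```python
from enum import Enum

class ExecutionMode(str, Enum):
    """Where dbt commands run; mirrors the modes described above."""
    LOCAL = "local"
    VIRTUALENV = "virtualenv"
    DOCKER = "docker"
    KUBERNETES = "kubernetes"
    AWS_EKS = "aws_eks"
    AWS_ECS = "aws_ecs"
    AZURE_CONTAINER_INSTANCE = "azure_container_instance"
    GCP_CLOUD_RUN_JOB = "gcp_cloud_run_job"
    AIRFLOW_ASYNC = "airflow_async"
    WATCHER = "watcher"

# Modes that execute dbt somewhere other than the worker process itself
ISOLATED_MODES = {
    ExecutionMode.DOCKER,
    ExecutionMode.KUBERNETES,
    ExecutionMode.AWS_EKS,
    ExecutionMode.AWS_ECS,
    ExecutionMode.AZURE_CONTAINER_INSTANCE,
    ExecutionMode.GCP_CLOUD_RUN_JOB,
}
```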

Invocation Mode

Within the chosen execution mode, the invocation mode determines how dbt is called:

  • dbt Runner: Uses dbt's Python API (dbt.cli.main.dbtRunner) for in-process execution.
  • Subprocess: Invokes dbt as an external process via the command line.
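A rough sketch of the two invocation paths, assuming only that dbt is invoked either as an external `dbt` process or through `dbt.cli.main.dbtRunner` (the helper names here are hypothetical, not cosmos internals):

```python
import subprocess

def build_dbt_argv(command, project_dir, extra_args=None):
    """Assemble the command line for a subprocess-style dbt invocation."""
    return ["dbt", command, "--project-dir", project_dir, *(extra_args or [])]

def run_dbt_subprocess(command, project_dir):
    """Invoke dbt as an external process (requires a dbt binary on PATH)."""
    subprocess.run(build_dbt_argv(command, project_dir), check=True)

# In-process alternative via dbt's Python API (requires dbt-core installed):
#   from dbt.cli.main import dbtRunner
#   result = dbtRunner().invoke(["run", "--project-dir", project_dir])
```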

Runtime Paths

The execution configuration specifies filesystem paths as they exist in the runtime environment, which may differ from the paths used during rendering. For example:

  • During rendering, the dbt project might be at /usr/local/airflow/dags/dbt/my_project.
  • During execution in a Docker container, the same project might be mounted at /dbt/my_project.
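The remapping above can be sketched as a small path-translation helper; this is an illustrative function (not a cosmos API) that assumes POSIX-style paths on both sides:

```python
from pathlib import PurePosixPath

def remap_project_path(render_path, render_root, runtime_root):
    """Translate a path as seen at rendering time into its runtime equivalent.

    E.g. a project parsed at /usr/local/airflow/dags/dbt/my_project on the
    scheduler may be mounted at /dbt/my_project inside a container.
    """
    rel = PurePosixPath(render_path).relative_to(render_root)
    return str(PurePosixPath(runtime_root) / rel)
```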

Test Indirect Selection

Controls how dbt resolves tests when specific models are selected. The eager strategy selects every test that references any selected model, while the cautious strategy selects only tests whose parents are all selected.

Usage

Use execution configuration when configuring the runtime execution environment for dbt tasks, especially when the execution environment differs from the DAG parsing environment. Key scenarios include:

  • Local Development: Use LOCAL execution mode for simple setups where dbt is installed on the Airflow worker.
  • Production Kubernetes: Use KUBERNETES execution mode to run each dbt task as an isolated pod with defined resource limits and custom Docker images containing specific dbt versions and adapter packages.
  • Hybrid Architectures: Use LOCAL rendering with DOCKER or KUBERNETES execution to parse the project quickly on the scheduler while executing in isolated environments.
  • Version Isolation: Use VIRTUALENV execution mode when multiple dbt projects require different dbt versions on the same Airflow cluster.
  • Cloud-Native Deployments: Use cloud-managed execution modes (AWS_EKS, AWS_ECS, AZURE_CONTAINER_INSTANCE, GCP_CLOUD_RUN_JOB) for serverless or managed compute execution.
  • Async Execution: Use AIRFLOW_ASYNC or WATCHER modes for long-running dbt tasks where the Airflow worker should not block waiting for completion.
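Tying the above together, a simple LOCAL setup might look like the following Airflow DAG configuration fragment. This assumes astronomer-cosmos is installed and mirrors its documented `ExecutionConfig` API; the DAG id, profile names, and paths are placeholders, and exact keyword support may vary by cosmos and Airflow version:

```python
from datetime import datetime

from cosmos import DbtDag, ExecutionConfig, ProfileConfig, ProjectConfig
from cosmos.constants import ExecutionMode, InvocationMode

dag = DbtDag(
    dag_id="my_dbt_dag",  # placeholder name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    # Rendering: where the project is parsed into tasks
    project_config=ProjectConfig("/usr/local/airflow/dags/dbt/my_project"),
    profile_config=ProfileConfig(
        profile_name="my_profile",  # placeholder profile
        target_name="prod",
        profiles_yml_filepath="/usr/local/airflow/dags/dbt/my_project/profiles.yml",
    ),
    # Execution: run dbt on the worker, in-process via dbtRunner
    execution_config=ExecutionConfig(
        execution_mode=ExecutionMode.LOCAL,
        invocation_mode=InvocationMode.DBT_RUNNER,
    ),
)
```

Switching to an isolated environment is then a change to `execution_config` (e.g. `ExecutionMode.KUBERNETES` plus operator arguments for the image), leaving the rendering configuration untouched.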

Theoretical Basis

The render-execute separation enables dual-path architectures where graph parsing happens on the Airflow scheduler (local filesystem) while execution happens in isolated environments (containers). This architectural pattern is grounded in several design principles:

Separation of Concerns

Rendering (what to run) and execution (how to run it) are orthogonal concerns. A project's DAG structure does not depend on where dbt ultimately executes. By separating these configurations, changes to the execution environment (e.g., migrating from local to Kubernetes) do not require changes to the rendering configuration.

Environment Heterogeneity

In production Airflow deployments, the scheduler and workers often run in different environments:

  • Rendering: runs on the Airflow scheduler; uses the scheduler filesystem or a mounted volume; requires the dbt project files (or a manifest) and, optionally, a dbt binary.
  • Execution: runs on a worker, container, or pod; uses the runtime filesystem (which may differ); requires a dbt binary, the project files, profiles.yml, and database credentials.

The execution configuration bridges this heterogeneity by specifying runtime-specific paths and settings.

Resource Isolation

Different execution modes provide different levels of resource isolation:

  • LOCAL: no isolation (shared process); no resource control; suited to development and simple setups.
  • VIRTUALENV: Python-environment isolation; limited resource control; suited to version isolation.
  • DOCKER: container isolation; CPU and memory limits; suited to full isolation with a local orchestrator.
  • KUBERNETES: pod isolation; CPU, memory, GPU, and storage control; suited to cloud-native, auto-scaling deployments.
  • WATCHER: pod isolation (async); same resource control as KUBERNETES; suited to long-running tasks.

The progression from LOCAL to KUBERNETES represents increasing isolation and control at the cost of increasing operational complexity. The execution configuration principle enables users to select the appropriate tradeoff for their deployment context.

Related Pages

  • Implementation:Astronomer_Astronomer_cosmos_ExecutionConfig_Init