
Principle:Astronomer Astronomer cosmos Container Operator Execution

From Leeroopedia


Knowledge Sources
Domains Container_Execution, Cloud
Last Updated 2026-02-07 17:00 GMT

Overview

Running data transformation commands inside fully isolated container runtimes so that the orchestrator environment requires no local installation of the transformation tool or its dependencies.

Description

What it is: Container Operator Execution is a principle in which each dbt command is executed inside an externally managed container -- a Docker container on the local host, an AWS ECS task, an Azure Container Instance, or a GCP Cloud Run Job. The Airflow worker never imports dbt; instead, it builds the dbt command line with the appropriate flags and environment variables, then delegates execution to the relevant Airflow provider operator, which handles container lifecycle, log streaming, and exit-code propagation.

What problem it solves: Local and virtualenv execution modes still require that dbt (and its database adapter, plus any system-level libraries like ODBC drivers) can be installed into the Airflow worker's Python environment or into a virtual environment on the same filesystem. In many production deployments this is impractical or forbidden: security policies may prohibit installing arbitrary packages on worker nodes, dbt adapters may require specific OS-level libraries, or the Airflow image may be locked down. Container execution removes every host-level dbt dependency by packaging dbt, its adapter, and its full runtime into a pre-built container image, achieving complete environment isolation. The worker itself still needs Cosmos and the relevant Airflow provider package, because the container operators inherit from the provider operators.

Where it fits: This principle covers four execution modes in Cosmos: DOCKER, AWS_ECS, AZURE_CONTAINER_INSTANCE, and GCP_CLOUD_RUN_JOB. Each mode is backed by an abstract base operator that inherits from both AbstractDbtBase (for dbt command construction) and the corresponding Airflow provider operator (for container management), together with a family of concrete subclasses that mix in dbt command mixins (Run, Build, Test, Seed, Snapshot, Source, LS, RunOperation, Clone). The user selects the desired mode at DAG construction time and supplies provider-specific configuration (image name, cluster, region, connection IDs).
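The class layout described above can be illustrated with a stdlib-only sketch. Every class name here (ProviderOperatorStub, DbtBaseStub, DbtRunMixinStub and so on) is a hypothetical stand-in, not a Cosmos or Airflow class. The sketch shows only how a command mixin supplies the dbt subcommand while the shared base combines dbt command construction with a provider operator.

```python
from typing import List


class ProviderOperatorStub:
    """Hypothetical stand-in for a provider operator such as DockerOperator."""

    def __init__(self, image: str) -> None:
        self.image = image


class DbtBaseStub:
    """Hypothetical stand-in for AbstractDbtBase: builds the dbt command line."""

    base_cmd: List[str] = []

    def build_cmd(self) -> List[str]:
        # The subcommand comes from whichever mixin appears earlier in the MRO.
        return ["dbt", *self.base_cmd]


class DbtRunMixinStub:
    """Hypothetical command mixin: contributes only the dbt subcommand."""

    base_cmd = ["run"]


class DbtTestMixinStub:
    """Hypothetical command mixin for `dbt test`."""

    base_cmd = ["test"]


class DbtContainerBaseStub(DbtBaseStub, ProviderOperatorStub):
    """Abstract container base: inherits dbt command building and the provider operator."""


class DbtRunContainerStub(DbtRunMixinStub, DbtContainerBaseStub):
    pass


class DbtTestContainerStub(DbtTestMixinStub, DbtContainerBaseStub):
    pass


run_task = DbtRunContainerStub(image="my-dbt-image:1.0")
print(run_task.build_cmd())  # ['dbt', 'run']
```

Placing the mixin first in each concrete class's bases lets its base_cmd shadow the empty default on the dbt base, so one container base can serve every dbt command.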

Usage

Use Container Operator Execution when:

  • The Airflow worker environment cannot or should not install dbt and its adapter dependencies.
  • You need full operating-system-level isolation, including system libraries, binary drivers, or specific Linux distributions.
  • Your organisation already operates a container orchestration platform (ECS, ACI, Cloud Run, or a Docker host) and wants dbt execution to follow the same operational patterns.
  • You want to version-lock the dbt runtime in a container image independently of the Airflow deployment lifecycle.

Avoid this principle when:

  • The overhead of container startup (image pull, scheduling, network attachment) is unacceptable for the workload's latency requirements.
  • You need tight integration with the Airflow worker's local filesystem (e.g., accessing local dbt artifacts after execution without remote storage).
  • A simpler isolation mechanism (virtualenv) satisfies the dependency requirements.

Theoretical Basis

All four container execution backends share a uniform three-phase pattern:

1. Build the dbt command and environment variables. The build_command method on the base operator calls AbstractDbtBase.build_cmd(), which assembles the full dbt CLI invocation (e.g., dbt run --profiles-dir ... --project-dir ...) along with a dictionary of environment variables encoding connection credentials and runtime configuration. The operator then merges any user-supplied environment variables on top, so user values take precedence on key collisions and can override or extend the runtime environment.
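Phase 1 can be sketched in plain Python. The build_cmd and build_command functions below are simplified, hypothetical versions of the methods named above, not the Cosmos implementations. The --project-dir and --profiles-dir flags are real dbt CLI options, but the exact flag set, the in-image path, and the environment variable names (DBT_POSTGRES_PASSWORD, DBT_TARGET_SCHEMA) are assumptions for illustration. The behaviour worth noting is the merge order: user-supplied variables are applied last.

```python
from typing import Dict, List, Tuple


def build_cmd(subcommand: str, project_dir: str, profiles_dir: str,
              profile_env: Dict[str, str]) -> Tuple[List[str], Dict[str, str]]:
    """Hypothetical simplification of AbstractDbtBase.build_cmd()."""
    cmd = ["dbt", subcommand, "--project-dir", project_dir, "--profiles-dir", profiles_dir]
    # Connection credentials travel as environment variables, not as CLI flags.
    return cmd, dict(profile_env)


def build_command(subcommand: str, user_env: Dict[str, str]) -> Tuple[List[str], Dict[str, str]]:
    """Hypothetical simplification of the container base operator's build_command()."""
    dbt_cmd, env_vars = build_cmd(
        subcommand,
        project_dir="/usr/app/dbt",  # assumed project path inside the image
        profiles_dir="/usr/app/dbt",
        profile_env={"DBT_POSTGRES_PASSWORD": "from-connection", "DBT_TARGET_SCHEMA": "public"},
    )
    # User-supplied variables are merged last, so they override generated ones.
    environment = {**env_vars, **user_env}
    return dbt_cmd, environment


cmd, env = build_command("run", user_env={"DBT_TARGET_SCHEMA": "analytics", "EXTRA_FLAG": "1"})
print(cmd)
print(env["DBT_TARGET_SCHEMA"])  # analytics
```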

2. Set provider-specific overrides. Each backend translates the generic command and environment into the provider operator's native format:

  • Docker: Sets self.command and self.environment on the underlying DockerOperator.
  • AWS ECS: Constructs a containerOverrides payload with command and environment arrays conforming to the ECS RunTask API schema, placed under self.overrides.
  • Azure Container Instance: Populates self.command and self.environment_variables on the underlying AzureContainerInstancesOperator.
  • GCP Cloud Run Job: Builds a container_overrides structure with args and env arrays matching the Cloud Run Job execution override API, placed under self.overrides.
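The four translations in phase 2 can be sketched as pure functions from a generic (command, environment) pair to each provider's shape. The function names and the container_name argument are hypothetical. The ECS and Cloud Run structures follow the public RunTask containerOverrides and Cloud Run Job run-overrides schemas, both of which take environment entries as {"name": ..., "value": ...} pairs. The ECS "name" field is included because the RunTask API requires it to identify which container receives the override. Docker and ACI keep the environment as a plain dictionary.

```python
from typing import Any, Dict, List


def _as_name_value(env: Dict[str, str]) -> List[Dict[str, str]]:
    # ECS and Cloud Run both expect a list of {"name", "value"} pairs.
    return [{"name": k, "value": v} for k, v in env.items()]


def docker_fields(cmd: List[str], env: Dict[str, str]) -> Dict[str, Any]:
    # Values for DockerOperator's command / environment attributes.
    return {"command": cmd, "environment": env}


def ecs_overrides(cmd: List[str], env: Dict[str, str], container_name: str) -> Dict[str, Any]:
    # Shape of the RunTask "overrides" argument.
    return {
        "containerOverrides": [
            {"name": container_name, "command": cmd, "environment": _as_name_value(env)}
        ]
    }


def aci_fields(cmd: List[str], env: Dict[str, str]) -> Dict[str, Any]:
    # Values for AzureContainerInstancesOperator's command / environment_variables.
    return {"command": cmd, "environment_variables": env}


def cloud_run_overrides(cmd: List[str], env: Dict[str, str]) -> Dict[str, Any]:
    # Shape of the Cloud Run Job execution overrides.
    return {"container_overrides": [{"args": cmd, "env": _as_name_value(env)}]}


cmd = ["dbt", "run"]
env = {"DBT_TARGET_SCHEMA": "analytics"}
print(ecs_overrides(cmd, env, container_name="dbt"))
```

Keeping these translations as pure functions makes the differences between providers easy to test without starting any container.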

3. Delegate to the provider operator's execute method. The build_and_run_cmd method calls the provider operator's execute(), which handles container creation, polling, log retrieval, and cleanup according to the provider's semantics. Cosmos does not re-implement any container lifecycle logic; it relies entirely on the battle-tested Airflow provider implementations.
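The delegation in phase 3 amounts to calling the provider parent's execute() explicitly once the command is built. In this sketch the provider operator is a hypothetical stub that simulates a container exit code instead of starting a container. RuntimeError stands in for the exception a real provider operator would raise (Airflow operators typically raise AirflowException). The dbt-side class adds no lifecycle logic of its own.

```python
from typing import Any, Dict, List


class ProviderOperatorStub:
    """Hypothetical provider operator: 'runs' self.command and propagates the exit code."""

    def __init__(self) -> None:
        self.command: List[str] = []

    def execute(self, context: Dict[str, Any]) -> str:
        # A real provider would create the container, stream logs, poll, and clean up.
        exit_code = context.get("simulated_exit_code", 0)
        if exit_code != 0:
            raise RuntimeError(f"container exited with code {exit_code}")
        return "finished: " + " ".join(self.command)


class DbtContainerOperatorSketch(ProviderOperatorStub):
    """Hypothetical dbt container operator: builds the command, then delegates."""

    def build_command(self, context: Dict[str, Any]) -> None:
        self.command = ["dbt", "run"]

    def build_and_run_cmd(self, context: Dict[str, Any]) -> str:
        self.build_command(context)
        # Explicit parent call: container lifecycle is entirely the provider's job.
        return ProviderOperatorStub.execute(self, context)


op = DbtContainerOperatorSketch()
print(op.build_and_run_cmd({}))  # finished: dbt run
```

The explicit ProviderOperatorStub.execute(self, context) call, rather than super().execute(context), matters in a multiple-inheritance layout where the dbt base class also defines execute(). There, resolving through the MRO could land back on the dbt-side execute instead of the provider's.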

Initialisation segregation. Because AbstractDbtBase was refactored (in PR #1474) to no longer inherit from Airflow's BaseOperator, each container base operator must explicitly initialise both parent hierarchies with the correct subset of keyword arguments. This is accomplished by introspecting each parent class's __init__ signature via inspect.signature and routing kwargs accordingly -- a pattern consistently applied across all four container backends.
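The kwarg routing can be sketched with inspect.signature. DbtBaseSketch and ProviderOperatorSketch are hypothetical classes with made-up parameters, and route_kwargs is a hypothetical helper that keeps only the keyword arguments a target __init__ declares by name. Real provider operators usually forward **kwargs to Airflow's BaseOperator, so a fuller version would also need to collect BaseOperator's parameter names. This helper ignores **kwargs catch-alls.

```python
import inspect
from typing import Any, Dict, List, Optional


class DbtBaseSketch:
    """Hypothetical dbt-side parent (stand-in for AbstractDbtBase)."""

    def __init__(self, project_dir: str, select: Optional[List[str]] = None) -> None:
        self.project_dir = project_dir
        self.select = select or []


class ProviderOperatorSketch:
    """Hypothetical provider-side parent (stand-in for e.g. DockerOperator)."""

    def __init__(self, task_id: str, image: str) -> None:
        self.task_id = task_id
        self.image = image


def route_kwargs(cls: type, kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the kwargs that cls.__init__ declares by name."""
    params = inspect.signature(cls.__init__).parameters
    named = {
        name
        for name, p in params.items()
        if name != "self"
        and p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    }
    return {k: v for k, v in kwargs.items() if k in named}


class ContainerBaseSketch(DbtBaseSketch, ProviderOperatorSketch):
    def __init__(self, **kwargs: Any) -> None:
        # Initialise each parent hierarchy explicitly with its own subset of kwargs.
        DbtBaseSketch.__init__(self, **route_kwargs(DbtBaseSketch, kwargs))
        ProviderOperatorSketch.__init__(self, **route_kwargs(ProviderOperatorSketch, kwargs))


task = ContainerBaseSketch(
    task_id="dbt_run", image="my-dbt-image:1.0", project_dir="/usr/app/dbt", select=["orders"]
)
print(task.project_dir, task.image)
```

In this sketch a kwarg that neither parent names is silently dropped; warning or raising instead is a design choice.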
