Heuristic: astronomer-cosmos dbt Invocation Mode Selection

Knowledge Sources	astronomer-cosmos; Cosmos team internal benchmarking
Domains	Optimization, Data_Orchestration
Last Updated	2026-02-07 17:00 GMT

Overview

Performance optimization: prefer dbtRunner over subprocess invocation for faster dbt execution. Cosmos falls back to subprocess automatically when dbtRunner cannot be imported.

Description

Cosmos supports two modes for invoking dbt commands: dbtRunner (the in-process Python API) and subprocess (shell execution). The dbtRunner mode is faster because it avoids spawning a new process, re-serializing environment variables, and re-initializing dbt from scratch for every command. However, dbtRunner requires dbt-core to be importable in the Airflow worker environment. When dbt-core is not available (e.g., when using virtualenv or container execution modes), Cosmos automatically falls back to subprocess invocation. This auto-discovery happens at task run time, not at DAG parse time, so it reflects the worker's environment even when that environment differs from the scheduler's.

Usage

Use this heuristic when deciding between InvocationMode.DBT_RUNNER and InvocationMode.SUBPROCESS. If dbt-core is installed alongside Airflow (local execution mode), let Cosmos auto-discover the optimal mode. Only explicitly set InvocationMode.SUBPROCESS when you need isolation or have specific compatibility requirements.

The Insight (Rule of Thumb)

Action: Leave invocation_mode unset (let Cosmos auto-discover), or set InvocationMode.DBT_RUNNER when dbt-core is available.
Value: dbtRunner is automatically preferred when importable.
Trade-off: dbtRunner shares the Python process with Airflow, so dbt memory usage adds to the worker footprint. Subprocess provides stronger isolation but incurs startup overhead per task.
Exception: When using dbt_runner_callbacks, a new dbtRunner instance is created per invocation (no caching), because callbacks make instances non-reusable.

Reasoning

The dbtRunner is a Python-native API introduced in dbt-core 1.5 that runs dbt commands in-process. This avoids:

Process spawning overhead (fork + exec)
Environment variable re-serialization
dbt project re-parsing from disk
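The first item can be illustrated without dbt at all. The following is a minimal sketch, not a Cosmos benchmark: it times a trivial function call inside the current interpreter against the same work in a freshly spawned interpreter. The function names are hypothetical, and the absolute numbers depend entirely on the machine.

```python
import subprocess
import sys
import time


def in_process_task() -> int:
    """Trivial stand-in for work executed inside the current interpreter."""
    return sum(range(100))


def time_in_process() -> float:
    """Seconds spent calling the task directly in this process."""
    start = time.perf_counter()
    in_process_task()
    return time.perf_counter() - start


def time_subprocess() -> float:
    """Seconds spent running the same work in a newly spawned interpreter."""
    start = time.perf_counter()
    # Spawning pays for process creation plus interpreter startup.
    subprocess.run([sys.executable, "-c", "sum(range(100))"], check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"in-process: {time_in_process() * 1000:.3f} ms")
    print(f"subprocess: {time_subprocess() * 1000:.3f} ms")
```

This only measures interpreter startup. A real dbt invocation adds its own import and initialization time on top, which is the cost the in-process runner is meant to pay once per worker process rather than once per task.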

Cosmos caches dbtRunner instances via @functools.lru_cache (or @functools.cache in Python 3.9+), meaning subsequent invocations within the same worker process reuse the already-initialized runner. The cache is disabled during testing (maxsize=0) to prevent test pollution.
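The effect of maxsize=0 can be checked with the standard library alone. This is a sketch under assumptions: FakeRunner and make_runner_factory are hypothetical names, and the way Cosmos actually detects a test run is not shown here.

```python
import functools


class FakeRunner:
    """Hypothetical stand-in for an expensive-to-create runner object."""


def make_runner_factory(testing: bool):
    """Build a zero-argument factory whose cache is disabled when testing."""
    # maxsize=0 makes lru_cache a pass-through: every call runs the body.
    # maxsize=None gives an unbounded cache, like functools.cache.
    maxsize = 0 if testing else None

    @functools.lru_cache(maxsize=maxsize)
    def get_runner() -> FakeRunner:
        return FakeRunner()

    return get_runner


prod_factory = make_runner_factory(testing=False)
test_factory = make_runner_factory(testing=True)
```

With these definitions, prod_factory() returns the same object on every call, while test_factory() builds a fresh object each time, so state cannot leak from one test to the next.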

Evidence from cosmos/operators/local.py:251-261:

def _discover_invocation_mode(self) -> None:
    """Discovers the invocation mode based on the availability of dbtRunner
    for import. If dbtRunner is available, it will be used since it is faster
    than subprocess."""
    if dbt_runner.is_available():
        self.invocation_mode = InvocationMode.DBT_RUNNER
        logger.info("dbtRunner is available. Using dbtRunner for invoking dbt.")
    else:
        self.invocation_mode = InvocationMode.SUBPROCESS
        logger.info("Could not import dbtRunner. Falling back to subprocess for invoking dbt.")
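The body of dbt_runner.is_available() is not part of the excerpt above. One plausible way to write such an importability probe is sketched below. It is an assumption, not the Cosmos implementation: the InvocationMode enum is a local stand-in, and the helper names are hypothetical. The default module name reflects where dbt-core exposes dbtRunner (dbt.cli.main).

```python
import importlib.util
from enum import Enum


class InvocationMode(Enum):
    """Local stand-in for the Cosmos enum of the same name (values assumed)."""

    SUBPROCESS = "subprocess"
    DBT_RUNNER = "dbt_runner"


def is_importable(module_name: str) -> bool:
    """Return True if an import spec can be found for module_name."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises ModuleNotFoundError when a parent package is missing.
        return False


def discover_invocation_mode(module_name: str = "dbt.cli.main") -> InvocationMode:
    """Prefer the in-process runner when its module is importable."""
    if is_importable(module_name):
        return InvocationMode.DBT_RUNNER
    return InvocationMode.SUBPROCESS
```

Because the probe runs inside the process that executes the task, its answer describes the worker's environment rather than the scheduler's.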

Runner caching from cosmos/dbt/runner.py:42-61:

@cache
def _get_cached_dbt_runner() -> dbtRunner:
    return dbtRunner()

def get_runner(callbacks: list[Any] | None = None) -> dbtRunner:
    if callbacks:
        return dbtRunner(callbacks=callbacks)
    return _get_cached_dbt_runner()
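The behavior of this pattern can be exercised without dbt-core installed. The sketch below mirrors the excerpt with a hypothetical FakeRunner in place of dbtRunner, so it shows only the caching logic, not dbt itself.

```python
from functools import cache  # available since Python 3.9
from typing import Any, Callable, List, Optional


class FakeRunner:
    """Hypothetical stand-in for dbtRunner that just records its callbacks."""

    def __init__(self, callbacks: Optional[List[Callable[..., Any]]] = None):
        self.callbacks = list(callbacks or [])


@cache
def _get_cached_runner() -> FakeRunner:
    # Runs once per process; later calls return the same instance.
    return FakeRunner()


def get_runner(callbacks: Optional[List[Callable[..., Any]]] = None) -> FakeRunner:
    # A non-empty callback list bypasses the cache and builds a fresh runner.
    if callbacks:
        return FakeRunner(callbacks=callbacks)
    return _get_cached_runner()
```

Calling get_runner() twice returns the same object, whereas get_runner([print]) returns a new object on each call. An empty list is falsy, so get_runner([]) also takes the cached path.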
