Heuristic:Astronomer Astronomer cosmos Dbt Invocation Mode Selection
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Data_Orchestration |
| Last Updated | 2026-02-07 17:00 GMT |
Overview
Performance optimization: prefer dbtRunner over subprocess invocation for faster dbt execution, with automatic fallback.
Description
Cosmos supports two modes for invoking dbt commands: dbtRunner (the in-process Python API) and subprocess (shell execution). The dbtRunner mode is faster because it avoids the overhead of spawning a new process, parsing environment variables, and re-initializing dbt from scratch. However, dbtRunner requires dbt-core to be importable in the Airflow worker environment. When dbt-core is not available there (e.g., when using virtualenv or container execution modes), Cosmos automatically falls back to subprocess invocation. This auto-discovery happens at runtime, not at DAG parse time, so it works correctly even when the execution environment differs from the scheduler's.
Usage
Use this heuristic when deciding between InvocationMode.DBT_RUNNER and InvocationMode.SUBPROCESS. If dbt-core is installed alongside Airflow (local execution mode), let Cosmos auto-discover the optimal mode. Only explicitly set InvocationMode.SUBPROCESS when you need isolation or have specific compatibility requirements.
The Insight (Rule of Thumb)
- Action: Leave invocation_mode unset (let Cosmos auto-discover), or set InvocationMode.DBT_RUNNER when dbt-core is available.
- Value: dbtRunner is automatically preferred when importable.
- Trade-off: dbtRunner shares the Python process with Airflow, so dbt memory usage adds to the worker footprint. Subprocess provides stronger isolation but incurs startup overhead per task.
- Exception: When using dbt_runner_callbacks, a new dbtRunner instance is created per invocation (no caching), because callbacks make instances non-reusable.
Reasoning
The dbtRunner is a Python-native API introduced in dbt-core 1.5 that runs dbt commands in-process. This avoids:
- Process spawning overhead (fork + exec)
- Environment variable re-serialization
- dbt project re-parsing from disk
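The process-spawning cost can be illustrated without dbt or Cosmos. The sketch below times a no-op statement run in a fresh Python interpreter against the same statement run in-process. The function names are hypothetical and the absolute numbers vary by machine; it only shows the kind of per-invocation overhead that an in-process call avoids, and it does not measure dbt itself.

```python
import subprocess
import sys
import time


def time_subprocess_call() -> float:
    """Time spawning a fresh Python interpreter that runs a no-op."""
    start = time.perf_counter()
    # check=True raises CalledProcessError if the child exits non-zero.
    subprocess.run([sys.executable, "-c", "pass"], check=True)
    return time.perf_counter() - start


def time_in_process_call() -> float:
    """Time running the same no-op statement inside the current process."""
    start = time.perf_counter()
    exec("pass")
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"subprocess: {time_subprocess_call():.6f}s")
    print(f"in-process: {time_in_process_call():.6f}s")
```

The gap grows once the child process also has to import a large package at startup, which is the situation a subprocess-based dbt invocation is in on every task.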
Cosmos caches dbtRunner instances via @functools.lru_cache (or @functools.cache in Python 3.9+), meaning subsequent invocations within the same worker process reuse the already-initialized runner. The cache is disabled during testing (maxsize=0) to prevent test pollution.
Evidence from cosmos/operators/local.py:251-261:
def _discover_invocation_mode(self) -> None:
    """Discovers the invocation mode based on the availability of dbtRunner
    for import. If dbtRunner is available, it will be used since it is faster
    than subprocess."""
    if dbt_runner.is_available():
        self.invocation_mode = InvocationMode.DBT_RUNNER
        logger.info("dbtRunner is available. Using dbtRunner for invoking dbt.")
    else:
        self.invocation_mode = InvocationMode.SUBPROCESS
        logger.info("Could not import dbtRunner. Falling back to subprocess for invoking dbt.")
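The excerpt relies on dbt_runner.is_available(), which is not shown here. One plausible way to implement such a check is to ask the import system whether the module that provides dbtRunner (dbt.cli.main) can be located. The sketch below is an assumption about the approach, not the Cosmos source: the InvocationMode enum is a stand-in with illustrative values, and is_importable and discover_invocation_mode are hypothetical names.

```python
import importlib.util
from enum import Enum


class InvocationMode(Enum):
    """Stand-in for the Cosmos enum of the same name (values are illustrative)."""

    SUBPROCESS = "subprocess"
    DBT_RUNNER = "dbt_runner"


def is_importable(module_name: str) -> bool:
    """Return True if the import system can locate module_name."""
    try:
        # find_spec imports parent packages but not the target module itself.
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # A missing parent package raises ModuleNotFoundError (an ImportError).
        return False


def discover_invocation_mode(module_name: str = "dbt.cli.main") -> InvocationMode:
    """Prefer the in-process runner when its module is importable."""
    if is_importable(module_name):
        return InvocationMode.DBT_RUNNER
    return InvocationMode.SUBPROCESS


if __name__ == "__main__":
    # The result depends on whether dbt-core is installed in this environment.
    print(discover_invocation_mode())
```

Because the check runs in whatever process calls it, the answer reflects the environment of that process, which is why doing it at task run time rather than at DAG parse time matters.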
Runner caching from cosmos/dbt/runner.py:42-61:
@cache
def _get_cached_dbt_runner() -> dbtRunner:
    return dbtRunner()


def get_runner(callbacks: list[Any] | None = None) -> dbtRunner:
    if callbacks:
        return dbtRunner(callbacks=callbacks)
    return _get_cached_dbt_runner()