Principle:Astronomer Astronomer cosmos Dbt Invocation
| Knowledge Sources | |
|---|---|
| Domains | dbt_Execution, Invocation |
| Last Updated | 2026-02-07 17:00 GMT |
Overview
A dual-mode strategy for running dbt commands from an orchestration process, either in a subprocess or in-process, and for systematically capturing their structured output.
Description
Dbt Invocation encompasses the complete lifecycle of issuing a dbt command, managing the filesystem context it requires, and parsing the results it produces. The principle recognises that there are fundamentally two ways to invoke dbt from within an orchestration process, and it provides a clean abstraction over both.
Subprocess mode delegates execution to an external process through FullOutputSubprocessHook. The standard Airflow subprocess hook returns only the last line of output; this hook extends it to capture the entire combined stdout and stderr stream. The complete output is retained in memory so that downstream parsing logic can inspect every line. Subprocess mode is the safer default: when dbt is installed in its own environment, it isolates dbt's Python runtime from Airflow's, avoiding dependency conflicts, and it mirrors the experience of running dbt from the command line.
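The capture behaviour can be illustrated with a minimal sketch that uses only the Python standard library. It is not the hook's actual implementation; `run_and_capture` and `CapturedResult` are hypothetical names, and the sketch assumes stderr is merged into stdout so that a single stream is parsed.

```python
import subprocess
from typing import List, NamedTuple, Optional


class CapturedResult(NamedTuple):
    """Hypothetical result type: exit code, last line, and every line."""
    exit_code: int
    last_line: str
    full_output: List[str]


def run_and_capture(command: List[str], cwd: Optional[str] = None) -> CapturedResult:
    """Run a command and keep every line it writes to stdout or stderr."""
    lines: List[str] = []
    with subprocess.Popen(
        command,
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the captured stream
        text=True,
    ) as proc:
        assert proc.stdout is not None
        for raw_line in proc.stdout:
            lines.append(raw_line.rstrip("\n"))
    # Leaving the `with` block waits for the process, so returncode is set.
    return CapturedResult(proc.returncode, lines[-1] if lines else "", lines)
```

A caller could pass something like `["dbt", "run"]` together with the project directory as `cwd`, then hand `full_output` to the parsers.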
Programmatic mode calls dbt's internal Python API directly through a dbtRunner wrapper. In this mode, no child process is created; instead, the dbt library is imported into the same interpreter and its invoke method is called with the desired command and flags. This yields lower latency and richer structured results (Python objects rather than text), but it requires that the dbt-core library and all adapter packages are installed in the Airflow environment.
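A minimal sketch of an in-process call, assuming dbt-core 1.5 or later, where `dbtRunner` is importable from `dbt.cli.main`. The wrapper name `invoke_in_process` and its `runner` parameter are hypothetical; the parameter exists only so the function can be exercised without dbt installed.

```python
from typing import Any, List, Optional, Tuple


def invoke_in_process(args: List[str], runner: Optional[Any] = None) -> Tuple[bool, Any]:
    """Call dbt in the current interpreter and return (success, result)."""
    if runner is None:
        # Imported lazily so this module loads even where dbt-core is absent.
        from dbt.cli.main import dbtRunner

        runner = dbtRunner()
    outcome = runner.invoke(args)  # e.g. ["run", "--select", "my_model"]
    if outcome.exception is not None:
        # dbt could not complete the command at all (as opposed to a failed node).
        raise RuntimeError(f"dbt raised an exception: {outcome.exception}")
    return outcome.success, outcome.result
```

The returned `result` is a Python object whose shape depends on the command, so callers can inspect it directly instead of parsing text.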
Regardless of invocation mode, the output passes through a set of specialised parsers:
- Warnings parser -- extracts compiler and runtime warnings so they can be surfaced as Airflow task log annotations.
- Test results parser -- interprets pass, fail, warn, and error outcomes from dbt test and dbt build runs.
- Freshness parser -- reads source freshness check results and determines whether staleness thresholds have been breached.
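As an illustration of what such a parser might do with subprocess output, the sketch below reads the summary line that dbt prints at the end of a run in text log format (for example `Done. PASS=3 WARN=1 ERROR=0 SKIP=0 TOTAL=4`). The function name is hypothetical, and the exact set of counters varies between dbt versions, so the regex accepts any upper-case `KEY=number` pair rather than a fixed list.

```python
import re
from typing import Dict, Iterable

# Matches pairs such as PASS=3 or NO-OP=0.
_SUMMARY_PAIR = re.compile(r"\b([A-Z][A-Z-]*)=(\d+)")


def parse_summary_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Return the counters from the last summary line, or {} if none is found."""
    counts: Dict[str, int] = {}
    for line in lines:
        if "PASS=" in line and "TOTAL=" in line:
            counts = {key: int(value) for key, value in _SUMMARY_PAIR.findall(line)}
    return counts
```

A task could then compare `counts.get("WARN", 0)` or `counts.get("ERROR", 0)` against zero to decide whether to log a warning or fail.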
Before any command can execute, the project filesystem must be prepared. Dbt Project Utils handle this concern through several mechanisms: creating symlinks from the Airflow worker's scratch space to the canonical project directory, copying packages into a location writable by the worker, resolving the manifest file path, and providing Python context managers that set environment variables (such as DBT_PROFILES_DIR) for the duration of the run and clean up afterwards.
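Two of these mechanisms can be sketched with the standard library. The helper names `temporary_environ` and `link_project` are hypothetical, and skipping the `logs` and `target` directories is an assumption made for illustration, on the grounds that dbt writes to them at run time.

```python
import os
from contextlib import contextmanager
from pathlib import Path
from typing import Dict, Iterator, Tuple


@contextmanager
def temporary_environ(overrides: Dict[str, str]) -> Iterator[None]:
    """Set environment variables for the duration of a block, then restore them."""
    previous = {key: os.environ.get(key) for key in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        for key, old_value in previous.items():
            if old_value is None:
                os.environ.pop(key, None)  # variable did not exist before
            else:
                os.environ[key] = old_value


def link_project(
    project_dir: Path, scratch_dir: Path, skip: Tuple[str, ...] = ("logs", "target")
) -> None:
    """Symlink each top-level project entry into the scratch directory."""
    for entry in project_dir.iterdir():
        if entry.name not in skip:
            (scratch_dir / entry.name).symlink_to(entry)
```

Wrapping the invocation in `with temporary_environ({"DBT_PROFILES_DIR": profiles_dir}):` keeps the variable from leaking into later work in the same process.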
Usage
Use subprocess mode when dbt is installed separately from Airflow or when strict process isolation is required. Use programmatic mode when low-latency execution is critical and the dbt library already coexists in the Airflow Python environment. In both cases, rely on the output parsers to convert raw command results into structured data that Airflow tasks can act upon -- for example, failing a task when a freshness check exceeds its error threshold.
Theoretical Basis
The dual-mode design implements the Strategy pattern: a common invocation interface selects between subprocess and programmatic strategies at runtime based on configuration. The filesystem preparation logic applies the Template Method pattern, defining a fixed sequence of steps -- resolve project path, create symlinks, set environment, execute, tear down -- while allowing each step's implementation to vary.
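The Strategy selection described above might look like the following sketch. The mode names and function names are hypothetical, and both strategies are stubs that only describe what they would do; a real strategy would spawn a process or call the dbt library.

```python
from typing import Callable, Dict, List

Strategy = Callable[[List[str]], str]


def run_via_subprocess(args: List[str]) -> str:
    """Stub: a real strategy would spawn `dbt` as a child process."""
    return "subprocess: dbt " + " ".join(args)


def run_via_library(args: List[str]) -> str:
    """Stub: a real strategy would call the dbt Python API in-process."""
    return "programmatic: invoke(" + ", ".join(args) + ")"


# Hypothetical configuration values mapped to interchangeable strategies.
STRATEGIES: Dict[str, Strategy] = {
    "subprocess": run_via_subprocess,
    "programmatic": run_via_library,
}


def invoke(args: List[str], mode: str = "subprocess") -> str:
    """Common entry point: pick a strategy from configuration at run time."""
    try:
        strategy = STRATEGIES[mode]
    except KeyError:
        raise ValueError(f"unknown invocation mode: {mode!r}") from None
    return strategy(args)
```

Because callers only see `invoke`, a further mode can be added by registering another function in the mapping.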
Output parsing follows a Chain of Responsibility approach. Each parser examines the raw output for patterns it understands and extracts structured records, passing unrecognised lines through to the next parser. This makes it straightforward to add new parsers for future dbt output formats without modifying existing ones.
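A minimal sketch of that chain, with each parser returning a record for lines it recognises and `None` otherwise. The parser names and the line patterns are illustrative assumptions, not the project's actual parsers or dbt's exact log format.

```python
import re
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Record = Dict[str, str]
Parser = Callable[[str], Optional[Record]]


def parse_warning(line: str) -> Optional[Record]:
    """Recognise lines containing an illustrative [WARNING] marker."""
    match = re.search(r"\[WARNING\]:?\s*(.*)", line)
    return {"kind": "warning", "message": match.group(1)} if match else None


def parse_test_result(line: str) -> Optional[Record]:
    """Recognise lines that start with an illustrative test status word."""
    match = re.match(r"(PASS|FAIL|WARN|ERROR)\s+(\S+)", line)
    if match is None:
        return None
    return {"kind": "test", "status": match.group(1), "name": match.group(2)}


def parse_output(
    lines: Iterable[str], parsers: List[Parser]
) -> Tuple[List[Record], List[str]]:
    """Offer each line to the parsers in order; the first match claims it."""
    records: List[Record] = []
    unrecognised: List[str] = []
    for line in lines:
        for parser in parsers:
            record = parser(line)
            if record is not None:
                records.append(record)
                break
        else:
            unrecognised.append(line)  # no parser in the chain claimed the line
    return records, unrecognised
```

Adding support for another output format then means appending one more function to the `parsers` list.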