
Workflow:Astronomer Cosmos Local dbt DAG Rendering

From Leeroopedia


Knowledge Sources
Domains Data_Engineering, dbt, Airflow, Orchestration
Last Updated 2026-02-07 17:00 GMT

Overview

End-to-end process for rendering a dbt project as a native Apache Airflow DAG using Cosmos local execution mode with automatic profile mapping.

Description

This workflow covers the standard procedure for converting a dbt project into an Airflow DAG using the DbtDag class. Cosmos parses the dbt project graph (models, seeds, snapshots, tests) and creates a one-to-one mapping of dbt nodes to Airflow tasks. Each dbt command (run, seed, test, snapshot) is executed locally within the Airflow worker process or as a subprocess, leveraging the same Python environment. The profile configuration is handled through Cosmos profile mappings, which translate Airflow connections into dbt-compatible profiles.yml entries automatically.

The DbtDag class is a thin wrapper that inherits from both Airflow's DAG and Cosmos's DbtToAirflowConverter, so it can be used as a drop-in replacement for a standard Airflow DAG while simultaneously parsing and rendering the dbt project graph.

Usage

Execute this workflow when you have a dbt project accessible on the Airflow worker filesystem and want to render it as a complete, standalone Airflow DAG. This is the simplest and most common integration pattern, suitable when dbt and Airflow share the same Python environment and the worker has sufficient resources to run dbt commands directly. Ideal for development environments, small-to-medium dbt projects, and setups where isolation is not required.

Execution Steps

Step 1: Configure project paths

Define the filesystem paths to the dbt project directory. Cosmos needs access to the dbt project folder on the Airflow scheduler (for DAG parsing) and on the worker (for task execution). In local execution mode, these are typically the same path. The project must contain a valid dbt_project.yml and a models directory.

Key considerations:

  • The project path must be accessible at both DAG parse time and task runtime
  • Environment variables can be used to make paths configurable across environments
  • The project directory must contain the standard dbt structure (dbt_project.yml, models/, seeds/, etc.)
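A minimal path configuration might look like the following sketch. The path and environment variable name are illustrative assumptions, not prescribed values:

```python
import os

from cosmos import ProjectConfig

# Assumed path; adjust to where the dbt project lives on the Airflow
# scheduler and worker filesystem (the same path in local execution mode).
DBT_PROJECT_PATH = os.environ.get(
    "DBT_PROJECT_PATH", "/usr/local/airflow/dbt/my_project"
)

# The directory must contain dbt_project.yml and a models/ folder.
project_config = ProjectConfig(dbt_project_path=DBT_PROJECT_PATH)
```

Reading the path from an environment variable keeps the same DAG file usable across development, staging, and production deployments.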

Step 2: Set up profile configuration

Create a ProfileConfig object that maps an Airflow connection to a dbt profile. Cosmos provides database-specific profile mapping classes (e.g., PostgresUserPasswordProfileMapping, SnowflakeUserPasswordProfileMapping) that extract connection details from an Airflow connection and generate a temporary profiles.yml at runtime. Alternatively, provide a path to an existing profiles.yml file.

Key considerations:

  • Choose between automatic profile mapping (from Airflow connections) or a user-supplied profiles.yml
  • The profile_name and target_name must match what the dbt project expects
  • Profile args like schema can be passed through profile_args
  • Cosmos supports 16+ database types with multiple authentication methods each
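As a sketch of the automatic-mapping approach, the following assumes a Postgres warehouse and an Airflow connection ID of postgres_default; both are illustrative:

```python
from cosmos import ProfileConfig
from cosmos.profiles import PostgresUserPasswordProfileMapping

# profile_name and target_name must match what the dbt project expects;
# "postgres_default" is an assumed Airflow connection ID.
profile_config = ProfileConfig(
    profile_name="my_project",
    target_name="dev",
    profile_mapping=PostgresUserPasswordProfileMapping(
        conn_id="postgres_default",
        profile_args={"schema": "public"},  # extra dbt profile fields
    ),
)
```

To use an existing profiles.yml instead, pass its path via the profiles_yml_filepath argument rather than a profile_mapping.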

Step 3: Configure operator arguments

Define the operator_args dictionary that controls dbt task behavior. Common arguments include install_deps (whether to run dbt deps before each task), full_refresh (force full refresh on incremental models), and execution timeouts. These arguments are passed through to every dbt operator that Cosmos generates.

Key considerations:

  • Set install_deps: True if the project has package dependencies (packages.yml)
  • full_refresh applies only to dbt commands that support the flag (e.g., run, build, seed)
  • Additional environment variables can be passed via ProjectConfig.env_vars
  • Callback functions for artifact uploading can be configured here
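A small operator_args dictionary covering the arguments mentioned above might look like this sketch (the timeout value is an arbitrary example):

```python
from datetime import timedelta

# Passed through to every dbt operator Cosmos generates.
operator_args = {
    "install_deps": True,   # run `dbt deps` before each task (packages.yml present)
    "full_refresh": False,  # set True to force full refresh on incremental models
    "execution_timeout": timedelta(minutes=30),  # standard Airflow operator argument
}
```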

Step 4: Instantiate the DbtDag

Create a DbtDag instance, passing the ProjectConfig, ProfileConfig, operator_args, and standard Airflow DAG parameters (schedule, start_date, catchup, dag_id). On instantiation, the DbtDag triggers the DbtToAirflowConverter which loads the dbt project graph using the configured load method (defaults to AUTOMATIC, which tries dbt ls first, then falls back to manifest parsing).

Key considerations:

  • The graph loading happens at DAG parse time, not at task runtime
  • LoadMode.AUTOMATIC tries load methods in order: DBT_LS_CACHE, DBT_LS, DBT_MANIFEST, CUSTOM
  • RenderConfig can be used to filter which dbt nodes become Airflow tasks (select/exclude)
  • ExecutionConfig defaults to ExecutionMode.LOCAL when not specified
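Putting the previous steps together, a complete DAG file might look like the following sketch. Paths, the connection ID, and the schedule are illustrative assumptions:

```python
from datetime import datetime

from cosmos import DbtDag, ProfileConfig, ProjectConfig
from cosmos.profiles import PostgresUserPasswordProfileMapping

# Instantiating DbtDag triggers DbtToAirflowConverter at DAG parse time.
my_dbt_dag = DbtDag(
    project_config=ProjectConfig("/usr/local/airflow/dbt/my_project"),
    profile_config=ProfileConfig(
        profile_name="my_project",
        target_name="dev",
        profile_mapping=PostgresUserPasswordProfileMapping(
            conn_id="postgres_default",
            profile_args={"schema": "public"},
        ),
    ),
    operator_args={"install_deps": True},
    # Standard Airflow DAG parameters, accepted because DbtDag subclasses DAG.
    dag_id="my_dbt_dag",
    schedule="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
)
```

Because DbtDag inherits from Airflow's DAG, placing this file in the dags/ folder is all that is required for the rendered graph to appear in the Airflow UI.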

Step 5: Graph parsing and task generation

Cosmos loads the dbt project graph via DbtGraph.load(), which discovers all models, seeds, tests, snapshots, and sources. It then calls build_airflow_graph() to create corresponding Airflow operator instances (DbtRunLocalOperator, DbtSeedLocalOperator, DbtTestLocalOperator, etc.) and wire them together according to the dbt dependency graph. Each dbt node becomes an Airflow task with proper upstream/downstream relationships preserved.

Key considerations:

  • Test behavior is configurable: run tests after each model, after all models, or not at all
  • Source nodes can optionally be rendered as upstream sensors
  • Dataset emission enables cross-DAG dependencies via Airflow Datasets
  • The resulting DAG structure mirrors the dbt dependency graph
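The rendering behaviors above are controlled through a RenderConfig passed to DbtDag. The selectors below are illustrative assumptions:

```python
from cosmos import RenderConfig
from cosmos.constants import LoadMode, TestBehavior

render_config = RenderConfig(
    select=["tag:daily"],                    # only render nodes matching this selector
    exclude=["path:models/deprecated"],      # drop these nodes from the DAG
    test_behavior=TestBehavior.AFTER_EACH,   # run each model's tests right after it
    load_method=LoadMode.DBT_LS,             # pin the graph-loading strategy explicitly
)
```

TestBehavior also offers AFTER_ALL (a single test task at the end) and NONE (skip rendering tests entirely).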

Step 6: Runtime execution

When Airflow triggers the DAG, each task executes its corresponding dbt command (run, seed, test, snapshot) using the local operator. The operator generates a temporary profiles.yml from the profile mapping, sets up environment variables, and invokes dbt either through the dbtRunner API (in-process) or as a subprocess. Results, logs, and status are captured and reported back to Airflow.

Key considerations:

  • InvocationMode.DBT_RUNNER runs dbt in-process (faster, requires dbt in the same environment)
  • InvocationMode.SUBPROCESS spawns a separate process (more isolated)
  • Partial parsing is supported to speed up repeated dbt invocations
  • Caching of dbt ls output, profiles, and partial parse files reduces DAG parse time
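The invocation behavior is selected through an ExecutionConfig, sketched below; both settings shown are optional:

```python
from cosmos import ExecutionConfig
from cosmos.constants import ExecutionMode, InvocationMode

execution_config = ExecutionConfig(
    execution_mode=ExecutionMode.LOCAL,         # the default when unspecified
    invocation_mode=InvocationMode.DBT_RUNNER,  # in-process via dbtRunner (faster)
    # InvocationMode.SUBPROCESS would spawn dbt as a separate process instead.
)
```

DBT_RUNNER avoids process-startup overhead on every task but requires dbt to be importable in the worker's Python environment; SUBPROCESS trades speed for isolation.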

Execution Diagram

GitHub URL

Workflow Repository