Principle:Astronomer Astronomer cosmos Project Path Configuration

From Leeroopedia


Metadata

Field | Value
Page Type | Principle
Repository | astronomer-cosmos
Domains | Data_Engineering, Configuration
Related Implementation | Implementation:Astronomer_Astronomer_cosmos_ProjectConfig_Init
Knowledge Sources | dbt Project Structure, astronomer-cosmos

Overview

Project Path Configuration is a configuration principle for defining the location, structure, and metadata of a dbt project within an Airflow orchestration context. It establishes the foundational contract between the filesystem layout of a dbt project and the orchestration system that will parse and execute it.

Every dbt-Airflow integration begins with answering a fundamental question: where does the dbt project live, and what does it contain? This principle formalizes the answer by specifying the project root path, relative locations of key subdirectories (models, seeds, snapshots), manifest file location, environment variables, and dbt variables.

Description

A dbt project is organized according to a well-defined directory convention anchored by a dbt_project.yml file at the project root. Within this root, subdirectories such as models/, seeds/, and snapshots/ contain the SQL and YAML files that define the project's transformations and data assets.

When an orchestration system such as Apache Airflow needs to interact with a dbt project, it must be configured with knowledge of this filesystem structure. The Project Path Configuration principle captures this requirement in a library-agnostic manner:

  • Project Root Path: The absolute or relative filesystem path to the directory containing dbt_project.yml. This is the anchor point from which all other paths are resolved.
  • Models Relative Path: The subdirectory within the project root where model SQL files reside. Defaults to models/ by dbt convention.
  • Seeds Relative Path: The subdirectory for seed CSV files. Defaults to seeds/.
  • Snapshots Relative Path: The subdirectory for snapshot definitions. Defaults to snapshots/.
  • Manifest Path: An optional path to a pre-compiled manifest.json file. When provided, the orchestration system can skip dbt parsing and read the project graph directly from the manifest.
  • Environment Variables: Key-value pairs injected into the dbt execution environment, enabling dynamic configuration of profiles, targets, or custom macros.
  • dbt Variables: Key-value pairs passed via the --vars flag to dbt commands, controlling conditional logic within models and macros.
  • Project Name: An explicit project name override, useful when the project name cannot be inferred from the filesystem or when multiple projects coexist.
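
The fields listed above can be gathered into a single value object. The following is a minimal sketch using only the Python standard library; the class name `DbtProjectLocation`, its attribute names, and the fallback of the project name to the root directory name are illustrative assumptions, not the API of any particular library.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class DbtProjectLocation:
    """Hypothetical container for the configuration fields listed above."""

    project_root: Path                        # directory containing dbt_project.yml
    models_relative_path: str = "models"      # dbt convention
    seeds_relative_path: str = "seeds"
    snapshots_relative_path: str = "snapshots"
    manifest_path: Optional[Path] = None      # pre-compiled manifest.json, if any
    env_vars: Dict[str, str] = field(default_factory=dict)
    dbt_vars: Dict[str, Any] = field(default_factory=dict)
    project_name: Optional[str] = None        # explicit override

    @property
    def models_path(self) -> Path:
        # Every subdirectory is resolved against the project root.
        return self.project_root / self.models_relative_path

    @property
    def seeds_path(self) -> Path:
        return self.project_root / self.seeds_relative_path

    @property
    def snapshots_path(self) -> Path:
        return self.project_root / self.snapshots_relative_path

    @property
    def resolved_name(self) -> str:
        # Assumption: fall back to the root directory name when no override is set.
        return self.project_name or self.project_root.name
```

For example, `DbtProjectLocation(Path("/opt/dbt/jaffle_shop"), models_relative_path="transformations")` resolves its models directory to `/opt/dbt/jaffle_shop/transformations` while keeping the default seeds and snapshots locations.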

Because the contract is stated independently of any library, it prescribes no specific implementation. Any tool that integrates dbt with an orchestration system must nonetheless address each of these configuration concerns.
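
To illustrate how environment variables and dbt variables might reach a dbt command, the sketch below assembles an argument list and an environment mapping. The function name `build_dbt_invocation` and the example variable names are hypothetical. It relies on dbt's `--project-dir` and `--vars` flags and on the fact that a JSON object is also valid YAML, which is the format `--vars` expects.

```python
import json
import os
from typing import Any, Dict, List, Mapping, Optional, Tuple


def build_dbt_invocation(
    subcommand: str,
    project_dir: str,
    dbt_vars: Mapping[str, Any],
    env_vars: Mapping[str, str],
    base_env: Optional[Mapping[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for a dbt call; hypothetical helper, not a library API."""
    command = ["dbt", subcommand, "--project-dir", project_dir]
    if dbt_vars:
        # dbt variables travel on the command line via --vars.
        command += ["--vars", json.dumps(dict(dbt_vars), sort_keys=True)]

    # Environment variables are layered on top of the inherited environment.
    env = dict(os.environ if base_env is None else base_env)
    env.update(env_vars)
    return command, env
```

The returned pair can be handed to `subprocess.run(command, env=env)` by a caller that has dbt installed; the sketch itself does not execute anything.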

Usage

When setting up a dbt-Airflow integration, the first step is to specify the project location and its structural metadata. This principle applies in the following scenarios:

  • Initial DAG Setup: When creating an Airflow DAG that orchestrates a dbt project, the project path must be configured before any tasks can be generated.
  • Multi-Project Environments: When a single Airflow deployment orchestrates multiple dbt projects, each project requires its own path configuration with distinct root paths and potentially different relative subdirectory layouts.
  • CI/CD Pipelines: In continuous integration workflows, the project path may differ between local development, staging, and production environments. Path configuration enables environment-specific resolution.
  • Manifest-Based Parsing: When a pre-compiled manifest is available (e.g., from a dbt Cloud run or CI artifact), the manifest path configuration enables the orchestrator to skip local parsing entirely, improving DAG generation performance.
  • Custom Project Layouts: Some teams organize dbt projects with non-standard subdirectory names (e.g., transformations/ instead of models/). The relative path configuration accommodates these variations.
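
Two of the scenarios above, environment-specific paths and manifest-based parsing, can be sketched as small helpers. The environment variable name `DBT_PROJECT_ROOT`, the default path, and the strategy labels `"manifest"` and `"parse"` are assumptions made for illustration only.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_project_root(
    env: Mapping[str, str],
    default: str = "dbt/my_project",
    var_name: str = "DBT_PROJECT_ROOT",
) -> Path:
    """Pick the project root from the environment, else use a default.

    Lets local, staging, and production deployments point at different
    locations without changing DAG code.
    """
    return Path(env.get(var_name, default))


def choose_load_strategy(manifest_path: Optional[Path]) -> str:
    """Prefer a pre-compiled manifest when one actually exists on disk."""
    if manifest_path is not None and manifest_path.is_file():
        return "manifest"
    # No usable manifest: the project must be parsed locally.
    return "parse"
```

In a multi-project deployment, each project would call these helpers with its own variable name or default, so that the roots stay distinct.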

Theoretical Basis

dbt projects follow a standard directory convention defined by the dbt_project.yml specification. The key structural elements are:

Element | Default Path | Purpose
dbt_project.yml | Project root | Project configuration and metadata
models/ | Relative to root | SQL model definitions and schema YAML
seeds/ | Relative to root | CSV seed data files
snapshots/ | Relative to root | Slowly changing dimension definitions
macros/ | Relative to root | Jinja macro definitions
manifest.json | target/ | Compiled project graph (post-parse)
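
As a rough illustration of how an orchestrator could check a configured path against this convention, the following sketch reports which expected entries are absent. The function name `missing_layout_entries` is hypothetical, and treating every listed subdirectory as expected is a simplification: a real project may legitimately omit seeds or snapshots.

```python
from pathlib import Path
from typing import List, Sequence


def missing_layout_entries(
    project_root: Path,
    relative_dirs: Sequence[str] = ("models", "seeds", "snapshots"),
) -> List[str]:
    """List conventional entries that do not exist under project_root."""
    missing: List[str] = []

    # dbt_project.yml anchors the project; without it the root is not a dbt project.
    if not (project_root / "dbt_project.yml").is_file():
        missing.append("dbt_project.yml")

    # Subdirectories are resolved relative to the root, mirroring the table above.
    for rel in relative_dirs:
        if not (project_root / rel).is_dir():
            missing.append(rel + "/")
    return missing
```

Passing a custom `relative_dirs` tuple, such as `("transformations",)`, covers projects with non-standard subdirectory names.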

This principle captures the mapping between filesystem structure and orchestration metadata. The orchestrator does not need to understand dbt's internal parsing logic; it only needs to know where to find the project and its key components. This separation of concerns allows the orchestration layer to remain decoupled from dbt's internal implementation details.

The manifest file deserves special attention: it represents a pre-computed project graph that can be used to generate orchestration tasks without invoking dbt's parser. This is particularly valuable in environments where dbt is not installed on the Airflow scheduler, or where parsing performance is a concern.
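
A minimal sketch of reading the project graph from a manifest follows. It assumes the manifest has already been loaded into a dictionary (for instance with `json.load`) and uses the `nodes`, `resource_type`, and `depends_on` keys of dbt's manifest format. The function name `model_dependencies` and the choice to keep only model-to-model edges are illustrative.

```python
from typing import Any, Dict, List


def model_dependencies(manifest: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each model's unique ID to the upstream models it depends on."""
    graph: Dict[str, List[str]] = {}
    for unique_id, node in manifest.get("nodes", {}).items():
        # The "nodes" section also holds tests, seeds, and snapshots; keep models only.
        if node.get("resource_type") != "model":
            continue
        upstream = node.get("depends_on", {}).get("nodes", [])
        # Drop non-model parents (e.g. sources) to keep the sketch small.
        graph[unique_id] = [u for u in upstream if u.startswith("model.")]
    return graph
```

An orchestrator could turn each key into a task and each listed parent into an upstream dependency, without dbt being installed where the graph is built.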
