Principle:Astronomer Astronomer cosmos Project Path Configuration
Metadata
| Field | Value |
|---|---|
| Page Type | Principle |
| Repository | astronomer-cosmos |
| Domains | Data_Engineering, Configuration |
| Related Implementation | Implementation:Astronomer_Astronomer_cosmos_ProjectConfig_Init |
| Knowledge Sources | dbt Project Structure, astronomer-cosmos |
Overview
Project Path Configuration is a configuration principle for defining the location, structure, and metadata of a dbt project within an Airflow orchestration context. It establishes the foundational contract between the filesystem layout of a dbt project and the orchestration system that will parse and execute it.
Every dbt-Airflow integration begins with answering a fundamental question: where does the dbt project live, and what does it contain? This principle formalizes the answer by specifying the project root path, relative locations of key subdirectories (models, seeds, snapshots), manifest file location, environment variables, and dbt variables.
Description
A dbt project is organized according to a well-defined directory convention anchored by a dbt_project.yml file at the project root. Within this root, subdirectories such as models/, seeds/, and snapshots/ contain the SQL and YAML files that define the project's transformations and data assets.
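Every other location is resolved from the directory that holds dbt_project.yml. So a tool given an arbitrary path inside a project can recover the root by searching upward for that file. The sketch below assumes this upward-search strategy. find_project_root is a hypothetical helper, not a dbt or astronomer-cosmos function.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Path) -> Optional[Path]:
    """Return the nearest directory (start or an ancestor) containing dbt_project.yml.

    Hypothetical helper for illustration; returns None if no project file is found.
    """
    start = Path(start).resolve()
    # Check the starting directory first, then each parent up to the filesystem root.
    for candidate in (start, *start.parents):
        if (candidate / "dbt_project.yml").is_file():
            return candidate
    return None
```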
When an orchestration system such as Apache Airflow needs to interact with a dbt project, it must be configured with knowledge of this filesystem structure. The Project Path Configuration principle captures this requirement in a library-agnostic manner:
- Project Root Path: The absolute or relative filesystem path to the directory containing dbt_project.yml. This is the anchor point from which all other paths are resolved.
- Models Relative Path: The subdirectory within the project root where model SQL files reside. Defaults to models/ by dbt convention.
- Seeds Relative Path: The subdirectory for seed CSV files. Defaults to seeds/.
- Snapshots Relative Path: The subdirectory for snapshot definitions. Defaults to snapshots/.
- Manifest Path: An optional path to a pre-compiled manifest.json file. When provided, the orchestration system can skip dbt parsing and read the project graph directly from the manifest.
- Environment Variables: Key-value pairs injected into the dbt execution environment and read through dbt's env_var() Jinja function, enabling dynamic configuration of profiles, targets, or custom macros.
- dbt Variables: Key-value pairs passed via the --vars flag to dbt commands, controlling conditional logic within models and macros.
- Project Name: An explicit project name override, useful when the project name cannot be inferred from the filesystem or when multiple projects coexist.
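The fields listed above can be gathered into one configuration object. The following is a minimal, library-agnostic sketch. The ProjectPathConfig class and its property and method names are hypothetical and do not reflect the astronomer-cosmos API. The sketch does three things:

- It joins relative subdirectories onto the root.
- It serializes dbt variables as a JSON-formatted dictionary for the --vars flag.
- It layers environment variables over the current process environment.

```python
import json
import os
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List, Optional


@dataclass
class ProjectPathConfig:
    """Hypothetical, library-agnostic container for the fields described above."""

    project_root: Path
    models_relative_path: str = "models"
    seeds_relative_path: str = "seeds"
    snapshots_relative_path: str = "snapshots"
    manifest_path: Optional[Path] = None
    env_vars: Dict[str, str] = field(default_factory=dict)
    dbt_vars: Dict[str, object] = field(default_factory=dict)
    project_name: Optional[str] = None

    def __post_init__(self) -> None:
        # Normalize to Path so relative locations can be joined onto the root.
        self.project_root = Path(self.project_root)
        if self.manifest_path is not None:
            self.manifest_path = Path(self.manifest_path)

    @property
    def models_path(self) -> Path:
        return self.project_root / self.models_relative_path

    @property
    def seeds_path(self) -> Path:
        return self.project_root / self.seeds_relative_path

    @property
    def snapshots_path(self) -> Path:
        return self.project_root / self.snapshots_relative_path

    def vars_args(self) -> List[str]:
        # Serialize dbt variables as a JSON dictionary for the --vars flag.
        return ["--vars", json.dumps(self.dbt_vars)] if self.dbt_vars else []

    def subprocess_env(self) -> Dict[str, str]:
        # Configured variables take precedence over the inherited process environment.
        return {**os.environ, **self.env_vars}
```

Store the subdirectories relative to the root rather than as absolute paths. That way one configuration can follow the project when its root moves between machines or environments.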
This principle is library-agnostic: it describes the configuration contract without prescribing a specific implementation. Any tool that integrates dbt with an orchestration system must address these configuration concerns.
Usage
When setting up any dbt-Airflow integration, the first step is always to specify the project location and its structural metadata. This principle applies in the following scenarios:
- Initial DAG Setup: When creating an Airflow DAG that orchestrates a dbt project, the project path must be configured before any tasks can be generated.
- Multi-Project Environments: When a single Airflow deployment orchestrates multiple dbt projects, each project requires its own path configuration with distinct root paths and potentially different relative subdirectory layouts.
- CI/CD Pipelines: In continuous integration workflows, the project path may differ between local development, staging, and production environments. Path configuration enables environment-specific resolution.
- Manifest-Based Parsing: When a pre-compiled manifest is available (e.g., from a dbt Cloud run or CI artifact), the manifest path configuration enables the orchestrator to skip local parsing entirely, improving DAG generation performance.
- Custom Project Layouts: Some teams organize dbt projects with non-standard subdirectory names (e.g., transformations/ instead of models/). The relative path configuration accommodates these variations.
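For the CI/CD and multi-project scenarios, environment-specific resolution can be a simple lookup keyed by an environment name. This sketch assumes a hypothetical DEPLOY_ENV environment variable and placeholder paths in PROJECT_ROOTS. Neither comes from dbt or astronomer-cosmos.

```python
import os
from pathlib import Path
from typing import Dict, Mapping

# Hypothetical per-environment locations of one dbt project; placeholders only.
PROJECT_ROOTS: Dict[str, str] = {
    "dev": "~/code/analytics/dbt",
    "staging": "/usr/local/airflow/dbt/analytics",
    "prod": "/usr/local/airflow/dbt/analytics",
}


def resolve_project_root(
    roots: Mapping[str, str],
    env_var: str = "DEPLOY_ENV",
    default_env: str = "dev",
) -> Path:
    """Pick the project root for the current environment.

    DEPLOY_ENV is a hypothetical variable name used only for illustration.
    """
    env_name = os.environ.get(env_var, default_env)
    try:
        raw = roots[env_name]
    except KeyError:
        # Fail loudly rather than silently falling back to the wrong project.
        raise ValueError(f"No dbt project root configured for environment {env_name!r}") from None
    # Expand "~" so developer-local paths work alongside absolute deployment paths.
    return Path(raw).expanduser()
```

Several projects can be handled the same way by keying the mapping on (project, environment) pairs instead of environment names alone.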
Theoretical Basis
dbt projects follow a standard directory convention defined by the dbt_project.yml specification. The key structural elements are:
| Element | Default Location | Purpose |
|---|---|---|
| dbt_project.yml | Project root | Project configuration and metadata |
| models/ | Relative to root | SQL model definitions and schema YAML |
| seeds/ | Relative to root | CSV seed data files |
| snapshots/ | Relative to root | Slowly changing dimension definitions |
| macros/ | Relative to root | Jinja macro definitions |
| manifest.json | target/ (relative to root) | Compiled project graph (post-parse) |
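The defaults in the table apply only when dbt_project.yml does not override them. dbt reads the model-paths, seed-paths, snapshot-paths, and macro-paths keys, each a list of directories relative to the project root. It reads target-path for the compiled output. The sketch below rests on these assumptions:

- The file has already been loaded into a dictionary, for example with PyYAML's yaml.safe_load.
- Any command-line or environment-variable override of the target directory is ignored.
- resolve_layout is a hypothetical helper.

```python
from pathlib import Path
from typing import Any, Dict, List, Mapping

# Defaults dbt applies when dbt_project.yml omits these keys (dbt >= 1.0 key names).
DEFAULT_PATHS: Dict[str, List[str]] = {
    "model-paths": ["models"],
    "seed-paths": ["seeds"],
    "snapshot-paths": ["snapshots"],
    "macro-paths": ["macros"],
}


def resolve_layout(project_root: Path, project_yml: Mapping[str, Any]) -> Dict[str, Any]:
    """Map each structural element to absolute paths under project_root."""
    layout: Dict[str, Any] = {}
    for key, default in DEFAULT_PATHS.items():
        # Each *-paths key holds a list of directories relative to the project root.
        layout[key] = [project_root / p for p in project_yml.get(key, default)]
    # target-path is a single directory; manifest.json is written inside it after parsing.
    target = project_root / project_yml.get("target-path", "target")
    layout["manifest"] = target / "manifest.json"
    return layout
```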
This principle captures the mapping between filesystem structure and orchestration metadata. The orchestrator does not need to understand dbt's internal parsing logic; it only needs to know where to find the project and its key components. This separation of concerns allows the orchestration layer to remain decoupled from dbt's internal implementation details.
The manifest file deserves special attention: it represents a pre-computed project graph that can be used to generate orchestration tasks without invoking dbt's parser. This is particularly valuable in environments where dbt is not installed on the Airflow scheduler, or where parsing performance is a concern.
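A manifest.json file stores the parsed project under a top-level nodes mapping keyed by unique ID. Each node records its resource_type and lists the unique IDs it depends on under depends_on.nodes. The sketch below reads that structure and derives an upstream-dependency map for models, seeds, and snapshots without invoking dbt. load_manifest and build_task_graph are hypothetical helpers. The set of resource types that become tasks is also an assumption, which a real integration would likely make configurable.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Mapping

# Resource types treated as orchestration tasks in this sketch (an assumption).
TASK_RESOURCE_TYPES = {"model", "seed", "snapshot"}


def load_manifest(path: Path) -> Dict[str, Any]:
    """Read a pre-compiled manifest.json without invoking dbt."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def build_task_graph(manifest: Mapping[str, Any]) -> Dict[str, List[str]]:
    """Return {unique_id: [upstream unique_ids]} for the selected node types."""
    nodes = manifest.get("nodes", {})
    selected = {
        uid for uid, node in nodes.items()
        if node.get("resource_type") in TASK_RESOURCE_TYPES
    }
    graph: Dict[str, List[str]] = {}
    for uid in sorted(selected):
        # Some node types (e.g. seeds) may have no "nodes" key under depends_on.
        upstream = nodes[uid].get("depends_on", {}).get("nodes", [])
        # Keep only dependencies that are themselves tasks (drops sources and other nodes).
        graph[uid] = [dep for dep in upstream if dep in selected]
    return graph
```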
Related Pages
- Implementation:Astronomer_Astronomer_cosmos_ProjectConfig_Init — The concrete implementation of this principle in the astronomer-cosmos library.
- Principle:Astronomer_Astronomer_cosmos_Profile_Configuration — The complementary principle for configuring database connection profiles.
- Principle:Astronomer_Astronomer_cosmos_Render_Configuration — Configuration for how the project graph is rendered into orchestration tasks.
- Principle:Astronomer_Astronomer_cosmos_Execution_Configuration — Configuration for how dbt commands are executed at runtime.