Implementation:DataTalksClub Data engineering zoomcamp Dbt Project Yml Config
| Page Metadata | |
|---|---|
| Knowledge Sources | repo: DataTalksClub/data-engineering-zoomcamp, dbt docs: dbt_project.yml reference |
| Domains | Analytics Engineering, dbt Configuration, Project Setup |
| Last Updated | 2026-02-09 14:00 GMT |
Overview
Concrete configuration pattern for defining a dbt analytics transformation project using the dbt_project.yml file, establishing project identity, directory layout, materialization defaults, and development variables.
Description
The dbt_project.yml file in the taxi_rides_ny project serves as the single source of truth for how dbt discovers, compiles, and materializes the entire transformation layer. It declares:
- Project identity: The project is named taxi_rides_ny at version 1.0.0, requiring dbt-core versions >=1.7.0 and <2.0.0.
- Profile binding: The profile key binds this project to a connection profile (also named taxi_rides_ny) defined in the user's profiles.yml.
- Directory paths: Models, seeds, macros, analyses, tests, and snapshots each have dedicated directories.
- Materialization hierarchy: Staging models default to view, while intermediate and marts models default to table.
- Project variables: Development date range variables (dev_start_date, dev_end_date) used by staging models to limit data in dev.
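The version requirement in the first bullet is a list of bounds that must all hold at once. The following is a minimal Python sketch of that check, assuming plain three-part version strings (pre-release tags such as 1.8.0rc1 are not handled). The helper names parse_version and satisfies are illustrative and are not part of dbt.

```python
import re

# Mirrors the require-dbt-version list from dbt_project.yml.
REQUIRED_RANGE = [">=1.7.0", "<2.0.0"]

# Comparison operators supported by this sketch.
OPS = {
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    "==": lambda a, b: a == b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
}


def parse_version(text):
    """Turn '1.7.0' into (1, 7, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def satisfies(installed, constraints=REQUIRED_RANGE):
    """Return True when `installed` meets every constraint in the list."""
    for constraint in constraints:
        match = re.fullmatch(r"(>=|<=|==|>|<)(\d+\.\d+\.\d+)", constraint)
        if match is None:
            raise ValueError(f"unsupported constraint: {constraint}")
        op, bound = match.groups()
        if not OPS[op](parse_version(installed), parse_version(bound)):
            return False
    return True
```

Under these assumptions, any 1.x release from 1.7.0 onward passes, while 1.6.x and 2.0.0 are rejected.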
Usage
This configuration pattern is used when:
- Initializing a new dbt project for NYC taxi trip analytics.
- Ensuring all team members run the same dbt version range.
- Applying layer-specific materialization without per-model config blocks.
- Providing default variable values that staging models reference for dev environment filtering.
Code Reference
Source Location
04-analytics-engineering/taxi_rides_ny/dbt_project.yml (Lines 1-37)
Signature
name: 'taxi_rides_ny'
version: '1.0.0'
# Require a specific dbt version for reproducibility
require-dbt-version: [">=1.7.0", "<2.0.0"]
# This setting configures which "profile" dbt uses for this project.
profile: 'taxi_rides_ny'
# These configurations specify where dbt should look for different types of files.
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]
clean-targets:
  - "target"
  - "dbt_packages"
# Project-level variables
vars:
  # Date range for dev environment sampling
  dev_start_date: '2019-01-01'
  dev_end_date: '2019-02-01'
# Configuring models
models:
  taxi_rides_ny:
    staging:
      +materialized: view
    intermediate:
      +materialized: table
    marts:
      +materialized: table
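The models block above assigns a default materialization per directory. As a rough illustration of how such defaults resolve, here is a small Python sketch. It models only the top-level directory and an optional in-model config() override, which is a simplification of dbt's actual config resolution. The function name resolve_materialization and the file name stg_example.sql are hypothetical.

```python
# Mirrors the models: block of dbt_project.yml (directory -> default).
PROJECT_DEFAULTS = {
    "staging": "view",
    "intermediate": "table",
    "marts": "table",
}


def resolve_materialization(model_path, in_model_config=None):
    """Pick the materialization for a model path relative to models/.

    Precedence modeled here: a config() block in the model file first,
    then the directory-level default, then dbt's built-in default, 'view'.
    """
    if in_model_config is not None:
        return in_model_config
    top_level_dir = model_path.split("/")[0]
    return PROJECT_DEFAULTS.get(top_level_dir, "view")


if __name__ == "__main__":
    # stg_example.sql is a hypothetical file name used for illustration.
    print(resolve_materialization("staging/stg_example.sql"))
    print(resolve_materialization("marts/fct_trips.sql"))
    print(resolve_materialization("marts/fct_trips.sql", "incremental"))
```

The point of the sketch is the ordering: the more specific setting wins, so a single model can opt out of its layer's default without touching dbt_project.yml.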
Import
No import is needed. dbt reads this file automatically from the project root. External dependencies are managed separately through packages.yml:
# packages.yml
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.3.0", "<2.0.0"]
  - package: dbt-labs/codegen
    version: [">=0.14.0", "<1.0.0"]
Install dependencies with:
dbt deps
I/O Contract
Inputs
| Input | Type | Description |
|---|---|---|
| profiles.yml | YAML config file | Connection profile named taxi_rides_ny with adapter-specific credentials (BigQuery or DuckDB) |
| packages.yml | YAML config file | External package declarations (dbt_utils, codegen) |
| models/ directory | SQL/YAML files | Model definitions discovered by dbt based on model-paths |
| seeds/ directory | CSV files | Seed data files (payment_type_lookup.csv, taxi_zone_lookup.csv) |
| macros/ directory | SQL/Jinja files | Reusable macro definitions (safe_cast, get_trip_duration_minutes, get_vendor_data) |
Outputs
| Output | Type | Description |
|---|---|---|
| Compiled project graph | DAG | Directed acyclic graph of all models, seeds, and tests with resolved materializations |
| Staging layer (views) | Database views | Models in models/staging/ materialized as views |
| Intermediate layer (tables) | Database tables | Models in models/intermediate/ materialized as tables |
| Marts layer (tables) | Database tables | Models in models/marts/ materialized as tables |
| Variable defaults | Runtime values | dev_start_date='2019-01-01', dev_end_date='2019-02-01' available via var() |
Usage Examples
Referencing project variables in a staging model
select * from renamed
-- Sample records for dev environment using deterministic date filter
{% if target.name == 'dev' %}
where pickup_datetime >= '{{ var("dev_start_date") }}' and pickup_datetime < '{{ var("dev_end_date") }}'
{% endif %}
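To make the effect of the conditional concrete, the sketch below mimics in plain Python what the snippet compiles to for different targets. It is an illustration only: dbt does the real rendering with Jinja, and compile_staging_tail is a hypothetical helper name.

```python
# Defaults copied from the vars: block of dbt_project.yml.
PROJECT_VARS = {"dev_start_date": "2019-01-01", "dev_end_date": "2019-02-01"}


def compile_staging_tail(target_name, project_vars=PROJECT_VARS):
    """Return the SQL the staging snippet would produce for a target."""
    lines = ["select * from renamed"]
    # The date filter is emitted only when the active target is 'dev'.
    if target_name == "dev":
        start = project_vars["dev_start_date"]
        end = project_vars["dev_end_date"]
        lines.append(
            f"where pickup_datetime >= '{start}' "
            f"and pickup_datetime < '{end}'"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(compile_staging_tail("dev"))
    print(compile_staging_tail("prod"))
```

Because the upper bound is exclusive, the default values restrict dev builds to pickups in January 2019. Any target not named dev gets the unfiltered select.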
Overriding variables from the CLI
# Override the dev date range at runtime
dbt run --vars '{"dev_start_date": "2020-01-01", "dev_end_date": "2020-07-01"}'
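The override works because values passed with --vars take precedence over the defaults in dbt_project.yml, which in turn take precedence over a default supplied as the second argument to var(). A minimal Python sketch of that lookup order follows. The function is a stand-in for dbt's var(), not its implementation, and the cli_vars parameter is an assumption made so the example is self-contained.

```python
import json

# Defaults copied from the vars: block of dbt_project.yml.
PROJECT_VARS = {"dev_start_date": "2019-01-01", "dev_end_date": "2019-02-01"}

_MISSING = object()  # sentinel: distinguishes "no default" from None


def var(name, default=_MISSING, cli_vars=None):
    """Look up a variable: CLI value, then project value, then default."""
    cli_vars = cli_vars or {}
    if name in cli_vars:
        return cli_vars[name]
    if name in PROJECT_VARS:
        return PROJECT_VARS[name]
    if default is not _MISSING:
        return default
    # dbt raises a compilation error in this case; KeyError stands in.
    raise KeyError(f"required var '{name}' not found")


if __name__ == "__main__":
    # Same payload as the dbt run --vars example above.
    cli = json.loads('{"dev_start_date": "2020-01-01", "dev_end_date": "2020-07-01"}')
    print(var("dev_start_date"))
    print(var("dev_start_date", cli_vars=cli))
```

Keeping defaults in dbt_project.yml means a plain dbt run behaves the same for everyone, while a one-off date range needs no file edits.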
Overriding materialization at the model level
-- In a specific model file (e.g., fct_trips.sql):
{{
    config(
        materialized='incremental',
        unique_key='trip_id',
        incremental_strategy='merge',
        on_schema_change='append_new_columns'
    )
}}
Related Pages
- Principle:DataTalksClub_Data_engineering_zoomcamp_Dbt_Project_Configuration
- Implementation:DataTalksClub_Data_engineering_zoomcamp_Dbt_Sources_Yml
- Implementation:DataTalksClub_Data_engineering_zoomcamp_Dbt_Staging_Models
- Implementation:DataTalksClub_Data_engineering_zoomcamp_Dbt_Intermediate_Models
- Implementation:DataTalksClub_Data_engineering_zoomcamp_Dbt_Marts_Models
- Heuristic:DataTalksClub_Data_engineering_zoomcamp_Dbt_Materialization_Strategy
- Environment:DataTalksClub_Data_engineering_zoomcamp_Dbt_DuckDB_Environment