Implementation:DataTalksClub Data engineering zoomcamp Dbt Marts Models
| Page Metadata | |
|---|---|
| Knowledge Sources | repo: DataTalksClub/data-engineering-zoomcamp, dbt docs: dbt marts reference, dbt incremental models |
| Domains | Analytics Engineering, Dimensional Modeling, Marts Models |
| Last Updated | 2026-02-09 14:00 GMT |
Overview
Concrete dbt models implementing the marts (business-facing) layer, producing a star-schema with an incremental fact table, two dimension tables, and a monthly revenue reporting aggregation for NYC taxi trip analytics.
Description
The marts layer consists of four models organized into core and reporting sub-directories:
fct_trips(26 output columns, incremental): The central fact table joiningint_tripswithdim_zonesto enrich trips with pickup/dropoff borough and zone names. Computestrip_duration_minutesusing theget_trip_duration_minutesmacro. Uses merge-based incremental strategy withtrip_idas unique key.
dim_zones(4 output columns): Dimension table built from thetaxi_zone_lookupseed, providing location_id-to-zone/borough mapping. Renameslocationidtolocation_id.
dim_vendors(2 output columns): Dimension table built from distinct vendor_ids infct_trips, using theget_vendor_datamacro to map vendor IDs to company names via a compiled CASE statement.
fct_monthly_zone_revenue(11 output columns): Reporting aggregation that groupsfct_tripsby pickup zone, month, and service type. Computes revenue breakdowns, trip counts, and averages. Uses cross-database date truncation (BigQuery vs DuckDB syntax).
Usage
These models are the final output of the transformation pipeline, consumed by BI tools, dashboards, and analysts. Run with dbt run --select marts or target individual models.
Code Reference
Source Location
04-analytics-engineering/taxi_rides_ny/models/marts/fct_trips.sql(Lines 1-61)04-analytics-engineering/taxi_rides_ny/models/marts/dim_zones.sql(Lines 1-10)04-analytics-engineering/taxi_rides_ny/models/marts/dim_vendors.sql(Lines 1-15)04-analytics-engineering/taxi_rides_ny/models/marts/reporting/fct_monthly_zone_revenue.sql(Lines 1-29)04-analytics-engineering/taxi_rides_ny/models/marts/schema.yml(Lines 1-138)04-analytics-engineering/taxi_rides_ny/models/marts/reporting/schema.yml(Lines 1-35)
Signature: fct_trips (incremental fact table)
{{
config(
materialized='incremental',
unique_key='trip_id',
incremental_strategy='merge',
on_schema_change='append_new_columns'
)
}}
-- Fact table containing all taxi trips enriched with zone information
-- This is a classic star schema design: fact table (trips) joined to dimension table (zones)
select
-- Trip identifiers
trips.trip_id,
trips.vendor_id,
trips.service_type,
trips.rate_code_id,
-- Location details (enriched with human-readable zone names from dimension)
trips.pickup_location_id,
pz.borough as pickup_borough,
pz.zone as pickup_zone,
trips.dropoff_location_id,
dz.borough as dropoff_borough,
dz.zone as dropoff_zone,
-- Trip timing
trips.pickup_datetime,
trips.dropoff_datetime,
trips.store_and_fwd_flag,
-- Trip metrics
trips.passenger_count,
trips.trip_distance,
trips.trip_type,
{{ get_trip_duration_minutes('trips.pickup_datetime', 'trips.dropoff_datetime') }}
as trip_duration_minutes,
-- Payment breakdown
trips.fare_amount,
trips.extra,
trips.mta_tax,
trips.tip_amount,
trips.tolls_amount,
trips.ehail_fee,
trips.improvement_surcharge,
trips.total_amount,
trips.payment_type,
trips.payment_type_description
from {{ ref('int_trips') }} as trips
left join {{ ref('dim_zones') }} as pz
on trips.pickup_location_id = pz.location_id
left join {{ ref('dim_zones') }} as dz
on trips.dropoff_location_id = dz.location_id
{% if is_incremental() %}
-- Only process new trips based on pickup datetime
where trips.pickup_datetime > (select max(pickup_datetime) from {{ this }})
{% endif %}
Signature: dim_zones
-- Dimension table for NYC taxi zones
select
locationid as location_id,
borough,
zone,
service_zone
from {{ ref('taxi_zone_lookup') }}
Signature: dim_vendors
-- Dimension table for taxi technology vendors
with trips as (
select * from {{ ref('fct_trips') }}
),
vendors as (
select distinct
vendor_id,
{{ get_vendor_data('vendor_id') }} as vendor_name
from trips
)
select * from vendors
Signature: fct_monthly_zone_revenue
-- Data mart for monthly revenue analysis by pickup zone and service type
select
-- Grouping dimensions
coalesce(pickup_zone, 'Unknown Zone') as pickup_zone,
{% if target.type == 'bigquery' %}cast(date_trunc(pickup_datetime, month) as date)
{% elif target.type == 'duckdb' %}date_trunc('month', pickup_datetime)
{% endif %} as revenue_month,
service_type,
-- Revenue breakdown (summed by zone, month, and service type)
sum(fare_amount) as revenue_monthly_fare,
sum(extra) as revenue_monthly_extra,
sum(mta_tax) as revenue_monthly_mta_tax,
sum(tip_amount) as revenue_monthly_tip_amount,
sum(tolls_amount) as revenue_monthly_tolls_amount,
sum(ehail_fee) as revenue_monthly_ehail_fee,
sum(improvement_surcharge) as revenue_monthly_improvement_surcharge,
sum(total_amount) as revenue_monthly_total_amount,
-- Additional metrics for operational analysis
count(trip_id) as total_monthly_trips,
avg(passenger_count) as avg_monthly_passenger_count,
avg(trip_distance) as avg_monthly_trip_distance
from {{ ref('fct_trips') }}
group by pickup_zone, revenue_month, service_type
Import
Custom macros used by marts models:
-- macros/get_trip_duration_minutes.sql
{% macro get_trip_duration_minutes(pickup_datetime, dropoff_datetime) %}
{{ dbt.datediff(pickup_datetime, dropoff_datetime, 'minute') }}
{% endmacro %}
-- macros/get_vendor_data.sql
{% macro get_vendor_data(vendor_id_column) %}
{% set vendors = {
1: 'Creative Mobile Technologies',
2: 'VeriFone Inc.',
4: 'Unknown/Other'
} %}
case {{ vendor_id_column }}
{% for vendor_id, vendor_name in vendors.items() %}
when {{ vendor_id }} then '{{ vendor_name }}'
{% endfor %}
end
{% endmacro %}
Materialization defaults from dbt_project.yml (overridden by fct_trips config block):
models:
taxi_rides_ny:
marts:
+materialized: table
# fct_trips overrides to: materialized='incremental'
I/O Contract
Inputs
| Input | Type | Description |
|---|---|---|
Template:Ref('int trips') |
Intermediate table | Deduplicated, enriched trip data (21 columns) |
Template:Ref('dim zones') |
Dimension table | Zone/borough lookup (4 columns), self-referencing from seed |
Template:Ref('taxi zone lookup') |
Seed table | Raw CSV of 265 NYC taxi zones with locationid, borough, zone, service_zone |
Template:Ref('fct trips') |
Fact table (self-ref) | Used by dim_vendors and fct_monthly_zone_revenue |
get_trip_duration_minutes macro |
Jinja macro | Cross-database datediff wrapper |
get_vendor_data macro |
Jinja macro | Vendor ID to name CASE statement generator |
Outputs
| Output | Type | Columns | Description |
|---|---|---|---|
fct_trips |
Incremental table (26 cols) | trip_id, vendor_id, service_type, rate_code_id, pickup_location_id, pickup_borough, pickup_zone, dropoff_location_id, dropoff_borough, dropoff_zone, pickup_datetime, dropoff_datetime, store_and_fwd_flag, passenger_count, trip_distance, trip_type, trip_duration_minutes, fare_amount, extra, mta_tax, tip_amount, tolls_amount, ehail_fee, improvement_surcharge, total_amount, payment_type, payment_type_description | Star-schema fact table with zone enrichment and computed duration |
dim_zones |
Table (4 cols) | location_id, borough, zone, service_zone | Zone dimension from seed data |
dim_vendors |
Table (2 cols) | vendor_id, vendor_name | Vendor dimension from macro-generated CASE |
fct_monthly_zone_revenue |
Table (11 cols) | pickup_zone, revenue_month, service_type, revenue_monthly_fare, revenue_monthly_extra, revenue_monthly_mta_tax, revenue_monthly_tip_amount, revenue_monthly_tolls_amount, revenue_monthly_ehail_fee, revenue_monthly_improvement_surcharge, revenue_monthly_total_amount, total_monthly_trips, avg_monthly_passenger_count, avg_monthly_trip_distance | Pre-aggregated monthly revenue reporting table |
Usage Examples
Running marts models
# Full refresh of all marts (including incremental models)
dbt run --select marts --full-refresh
# Incremental run (only new records for fct_trips)
dbt run --select marts
# Build a specific model and its upstream dependencies
dbt run --select +fct_trips
# Build only the reporting sub-directory
dbt run --select marts.reporting
Querying the star schema
-- Top 10 pickup zones by total revenue
SELECT
pickup_zone,
pickup_borough,
service_type,
SUM(total_amount) as total_revenue,
COUNT(*) as trip_count,
AVG(trip_duration_minutes) as avg_duration
FROM fct_trips
WHERE pickup_datetime >= '2019-01-01'
GROUP BY pickup_zone, pickup_borough, service_type
ORDER BY total_revenue DESC
LIMIT 10;
Querying the pre-aggregated reporting table
-- Monthly revenue trend by service type
SELECT
revenue_month,
service_type,
revenue_monthly_total_amount,
total_monthly_trips,
avg_monthly_trip_distance
FROM fct_monthly_zone_revenue
WHERE pickup_zone = 'JFK Airport'
ORDER BY revenue_month;
Related Pages
- Principle:DataTalksClub_Data_engineering_zoomcamp_Dbt_Marts_Layer
- Environment:DataTalksClub_Data_engineering_zoomcamp_Dbt_DuckDB_Environment
- Implementation:DataTalksClub_Data_engineering_zoomcamp_Dbt_Intermediate_Models
- Implementation:DataTalksClub_Data_engineering_zoomcamp_Dbt_Test_Docs_CLI
- Heuristic:DataTalksClub_Data_engineering_zoomcamp_DuckDB_OOM_Memory_Management
- Heuristic:DataTalksClub_Data_engineering_zoomcamp_Dbt_Materialization_Strategy