Jump to content

Connect Leeroopedia MCP: Equip your AI agents to search best practices, build plans, verify code, diagnose failures, and look up hyperparameter defaults.

Implementation:DataTalksClub Data engineering zoomcamp Dbt Marts Models

From Leeroopedia


Page Metadata
Knowledge Sources repo: DataTalksClub/data-engineering-zoomcamp, dbt docs: dbt marts reference, dbt incremental models
Domains Analytics Engineering, Dimensional Modeling, Marts Models
Last Updated 2026-02-09 14:00 GMT

Overview

Concrete dbt models implementing the marts (business-facing) layer, producing a star-schema with an incremental fact table, two dimension tables, and a monthly revenue reporting aggregation for NYC taxi trip analytics.

Description

The marts layer consists of four models organized into core and reporting sub-directories:

  • fct_trips (26 output columns, incremental): The central fact table joining int_trips with dim_zones to enrich trips with pickup/dropoff borough and zone names. Computes trip_duration_minutes using the get_trip_duration_minutes macro. Uses merge-based incremental strategy with trip_id as unique key.
  • dim_zones (4 output columns): Dimension table built from the taxi_zone_lookup seed, providing location_id-to-zone/borough mapping. Renames locationid to location_id.
  • dim_vendors (2 output columns): Dimension table built from distinct vendor_ids in fct_trips, using the get_vendor_data macro to map vendor IDs to company names via a compiled CASE statement.
  • fct_monthly_zone_revenue (11 output columns): Reporting aggregation that groups fct_trips by pickup zone, month, and service type. Computes revenue breakdowns, trip counts, and averages. Uses cross-database date truncation (BigQuery vs DuckDB syntax).

Usage

These models are the final output of the transformation pipeline, consumed by BI tools, dashboards, and analysts. Run with dbt run --select marts or target individual models.

Code Reference

Source Location

  • 04-analytics-engineering/taxi_rides_ny/models/marts/fct_trips.sql (Lines 1-61)
  • 04-analytics-engineering/taxi_rides_ny/models/marts/dim_zones.sql (Lines 1-10)
  • 04-analytics-engineering/taxi_rides_ny/models/marts/dim_vendors.sql (Lines 1-15)
  • 04-analytics-engineering/taxi_rides_ny/models/marts/reporting/fct_monthly_zone_revenue.sql (Lines 1-29)
  • 04-analytics-engineering/taxi_rides_ny/models/marts/schema.yml (Lines 1-138)
  • 04-analytics-engineering/taxi_rides_ny/models/marts/reporting/schema.yml (Lines 1-35)

Signature: fct_trips (incremental fact table)

{{
  config(
    materialized='incremental',
    unique_key='trip_id',
    incremental_strategy='merge',
    on_schema_change='append_new_columns'
  )
}}

-- Fact table containing all taxi trips enriched with zone information
-- This is a classic star schema design: fact table (trips) joined to dimension table (zones)

select
    -- Trip identifiers
    trips.trip_id,
    trips.vendor_id,
    trips.service_type,
    trips.rate_code_id,

    -- Location details (enriched with human-readable zone names from dimension)
    trips.pickup_location_id,
    pz.borough as pickup_borough,
    pz.zone as pickup_zone,
    trips.dropoff_location_id,
    dz.borough as dropoff_borough,
    dz.zone as dropoff_zone,

    -- Trip timing
    trips.pickup_datetime,
    trips.dropoff_datetime,
    trips.store_and_fwd_flag,

    -- Trip metrics
    trips.passenger_count,
    trips.trip_distance,
    trips.trip_type,
    {{ get_trip_duration_minutes('trips.pickup_datetime', 'trips.dropoff_datetime') }}
        as trip_duration_minutes,

    -- Payment breakdown
    trips.fare_amount,
    trips.extra,
    trips.mta_tax,
    trips.tip_amount,
    trips.tolls_amount,
    trips.ehail_fee,
    trips.improvement_surcharge,
    trips.total_amount,
    trips.payment_type,
    trips.payment_type_description

from {{ ref('int_trips') }} as trips
left join {{ ref('dim_zones') }} as pz
    on trips.pickup_location_id = pz.location_id
left join {{ ref('dim_zones') }} as dz
    on trips.dropoff_location_id = dz.location_id

{% if is_incremental() %}
  -- Only process new trips based on pickup datetime
  where trips.pickup_datetime > (select max(pickup_datetime) from {{ this }})
{% endif %}

Signature: dim_zones

-- Dimension table for NYC taxi zones
select
    locationid as location_id,
    borough,
    zone,
    service_zone
from {{ ref('taxi_zone_lookup') }}

Signature: dim_vendors

-- Dimension table for taxi technology vendors
with trips as (
    select * from {{ ref('fct_trips') }}
),

vendors as (
    select distinct
        vendor_id,
        {{ get_vendor_data('vendor_id') }} as vendor_name
    from trips
)

select * from vendors

Signature: fct_monthly_zone_revenue

-- Data mart for monthly revenue analysis by pickup zone and service type
select
    -- Grouping dimensions
    coalesce(pickup_zone, 'Unknown Zone') as pickup_zone,
    {% if target.type == 'bigquery' %}cast(date_trunc(pickup_datetime, month) as date)
    {% elif target.type == 'duckdb' %}date_trunc('month', pickup_datetime)
    {% endif %} as revenue_month,
    service_type,

    -- Revenue breakdown (summed by zone, month, and service type)
    sum(fare_amount) as revenue_monthly_fare,
    sum(extra) as revenue_monthly_extra,
    sum(mta_tax) as revenue_monthly_mta_tax,
    sum(tip_amount) as revenue_monthly_tip_amount,
    sum(tolls_amount) as revenue_monthly_tolls_amount,
    sum(ehail_fee) as revenue_monthly_ehail_fee,
    sum(improvement_surcharge) as revenue_monthly_improvement_surcharge,
    sum(total_amount) as revenue_monthly_total_amount,

    -- Additional metrics for operational analysis
    count(trip_id) as total_monthly_trips,
    avg(passenger_count) as avg_monthly_passenger_count,
    avg(trip_distance) as avg_monthly_trip_distance

from {{ ref('fct_trips') }}
group by pickup_zone, revenue_month, service_type

Import

Custom macros used by marts models:

-- macros/get_trip_duration_minutes.sql
{% macro get_trip_duration_minutes(pickup_datetime, dropoff_datetime) %}
    {{ dbt.datediff(pickup_datetime, dropoff_datetime, 'minute') }}
{% endmacro %}

-- macros/get_vendor_data.sql
{% macro get_vendor_data(vendor_id_column) %}
{% set vendors = {
    1: 'Creative Mobile Technologies',
    2: 'VeriFone Inc.',
    4: 'Unknown/Other'
} %}
case {{ vendor_id_column }}
    {% for vendor_id, vendor_name in vendors.items() %}
    when {{ vendor_id }} then '{{ vendor_name }}'
    {% endfor %}
end
{% endmacro %}

Materialization defaults from dbt_project.yml (overridden by fct_trips config block):

models:
  taxi_rides_ny:
    marts:
      +materialized: table
# fct_trips overrides to: materialized='incremental'

I/O Contract

Inputs

Input Type Description
Template:Ref('int trips') Intermediate table Deduplicated, enriched trip data (21 columns)
Template:Ref('dim zones') Dimension table Zone/borough lookup (4 columns), self-referencing from seed
Template:Ref('taxi zone lookup') Seed table Raw CSV of 265 NYC taxi zones with locationid, borough, zone, service_zone
Template:Ref('fct trips') Fact table (self-ref) Used by dim_vendors and fct_monthly_zone_revenue
get_trip_duration_minutes macro Jinja macro Cross-database datediff wrapper
get_vendor_data macro Jinja macro Vendor ID to name CASE statement generator

Outputs

Output Type Columns Description
fct_trips Incremental table (26 cols) trip_id, vendor_id, service_type, rate_code_id, pickup_location_id, pickup_borough, pickup_zone, dropoff_location_id, dropoff_borough, dropoff_zone, pickup_datetime, dropoff_datetime, store_and_fwd_flag, passenger_count, trip_distance, trip_type, trip_duration_minutes, fare_amount, extra, mta_tax, tip_amount, tolls_amount, ehail_fee, improvement_surcharge, total_amount, payment_type, payment_type_description Star-schema fact table with zone enrichment and computed duration
dim_zones Table (4 cols) location_id, borough, zone, service_zone Zone dimension from seed data
dim_vendors Table (2 cols) vendor_id, vendor_name Vendor dimension from macro-generated CASE
fct_monthly_zone_revenue Table (11 cols) pickup_zone, revenue_month, service_type, revenue_monthly_fare, revenue_monthly_extra, revenue_monthly_mta_tax, revenue_monthly_tip_amount, revenue_monthly_tolls_amount, revenue_monthly_ehail_fee, revenue_monthly_improvement_surcharge, revenue_monthly_total_amount, total_monthly_trips, avg_monthly_passenger_count, avg_monthly_trip_distance Pre-aggregated monthly revenue reporting table

Usage Examples

Running marts models

# Full refresh of all marts (including incremental models)
dbt run --select marts --full-refresh

# Incremental run (only new records for fct_trips)
dbt run --select marts

# Build a specific model and its upstream dependencies
dbt run --select +fct_trips

# Build only the reporting sub-directory
dbt run --select marts.reporting

Querying the star schema

-- Top 10 pickup zones by total revenue
SELECT
    pickup_zone,
    pickup_borough,
    service_type,
    SUM(total_amount) as total_revenue,
    COUNT(*) as trip_count,
    AVG(trip_duration_minutes) as avg_duration
FROM fct_trips
WHERE pickup_datetime >= '2019-01-01'
GROUP BY pickup_zone, pickup_borough, service_type
ORDER BY total_revenue DESC
LIMIT 10;

Querying the pre-aggregated reporting table

-- Monthly revenue trend by service type
SELECT
    revenue_month,
    service_type,
    revenue_monthly_total_amount,
    total_monthly_trips,
    avg_monthly_trip_distance
FROM fct_monthly_zone_revenue
WHERE pickup_zone = 'JFK Airport'
ORDER BY revenue_month;

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment