Jump to content

Connect Leeroopedia MCP: Equip your AI agents to search best practices, build plans, verify code, diagnose failures, and look up hyperparameter defaults.

Principle:DataTalksClub Data engineering zoomcamp Dbt Marts Layer

From Leeroopedia


Page Metadata
Knowledge Sources dbt marts layer best practices, analytics engineering methodology, Kimball dimensional modeling
Domains Analytics Engineering, Dimensional Modeling, Marts Layer Design
Last Updated 2026-02-09 14:00 GMT

Overview

The marts layer is the final transformation layer producing star-schema fact and dimension tables optimized for analytical queries, using incremental materialization for efficient updates and custom macros for business logic.

Description

In a layered transformation architecture, the marts layer (also called the business layer or presentation layer) is the consumer-facing output of the transformation pipeline. It produces tables that:

  • Follow star-schema design: A central fact table (recording business events) is surrounded by dimension tables (providing descriptive context). This design optimizes analytical queries by minimizing the number of joins required.
  • Are materialized as tables: Unlike staging views, marts tables persist their results for fast query performance. Large fact tables may use incremental materialization to avoid full-table rebuilds.
  • Enrich with dimensional context: Fact tables JOIN dimension tables to add human-readable attributes (e.g., zone names, vendor names) alongside foreign key IDs.
  • Apply business logic: Custom macros compute derived metrics (e.g., trip duration in minutes) that encode domain-specific knowledge.
  • Support reporting aggregations: Specialized reporting tables pre-aggregate data by common analytical dimensions (e.g., monthly revenue by zone and service type).

The marts layer is where the analytics engineering team's modeling decisions directly impact business users. Tables in this layer should be stable, well-documented, and optimized for the queries that analysts and dashboards will run.

Usage

Use the marts layer pattern when:

  • Business users need ready-to-query tables for dashboards, reports, or ad-hoc analysis.
  • A star-schema or dimensional model is appropriate for the analytical use case.
  • Large datasets require incremental materialization to avoid expensive full-table rebuilds.
  • Derived business metrics (e.g., trip duration, revenue aggregations) need to be computed.
  • Data contracts (column types, not-null constraints) must be enforced at the final output layer.
  • Multiple reporting views need to be built from the same underlying fact table.

Theoretical Basis

Star Schema Architecture

The star schema places a fact table at the center, joined to dimension tables via foreign keys:

                    +------------------+
                    |   dim_zones      |
                    |  (location_id,   |
                    |   borough, zone, |
                    |   service_zone)  |
                    +--------+---------+
                             |
                             | pickup_location_id
                             | dropoff_location_id
                             |
+------------------+    +----+-------------------+    +------------------+
|  dim_vendors     |    |      fct_trips          |    | fct_monthly_zone |
| (vendor_id,      |--->| (trip_id, vendor_id,    |--->|   _revenue       |
|  vendor_name)    |    |  pickup/dropoff zones,  |    | (zone, month,    |
+------------------+    |  timestamps, metrics,   |    |  revenue, trips) |
                        |  payment details)       |    +------------------+
                        +-------------------------+

This topology enables analysts to slice and filter fact records by any dimension without complex multi-hop joins.

Incremental Materialization

For large fact tables, incremental materialization processes only new or changed records:

function incremental_build(model, target_table):
    if target_table does not exist:
        -- First run: full build
        CREATE TABLE target_table AS (full query)
    else:
        -- Subsequent runs: process only new records
        new_records = query WHERE pickup_datetime > MAX(pickup_datetime) IN target_table
        MERGE new_records INTO target_table ON unique_key = 'trip_id'

The merge strategy ensures that:

  • New records are inserted.
  • Records with the same unique_key are updated (upsert).
  • Existing unchanged records are not touched.

Custom Business Logic via Macros

Macros encapsulate reusable business logic that may require cross-database compatibility:

function get_trip_duration_minutes(pickup, dropoff):
    -- Uses dbt's built-in cross-database datediff
    return datediff(pickup, dropoff, 'minute')
    -- Works on BigQuery, DuckDB, Snowflake, Postgres, Redshift

function get_vendor_data(vendor_id_column):
    -- Generates a CASE statement from a Jinja dictionary at compile time
    vendors = {1: 'Creative Mobile Technologies', 2: 'VeriFone Inc.', 4: 'Unknown/Other'}
    return CASE vendor_id_column
        WHEN 1 THEN 'Creative Mobile Technologies'
        WHEN 2 THEN 'VeriFone Inc.'
        ...
    END

Reporting Aggregation Layer

Reporting models sit on top of fact tables and pre-compute common aggregations, trading storage for query speed:

function build_reporting_aggregation(fact_table, dimensions[], metrics[]):
    return:
        SELECT
            dimensions[],
            SUM(metrics[]) as aggregated_metrics[],
            COUNT(*) as total_records,
            AVG(other_metrics[]) as avg_metrics[]
        FROM fact_table
        GROUP BY dimensions[]

Pre-aggregation is especially valuable when dashboards frequently query the same dimensional breakdowns, avoiding repeated full-table scans.

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment