
Implementation:DataTalksClub Data engineering zoomcamp Dbt Test Docs CLI

From Leeroopedia


Page Metadata
Knowledge Sources: repo DataTalksClub/data-engineering-zoomcamp; dbt docs on data tests and dbt docs generate
Domains: Analytics Engineering, Data Testing, Documentation Generation, CLI Tools
Last Updated: 2026-02-09 14:00 GMT

Overview

This page documents the concrete dbt configuration used for data quality testing, model contract enforcement, and auto-generated documentation in the taxi_rides_ny transformation pipeline. It centers on three CLI commands: dbt test, dbt docs generate, and dbt deps.

Description

The taxi_rides_ny project declares tests and documentation in schema YAML files at every transformation layer (staging, intermediate, marts, reporting) and for its seeds:

  • Column-level tests: not_null, unique, and accepted_values tests are declared in schema YAML files for key columns across staging, intermediate, and marts models.
  • Referential integrity tests: relationships tests verify that pickup_location_id and dropoff_location_id in fct_trips reference valid entries in dim_zones.
  • Composite uniqueness tests: dbt_utils.unique_combination_of_columns validates that the combination of (pickup_zone, revenue_month, service_type) is unique in the reporting table.
  • Model contract enforcement: fct_trips declares a model contract with enforced: true and an explicit data_type for each column, so dbt run or dbt build fails if the column names or data types the model produces do not match the declaration.
  • Documentation: Models and columns carry description fields in their schema YAML, which dbt docs generate picks up for the auto-generated documentation site.
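
The generic tests listed above all follow one pattern: each compiles to a query that selects the rows violating the rule, and the test passes when that query returns no rows. The sketch below imitates that pattern with Python's built-in sqlite3 module. The SQL is a simplified approximation, not the exact SQL dbt compiles, and the three sample rows are hypothetical; only the table and column names come from the project.

```python
import sqlite3

# Hypothetical miniature data set; only table and column names come from the project.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_zones (location_id INTEGER);
    INSERT INTO dim_zones VALUES (1), (2);
    CREATE TABLE fct_trips (trip_id TEXT, service_type TEXT, pickup_location_id INTEGER);
    INSERT INTO fct_trips VALUES
        ('a', 'Green', 1),
        ('a', 'Yellow', 2),   -- duplicate trip_id
        (NULL, 'Blue', 99);   -- null id, unexpected service type, unknown zone
""")

# Each query returns the rows that violate the rule; zero rows means the test passes.
CHECKS = {
    "not_null(trip_id)": "SELECT * FROM fct_trips WHERE trip_id IS NULL",
    "unique(trip_id)": """
        SELECT trip_id FROM fct_trips
        WHERE trip_id IS NOT NULL
        GROUP BY trip_id HAVING COUNT(*) > 1""",
    "accepted_values(service_type)": """
        SELECT * FROM fct_trips WHERE service_type NOT IN ('Green', 'Yellow')""",
    "relationships(pickup_location_id)": """
        SELECT f.pickup_location_id FROM fct_trips AS f
        LEFT JOIN dim_zones AS z ON f.pickup_location_id = z.location_id
        WHERE f.pickup_location_id IS NOT NULL AND z.location_id IS NULL""",
}


def failing_row_counts(connection):
    """Run every check and return the number of violating rows per check."""
    return {name: len(connection.execute(sql).fetchall()) for name, sql in CHECKS.items()}


results = failing_row_counts(conn)
for name, count in results.items():
    print(f"{'PASS' if count == 0 else 'FAIL'} {name}: {count} failing row(s)")
```

With this sample data every check reports one failing row, which makes it easy to see which row trips which rule.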

Usage

This configuration is used when:

  • Running dbt test to validate data quality after a dbt run.
  • Running dbt docs generate followed by dbt docs serve to browse the data catalog.
  • Running dbt deps to install required packages (dbt_utils, codegen).
  • Integrating tests into CI/CD pipelines to gate deployments on data quality.
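
For CI/CD gating, the exit code of dbt test is usually enough, since the command exits non-zero when a test fails. If a pipeline needs a custom policy (for example, reporting warnings without blocking), one option is to read the target/run_results.json artifact that dbt writes after each invocation. The following is a minimal sketch under that assumption. The helper names (blocking_results, exit_code_for, load_run_results) and the sample payload are hypothetical, and the sample shows only the fields the sketch reads.

```python
import json
from pathlib import Path

# Statuses that should stop a deployment in this sketch; "warn" is reported but not blocking.
BLOCKING = {"fail", "error"}


def blocking_results(run_results):
    """Return the unique_id of every result whose status should block the pipeline."""
    return [
        result["unique_id"]
        for result in run_results.get("results", [])
        if result.get("status") in BLOCKING
    ]


def exit_code_for(run_results):
    """Map the artifact to a process exit code: 1 if anything blocks, else 0."""
    return 1 if blocking_results(run_results) else 0


def load_run_results(path="target/run_results.json"):
    """Load the artifact from disk (path assumes dbt's default target directory)."""
    return json.loads(Path(path).read_text())


# Hypothetical payload with made-up test ids, trimmed to the fields used above.
sample = {
    "results": [
        {"unique_id": "test.taxi_rides_ny.example_passing_test", "status": "pass"},
        {"unique_id": "test.taxi_rides_ny.example_failing_test", "status": "fail"},
        {"unique_id": "test.taxi_rides_ny.example_warning_test", "status": "warn"},
    ]
}

print(blocking_results(sample))
print(exit_code_for(sample))
```

In a real pipeline, a wrapper script would call load_run_results() after dbt test finishes and pass exit_code_for(...) to sys.exit.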

Code Reference

Source Location

  • 04-analytics-engineering/taxi_rides_ny/packages.yml (Lines 1-5)
  • 04-analytics-engineering/taxi_rides_ny/models/staging/schema.yml (Lines 1-95)
  • 04-analytics-engineering/taxi_rides_ny/models/intermediate/schema.yml (Lines 1-107)
  • 04-analytics-engineering/taxi_rides_ny/models/marts/schema.yml (Lines 1-138)
  • 04-analytics-engineering/taxi_rides_ny/models/marts/reporting/schema.yml (Lines 1-35)
  • 04-analytics-engineering/taxi_rides_ny/seeds/seeds_properties.yml (Lines 1-20)

Signature: packages.yml (dependency declarations)

packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.3.0", "<2.0.0"]
  - package: dbt-labs/codegen
    version: [">=0.14.0", "<1.0.0"]
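
Each version entry above is a pair of bounds that a package release must satisfy at the same time. As a rough illustration of how such a range is read, the sketch below checks plain MAJOR.MINOR.PATCH strings against the two constraints. It is not dbt's resolver: the function names are hypothetical, and pre-release tags and other semver details are ignored.

```python
import operator

# Comparison operators used in the packages.yml ranges shown above.
OPERATORS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt}


def parse_version(text):
    """Turn '1.3.0' into (1, 3, 0) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


def split_constraint(constraint):
    """Split '>=1.3.0' into (comparison function, version tuple)."""
    for symbol in (">=", "<=", ">", "<"):  # two-character operators are checked first
        if constraint.startswith(symbol):
            return OPERATORS[symbol], parse_version(constraint[len(symbol):])
    raise ValueError(f"unsupported constraint: {constraint}")


def satisfies(version, constraints):
    """Return True only if the version meets every constraint in the list."""
    candidate = parse_version(version)
    return all(compare(candidate, bound)
               for compare, bound in map(split_constraint, constraints))


dbt_utils_range = [">=1.3.0", "<2.0.0"]
for candidate in ("1.2.9", "1.3.0", "1.9.4", "2.0.0"):
    print(candidate, satisfies(candidate, dbt_utils_range))
```

The upper bound excludes the next major version, so a breaking 2.x release of dbt_utils would not be installed by dbt deps without an explicit change to packages.yml.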

Signature: Model contract on fct_trips (from marts/schema.yml)

models:
  - name: fct_trips
    description: Fact table with all taxi trips including trip and payment details
    config:
      contract:
        enforced: true
    columns:
      - name: trip_id
        description: Unique trip identifier
        data_type: string
        data_tests:
          - unique
          - not_null
      - name: vendor_id
        description: Taxi technology provider
        data_type: integer
        data_tests:
          - not_null
      - name: service_type
        description: Type of taxi service (Green or Yellow)
        data_type: string
        data_tests:
          - accepted_values:
              arguments:
                values: ['Green', 'Yellow']
          - not_null
      - name: pickup_location_id
        description: TLC Taxi Zone where trip started
        data_type: integer
        data_tests:
          - relationships:
              arguments:
                to: ref('dim_zones')
                field: location_id
      - name: dropoff_location_id
        description: TLC Taxi Zone where trip ended
        data_type: integer
        data_tests:
          - relationships:
              arguments:
                to: ref('dim_zones')
                field: location_id
      - name: total_amount
        description: Total amount charged
        data_type: numeric
        data_tests:
          - not_null
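
Conceptually, an enforced contract compares the declared column names and data types with what the model's SQL actually returns, and fails the build on any difference. The sketch below illustrates that comparison in plain Python. It is not dbt's implementation: the function name contract_violations is hypothetical, the declared columns are copied from the excerpt above, and the "actual" schema (including tip_amount) is invented to show the three kinds of mismatch.

```python
def contract_violations(declared, actual):
    """Compare declared {column: data_type} with the produced schema and list every mismatch."""
    problems = []
    for column, data_type in declared.items():
        if column not in actual:
            problems.append(f"missing column: {column}")
        elif actual[column] != data_type:
            problems.append(
                f"type mismatch on {column}: declared {data_type}, got {actual[column]}"
            )
    for column in actual:
        if column not in declared:
            problems.append(f"undeclared column: {column}")
    return problems


# Declared columns, copied from the fct_trips excerpt above.
declared = {
    "trip_id": "string",
    "vendor_id": "integer",
    "service_type": "string",
    "pickup_location_id": "integer",
    "dropoff_location_id": "integer",
    "total_amount": "numeric",
}

# Hypothetical output schema: vendor_id has the wrong type, total_amount is
# missing, and tip_amount is produced without being declared.
actual = {
    "trip_id": "string",
    "vendor_id": "string",
    "service_type": "string",
    "pickup_location_id": "integer",
    "dropoff_location_id": "integer",
    "tip_amount": "numeric",
}

violations = contract_violations(declared, actual)
for problem in violations:
    print(problem)
```

Unlike data tests, which run after a model is built and inspect its rows, the contract check concerns the shape of the output, so a mismatch stops the model from being built at all.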

Signature: Composite uniqueness test (from reporting/schema.yml)

models:
  - name: fct_monthly_zone_revenue
    description: Monthly revenue aggregation by pickup zone and service type
    data_tests:
      - dbt_utils.unique_combination_of_columns:
          arguments:
            combination_of_columns:
              - pickup_zone
              - revenue_month
              - service_type
    columns:
      - name: pickup_zone
        data_tests:
          - not_null
      - name: revenue_month
        data_tests:
          - not_null
      - name: service_type
        data_tests:
          - not_null
          - accepted_values:
              arguments:
                values: ['Green', 'Yellow']
      - name: revenue_monthly_total_amount
        data_tests:
          - not_null
      - name: total_monthly_trips
        data_tests:
          - not_null
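
The composite uniqueness test amounts to grouping by the listed columns and flagging any group that appears more than once. A minimal sketch of that check with sqlite3 follows; the query approximates what dbt_utils.unique_combination_of_columns does rather than reproducing its compiled SQL, the helper name duplicate_combinations is hypothetical, and the zone name and month values are made-up sample data.

```python
import sqlite3


def duplicate_combinations(connection, table, columns):
    """Return each column combination that occurs more than once, with its row count.

    Table and column names are interpolated directly, so they must come from
    trusted configuration, never from user input.
    """
    column_list = ", ".join(columns)
    sql = (
        f"SELECT {column_list}, COUNT(*) AS n FROM {table} "
        f"GROUP BY {column_list} HAVING COUNT(*) > 1"
    )
    return connection.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fct_monthly_zone_revenue "
    "(pickup_zone TEXT, revenue_month TEXT, service_type TEXT)"
)
# Hypothetical rows: the first and last share the same three-column key.
conn.executemany(
    "INSERT INTO fct_monthly_zone_revenue VALUES (?, ?, ?)",
    [
        ("Astoria", "2019-01-01", "Green"),
        ("Astoria", "2019-01-01", "Yellow"),
        ("Astoria", "2019-02-01", "Green"),
        ("Astoria", "2019-01-01", "Green"),
    ],
)

duplicates = duplicate_combinations(
    conn,
    "fct_monthly_zone_revenue",
    ["pickup_zone", "revenue_month", "service_type"],
)
print(duplicates)
```

No single column is unique in this table, which is why the test is declared at the model level rather than under an individual column.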

Signature: Staging layer tests (from staging/schema.yml)

models:
  - name: stg_green_tripdata
    description: >
      Staging model for green taxi trip data. This model standardizes column names
      and data types from the raw green_tripdata source.
    columns:
      - name: vendor_id
        description: Taxi technology provider (1 = Creative Mobile Technologies, 2 = VeriFone Inc.)
        data_tests:
          - not_null
      - name: pickup_datetime
        description: Date and time when the meter was engaged
        data_tests:
          - not_null

Signature: Intermediate layer tests (from intermediate/schema.yml)

models:
  - name: int_trips
    description: Cleaned, enriched, and deduplicated trip data ready for marts
    columns:
      - name: trip_id
        description: Unique trip identifier (surrogate key)
        data_tests:
          - unique
          - not_null
      - name: vendor_id
        data_tests:
          - not_null
      - name: service_type
        description: Type of taxi service (Green or Yellow)
        data_tests:
          - not_null
          - accepted_values:
              arguments:
                values: ['Green', 'Yellow']
      - name: total_amount
        data_tests:
          - not_null

Import

# Install required packages (dbt_utils provides generate_surrogate_key and unique_combination_of_columns)
dbt deps

# Output:
# Installing dbt-labs/dbt_utils
# Installing dbt-labs/codegen
# Updated lock file

I/O Contract

Inputs

INPUT                | TYPE                  | DESCRIPTION
---------------------|-----------------------|-------------------------
packages.yml         | YAML config           | Declares dbt_utils (>=1.3.0,<2.0.0) and codegen (>=0.14.0,<1.0.0) dependencies
schema.yml files     | YAML config           | Test and documentation declarations for all models across all layers
seeds_properties.yml | YAML config           | Test declarations for seed tables (unique, not_null on payment_type)
Built models         | Database tables/views | The actual data produced by dbt run, against which tests are executed

Outputs

OUTPUT               | TYPE               | DESCRIPTION
---------------------|--------------------|-------------------------
Test results         | Pass/Fail/Warn     | Results of all data_tests across all schema.yml files
target/catalog.json  | JSON               | Generated catalog metadata for all models, sources, and columns
target/manifest.json | JSON               | Project manifest including lineage graph, compiled SQL, and test definitions
Documentation site   | Static HTML        | Browsable data catalog served by dbt docs serve on localhost:8080
dbt_packages/        | Installed packages | Downloaded dbt_utils and codegen packages after dbt deps

Usage Examples

Full test suite execution

# Run all tests across all layers
dbt test

# Illustrative output (abbreviated; actual test names, counts, and timings will differ):
# 14:45:01  1 of 22 PASS not_null_stg_green_tripdata_vendor_id ............. [PASS in 0.12s]
# 14:45:01  2 of 22 PASS not_null_stg_green_tripdata_pickup_datetime ....... [PASS in 0.11s]
# 14:45:02  3 of 22 PASS not_null_stg_yellow_tripdata_vendor_id ............ [PASS in 0.10s]
# 14:45:02  4 of 22 PASS unique_int_trips_trip_id .......................... [PASS in 0.23s]
# 14:45:02  5 of 22 PASS not_null_int_trips_trip_id ....................... [PASS in 0.09s]
# 14:45:03  6 of 22 PASS accepted_values_int_trips_service_type ........... [PASS in 0.08s]
# ...
# 14:45:05 22 of 22 PASS unique_combination_fct_monthly_zone_revenue ...... [PASS in 0.31s]
# Finished running 22 data tests in 0 hours 0 minutes and 4.12 seconds

Selective test execution

# Test only the marts layer
dbt test --select marts

# Test a specific model
dbt test --select fct_trips

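# Run only the relationships tests (test_name selects by generic test name;
# test_type distinguishes kinds of test such as generic and singular)
dbt test --select test_name:relationships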

# Build a specific model and run its tests in one step
dbt build --select fct_trips

Documentation generation and serving

# Generate the documentation catalog
dbt docs generate

# Serve the documentation site locally
dbt docs serve --port 8080

# The site includes:
# - Model lineage DAG (visual graph of all model dependencies)
# - Model and column descriptions from the schema YAML files
# - Tests declared on each model and column
# - Source and compiled SQL for each model

Complete test inventory across all layers

LAYER          | MODEL                      | COLUMN                       | TEST
---------------|----------------------------|------------------------------|-------------------------
staging        | stg_green_tripdata         | vendor_id                    | not_null
staging        | stg_green_tripdata         | pickup_datetime              | not_null
staging        | stg_yellow_tripdata        | vendor_id                    | not_null
staging        | stg_yellow_tripdata        | pickup_datetime              | not_null
intermediate   | int_trips                  | trip_id                      | unique
intermediate   | int_trips                  | trip_id                      | not_null
intermediate   | int_trips                  | vendor_id                    | not_null
intermediate   | int_trips                  | service_type                 | not_null
intermediate   | int_trips                  | service_type                 | accepted_values
intermediate   | int_trips                  | pickup_datetime              | not_null
intermediate   | int_trips                  | total_amount                 | not_null
marts          | fct_trips                  | trip_id                      | unique
marts          | fct_trips                  | trip_id                      | not_null
marts          | fct_trips                  | vendor_id                    | not_null
marts          | fct_trips                  | service_type                 | accepted_values
marts          | fct_trips                  | service_type                 | not_null
marts          | fct_trips                  | pickup_location_id           | relationships (dim_zones)
marts          | fct_trips                  | dropoff_location_id          | relationships (dim_zones)
marts          | fct_trips                  | pickup_datetime              | not_null
marts          | fct_trips                  | total_amount                 | not_null
marts          | dim_zones                  | location_id                  | unique
marts          | dim_zones                  | location_id                  | not_null
marts          | dim_vendors                | vendor_id                    | unique
marts          | dim_vendors                | vendor_id                    | not_null
reporting      | fct_monthly_zone_revenue   | (model-level)                | unique_combination_of_columns
reporting      | fct_monthly_zone_revenue   | pickup_zone                  | not_null
reporting      | fct_monthly_zone_revenue   | revenue_month                | not_null
reporting      | fct_monthly_zone_revenue   | service_type                 | not_null
reporting      | fct_monthly_zone_revenue   | service_type                 | accepted_values
reporting      | fct_monthly_zone_revenue   | revenue_monthly_total_amount | not_null
reporting      | fct_monthly_zone_revenue   | total_monthly_trips          | not_null
seeds          | payment_type_lookup        | payment_type                 | unique
seeds          | payment_type_lookup        | payment_type                 | not_null
