Implementation:DataTalksClub Data engineering zoomcamp Dbt Test Docs CLI

Page Metadata
Knowledge Sources	repo: DataTalksClub/data-engineering-zoomcamp, dbt docs: dbt data tests, dbt docs generate
Domains	Analytics Engineering, Data Testing, Documentation Generation, CLI Tools
Last Updated	2026-02-09 14:00 GMT

Overview

Concrete dbt configuration for data quality testing, model contract enforcement, and auto-generated documentation in the taxi_rides_ny analytics transformation pipeline, exercised through the dbt test, dbt docs generate, and dbt deps CLI commands.

Description

The taxi_rides_ny project implements a comprehensive testing and documentation strategy across all transformation layers:

Column-level tests: not_null, unique, and accepted_values tests are declared in schema YAML files for key columns across staging, intermediate, and marts models.
Referential integrity tests: relationships tests verify that pickup_location_id and dropoff_location_id in fct_trips reference valid entries in dim_zones.
Composite uniqueness tests: dbt_utils.unique_combination_of_columns validates that the combination of (pickup_zone, revenue_month, service_type) is unique in the reporting table.
Model contract enforcement: fct_trips declares a model contract with enforced: true and an explicit data_type for every column, so the build fails if the model's output column names or data types do not match the contract.
Documentation: Models and their key columns carry description fields in the schema YAML files, which feed into the auto-generated documentation site.
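Each generic test above compiles to a SQL query that selects offending rows, and the test passes when that query returns nothing. The Python sketch below mimics those semantics on in-memory rows so the NULL handling is easy to see. The sample rows and helper names are hypothetical and are not part of dbt or the project.

```python
from collections import Counter

# Hypothetical in-memory rows; None stands in for SQL NULL.
DIM_ZONES = [{"location_id": 1}, {"location_id": 2}]
FCT_TRIPS = [
    {"trip_id": "a", "service_type": "Green", "pickup_location_id": 1},
    {"trip_id": "b", "service_type": "Yellow", "pickup_location_id": 2},
    {"trip_id": "b", "service_type": "FHV", "pickup_location_id": 999},
    {"trip_id": None, "service_type": "Green", "pickup_location_id": None},
]


def not_null_failures(rows, column):
    """Rows whose column is NULL; any such row fails not_null."""
    return [row for row in rows if row[column] is None]


def unique_failures(rows, column):
    """Non-NULL values that occur more than once; NULLs are ignored."""
    counts = Counter(row[column] for row in rows if row[column] is not None)
    return [value for value, n in counts.items() if n > 1]


def accepted_values_failures(rows, column, values):
    """Distinct non-NULL values outside the accepted list (NULLs pass)."""
    return sorted({row[column] for row in rows
                   if row[column] is not None and row[column] not in values})


def relationships_failures(child_rows, column, parent_rows, field):
    """Distinct non-NULL child keys with no matching key in the parent table."""
    parent_keys = {row[field] for row in parent_rows}
    return sorted({row[column] for row in child_rows
                   if row[column] is not None and row[column] not in parent_keys})
```

For the sample rows, not_null flags one row, unique flags "b", accepted_values flags "FHV", and relationships flags 999. The NULL pickup_location_id is not reported by the relationships check, which is why key columns usually carry a separate not_null test.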

Usage

This configuration is used when:

Running dbt test to validate data quality after a dbt run.
Running dbt docs generate followed by dbt docs serve to browse the data catalog.
Running dbt deps to install required packages (dbt_utils, codegen).
Integrating tests into CI/CD pipelines to gate deployments on data quality.

Code Reference

Source Location

04-analytics-engineering/taxi_rides_ny/packages.yml (Lines 1-5)
04-analytics-engineering/taxi_rides_ny/models/staging/schema.yml (Lines 1-95)
04-analytics-engineering/taxi_rides_ny/models/intermediate/schema.yml (Lines 1-107)
04-analytics-engineering/taxi_rides_ny/models/marts/schema.yml (Lines 1-138)
04-analytics-engineering/taxi_rides_ny/models/marts/reporting/schema.yml (Lines 1-35)
04-analytics-engineering/taxi_rides_ny/seeds/seeds_properties.yml (Lines 1-20)

Signature: packages.yml (dependency declarations)

packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.3.0", "<2.0.0"]
  - package: dbt-labs/codegen
    version: [">=0.14.0", "<1.0.0"]
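Each version entry is a list of constraints that must all hold, and dbt deps installs the newest published release inside that range. A minimal Python sketch of that resolution rule, assuming plain MAJOR.MINOR.PATCH strings and a hypothetical list of published versions (real resolution also handles pre-releases and intersects the ranges requested by transitive dependencies):

```python
import operator

# Comparison operators allowed in a constraint string such as ">=1.3.0".
OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
       ">": operator.gt, "<": operator.lt}


def parse_version(text):
    """'1.3.0' -> (1, 3, 0) so tuples compare numerically."""
    return tuple(int(part) for part in text.split("."))


def split_constraint(constraint):
    """Split '>=1.3.0' into (operator.ge, (1, 3, 0)); a bare version means '=='."""
    for symbol in (">=", "<=", "==", ">", "<"):  # try two-char operators first
        if constraint.startswith(symbol):
            return OPS[symbol], parse_version(constraint[len(symbol):])
    return operator.eq, parse_version(constraint)


def resolve(available, constraints):
    """Newest available version satisfying every constraint, or None."""
    checks = [split_constraint(c) for c in constraints]
    matching = [v for v in available
                if all(op(parse_version(v), bound) for op, bound in checks)]
    return max(matching, key=parse_version) if matching else None


# Hypothetical published versions, for illustration only.
DBT_UTILS_RELEASES = ["1.1.1", "1.3.0", "1.3.1", "2.0.0"]
print(resolve(DBT_UTILS_RELEASES, [">=1.3.0", "<2.0.0"]))
```

The upper bound "<2.0.0" is what keeps a future major release of dbt_utils, which may contain breaking changes, from being picked up automatically.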

Signature: Model contract on fct_trips (from marts/schema.yml)

models:
  - name: fct_trips
    description: Fact table with all taxi trips including trip and payment details
    config:
      contract:
        enforced: true
    columns:
      - name: trip_id
        description: Unique trip identifier
        data_type: string
        data_tests:
          - unique
          - not_null
      - name: vendor_id
        description: Taxi technology provider
        data_type: integer
        data_tests:
          - not_null
      - name: service_type
        description: Type of taxi service (Green or Yellow)
        data_type: string
        data_tests:
          - accepted_values:
              arguments:
                values: ['Green', 'Yellow']
          - not_null
      - name: pickup_location_id
        description: TLC Taxi Zone where trip started
        data_type: integer
        data_tests:
          - relationships:
              arguments:
                to: ref('dim_zones')
                field: location_id
      - name: dropoff_location_id
        description: TLC Taxi Zone where trip ended
        data_type: integer
        data_tests:
          - relationships:
              arguments:
                to: ref('dim_zones')
                field: location_id
      - name: total_amount
        description: Total amount charged
        data_type: numeric
        data_tests:
          - not_null
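With enforced: true, dbt compares the model's output columns against the declared names and data_type values and stops on any missing, extra, or mistyped column. The sketch below is a simplified Python model of that comparison. The ACTUAL schema is hypothetical, and real enforcement relies on the warehouse adapter to compare types as the warehouse reports them rather than comparing lowercase strings.

```python
def contract_violations(declared, actual):
    """Compare declared {column: data_type} with the model's actual output schema.

    Simplification: types are compared as lowercase strings.
    """
    problems = []
    for name, data_type in declared.items():
        if name not in actual:
            problems.append(f"missing column: {name}")
        elif actual[name].lower() != data_type.lower():
            problems.append(
                f"type mismatch on {name}: declared {data_type}, got {actual[name]}")
    for name in actual:
        if name not in declared:
            problems.append(f"undeclared column: {name}")
    return problems


# Subset of the fct_trips contract declared above.
DECLARED = {"trip_id": "string", "vendor_id": "integer", "total_amount": "numeric"}

# Hypothetical output schema of a changed model: one type drifted, one column added.
ACTUAL = {"trip_id": "string", "vendor_id": "integer",
          "total_amount": "float64", "tip_amount": "numeric"}

for problem in contract_violations(DECLARED, ACTUAL):
    print(problem)
```

Both problems would fail an enforced contract: the drifted type on total_amount and the undeclared tip_amount column.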

Signature: Composite uniqueness test (from reporting/schema.yml)

models:
  - name: fct_monthly_zone_revenue
    description: Monthly revenue aggregation by pickup zone and service type
    data_tests:
      - dbt_utils.unique_combination_of_columns:
          arguments:
            combination_of_columns:
              - pickup_zone
              - revenue_month
              - service_type
    columns:
      - name: pickup_zone
        data_tests:
          - not_null
      - name: revenue_month
        data_tests:
          - not_null
      - name: service_type
        data_tests:
          - not_null
          - accepted_values:
              arguments:
                values: ['Green', 'Yellow']
      - name: revenue_monthly_total_amount
        data_tests:
          - not_null
      - name: total_monthly_trips
        data_tests:
          - not_null
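dbt_utils.unique_combination_of_columns groups the model by the listed columns and reports any combination that occurs more than once. A Python sketch of the same check on hypothetical rows (the zone name and month values are invented for illustration):

```python
from collections import Counter

COLUMNS = ["pickup_zone", "revenue_month", "service_type"]

# Hypothetical reporting rows; the third row repeats the first combination.
ROWS = [
    {"pickup_zone": "JFK Airport", "revenue_month": "2019-01-01", "service_type": "Green"},
    {"pickup_zone": "JFK Airport", "revenue_month": "2019-01-01", "service_type": "Yellow"},
    {"pickup_zone": "JFK Airport", "revenue_month": "2019-01-01", "service_type": "Green"},
]


def duplicate_combinations(rows, columns):
    """Combinations of `columns` that appear in more than one row."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return [combo for combo, n in counts.items() if n > 1]


print(duplicate_combinations(ROWS, COLUMNS))
```

The Green and Yellow rows for the same zone and month are distinct combinations, so only the repeated Green row is reported. This is why service_type has to be part of the key: a revenue table aggregated per zone and month alone would legitimately contain two rows for the same pair.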

Signature: Staging layer tests (from staging/schema.yml)

models:
  - name: stg_green_tripdata
    description: >
      Staging model for green taxi trip data. This model standardizes column names
      and data types from the raw green_tripdata source.
    columns:
      - name: vendor_id
        description: Taxi technology provider (1 = Creative Mobile Technologies, 2 = VeriFone Inc.)
        data_tests:
          - not_null
      - name: pickup_datetime
        description: Date and time when the meter was engaged
        data_tests:
          - not_null

Signature: Intermediate layer tests (from intermediate/schema.yml)

models:
  - name: int_trips
    description: Cleaned, enriched, and deduplicated trip data ready for marts
    columns:
      - name: trip_id
        description: Unique trip identifier (surrogate key)
        data_tests:
          - unique
          - not_null
      - name: vendor_id
        data_tests:
          - not_null
      - name: service_type
        description: Type of taxi service (Green or Yellow)
        data_tests:
          - not_null
          - accepted_values:
              arguments:
                values: ['Green', 'Yellow']
      - name: total_amount
        data_tests:
          - not_null
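The trip_id above is described as a surrogate key, presumably built with dbt_utils.generate_surrogate_key (the package noted in the Import section). In dbt_utils 1.x that macro casts each field to a string, replaces NULLs with the placeholder '_dbt_utils_surrogate_key_null_', joins the parts with '-', and hashes the result (MD5 by default). A rough Python equivalent follows. The key fields used here are a hypothetical example, not the project's actual list, and the hashes only match the warehouse's if Python's str() renders each value exactly as the warehouse's string cast does.

```python
import hashlib

# Placeholder dbt_utils substitutes for NULL fields (unless configured otherwise).
NULL_PLACEHOLDER = "_dbt_utils_surrogate_key_null_"


def surrogate_key(*values):
    """MD5 hex digest of the values rendered as strings and joined with '-'."""
    parts = [NULL_PLACEHOLDER if v is None else str(v) for v in values]
    return hashlib.md5("-".join(parts).encode("utf-8")).hexdigest()


# Hypothetical key fields: vendor_id, pickup_datetime, service_type.
key = surrogate_key(2, "2019-01-01 00:10:00", "Green")
print(key, len(key))
```

Because NULL fields map to a fixed placeholder, the key itself is never NULL; the unique test on trip_id then catches rows whose key fields are all identical.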

Import

# Install required packages (dbt_utils provides generate_surrogate_key and unique_combination_of_columns)
dbt deps

# Output:
# Installing dbt-labs/dbt_utils
# Installing dbt-labs/codegen
# Updated lock file

I/O Contract

Inputs

Input	Type	Description
`packages.yml`	YAML config	Declares dbt_utils (>=1.3.0,<2.0.0) and codegen (>=0.14.0,<1.0.0) dependencies
`schema.yml` files	YAML config	Test and documentation declarations for all models across all layers
`seeds_properties.yml`	YAML config	Test declarations for seed tables (unique, not_null on payment_type)
Built models	Database tables/views	The actual data produced by `dbt run`, against which tests are executed

Outputs

Output	Type	Description
Test results	Pass/Fail/Warn	Results of all data_tests across all schema.yml files
`target/catalog.json`	JSON	Generated catalog metadata for all models, sources, and columns
`target/manifest.json`	JSON	Project manifest including lineage graph, compiled SQL, and test definitions
Documentation site	Static HTML	Browsable data catalog served by `dbt docs serve` on localhost:8080
`dbt_packages/`	Installed packages	Downloaded dbt_utils and codegen packages after `dbt deps`
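Besides console output, dbt test also writes target/run_results.json, the most convenient artifact to inspect when gating a CI/CD deployment on data quality. A minimal Python sketch that tallies test statuses and derives a pass/fail exit code. The SAMPLE dict is a hypothetical, heavily trimmed stand-in for the real file (real test unique_id values also carry a hash suffix, and each entry has more fields). dbt test already exits non-zero when a test fails, so a script like this is mainly useful for custom reporting or for treating warnings as failures.

```python
import json
from collections import Counter


def load_run_results(path="target/run_results.json"):
    """Read the run_results artifact written by dbt test / dbt build."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize(run_results, fail_on_warn=False):
    """Return (status counts, failing unique_ids) for the results in run_results."""
    bad = {"fail", "error"} | ({"warn"} if fail_on_warn else set())
    results = run_results["results"]
    counts = Counter(r["status"] for r in results)
    failing = [r["unique_id"] for r in results if r["status"] in bad]
    return counts, failing


# Hypothetical, trimmed stand-in for target/run_results.json.
SAMPLE = {"results": [
    {"unique_id": "test.taxi_rides_ny.unique_int_trips_trip_id", "status": "pass", "failures": 0},
    {"unique_id": "test.taxi_rides_ny.not_null_fct_trips_total_amount", "status": "warn", "failures": 3},
    {"unique_id": "test.taxi_rides_ny.relationships_fct_trips_pickup_location_id", "status": "fail", "failures": 12},
]}

counts, failing = summarize(SAMPLE)
exit_code = 1 if failing else 0  # a CI step would pass this to sys.exit()
print(dict(counts), failing, exit_code)
```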

Usage Examples

Full test suite execution

# Run all tests across all layers
dbt test

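# Output example (illustrative excerpt; actual test names, counts, and timings depend on the project's schema files):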
# 14:45:01  1 of 22 PASS not_null_stg_green_tripdata_vendor_id ............. [PASS in 0.12s]
# 14:45:01  2 of 22 PASS not_null_stg_green_tripdata_pickup_datetime ....... [PASS in 0.11s]
# 14:45:02  3 of 22 PASS not_null_stg_yellow_tripdata_vendor_id ............ [PASS in 0.10s]
# 14:45:02  4 of 22 PASS unique_int_trips_trip_id .......................... [PASS in 0.23s]
# 14:45:02  5 of 22 PASS not_null_int_trips_trip_id ....................... [PASS in 0.09s]
# 14:45:03  6 of 22 PASS accepted_values_int_trips_service_type__Green__Yellow  [PASS in 0.08s]
# ...
# 14:45:05 22 of 22 PASS unique_combination_fct_monthly_zone_revenue ...... [PASS in 0.31s]
# Finished running 22 data tests in 0 hours 0 minutes and 4.12 seconds

Selective test execution

# Test only the marts layer
dbt test --select marts

# Test a specific model
dbt test --select fct_trips

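# Test only relationships tests (test_name selects by generic test name; test_type selects by test kind, such as generic or singular)
dbt test --select test_name:relationships

# Build a specific model, then run its tests (dbt build runs models, tests, seeds, and snapshots in DAG order)
dbt build --select fct_trips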

Documentation generation and serving

# Generate the documentation catalog
dbt docs generate

# Serve the documentation site locally
dbt docs serve --port 8080

# The site includes:
# - Model lineage DAG (visual graph of all model dependencies)
# - Model and column descriptions from the schema YAML files
# - Test definitions attached to each model and column (pass/fail results are not shown)
# - Source definitions and their descriptions
# - Source and compiled SQL for each model
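The descriptions shown on the site come from the description fields in the schema YAML files, and dbt also writes them into target/manifest.json. A small Python sketch that lists models and columns whose description is empty, which can serve as a documentation-coverage check in CI. The manifest below is a hypothetical, trimmed stand-in, and the sketch assumes a missing description is stored as an empty string.

```python
def undocumented(manifest):
    """(model, column) pairs with an empty description; column None means the model itself."""
    gaps = []
    for node in manifest["nodes"].values():
        if node.get("resource_type") != "model":
            continue
        if not node.get("description"):
            gaps.append((node["name"], None))
        for column_name, column in node.get("columns", {}).items():
            if not column.get("description"):
                gaps.append((node["name"], column_name))
    return sorted(gaps, key=lambda gap: (gap[0], gap[1] or ""))


# Hypothetical, trimmed stand-in for target/manifest.json.
MANIFEST = {"nodes": {
    "model.taxi_rides_ny.fct_monthly_zone_revenue": {
        "resource_type": "model", "name": "fct_monthly_zone_revenue",
        "description": "Monthly revenue aggregation by pickup zone and service type",
        "columns": {"pickup_zone": {"description": ""},
                    "revenue_month": {"description": ""}},
    },
    "model.taxi_rides_ny.int_trips": {
        "resource_type": "model", "name": "int_trips",
        "description": "Cleaned, enriched, and deduplicated trip data ready for marts",
        "columns": {"trip_id": {"description": "Unique trip identifier (surrogate key)"},
                    "vendor_id": {"description": ""}},
    },
}}

for model, column in undocumented(MANIFEST):
    print(model, column)
```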

Complete test inventory across all layers

LAYER          | MODEL                      | COLUMN                       | TEST
---------------|----------------------------|------------------------------|-------------------------
staging        | stg_green_tripdata         | vendor_id                    | not_null
staging        | stg_green_tripdata         | pickup_datetime              | not_null
staging        | stg_yellow_tripdata        | vendor_id                    | not_null
staging        | stg_yellow_tripdata        | pickup_datetime              | not_null
intermediate   | int_trips                  | trip_id                      | unique
intermediate   | int_trips                  | trip_id                      | not_null
intermediate   | int_trips                  | vendor_id                    | not_null
intermediate   | int_trips                  | service_type                 | not_null
intermediate   | int_trips                  | service_type                 | accepted_values
intermediate   | int_trips                  | pickup_datetime              | not_null
intermediate   | int_trips                  | total_amount                 | not_null
marts          | fct_trips                  | trip_id                      | unique
marts          | fct_trips                  | trip_id                      | not_null
marts          | fct_trips                  | vendor_id                    | not_null
marts          | fct_trips                  | service_type                 | accepted_values
marts          | fct_trips                  | service_type                 | not_null
marts          | fct_trips                  | pickup_location_id           | relationships (dim_zones)
marts          | fct_trips                  | dropoff_location_id          | relationships (dim_zones)
marts          | fct_trips                  | pickup_datetime              | not_null
marts          | fct_trips                  | total_amount                 | not_null
marts          | dim_zones                  | location_id                  | unique
marts          | dim_zones                  | location_id                  | not_null
marts          | dim_vendors                | vendor_id                    | unique
marts          | dim_vendors                | vendor_id                    | not_null
reporting      | fct_monthly_zone_revenue   | (model-level)                | unique_combination_of_columns
reporting      | fct_monthly_zone_revenue   | pickup_zone                  | not_null
reporting      | fct_monthly_zone_revenue   | revenue_month                | not_null
reporting      | fct_monthly_zone_revenue   | service_type                 | not_null
reporting      | fct_monthly_zone_revenue   | service_type                 | accepted_values
reporting      | fct_monthly_zone_revenue   | revenue_monthly_total_amount | not_null
reporting      | fct_monthly_zone_revenue   | total_monthly_trips          | not_null
seeds          | payment_type_lookup        | payment_type                 | unique
seeds          | payment_type_lookup        | payment_type                 | not_null
