Implementation:DataTalksClub Data engineering zoomcamp Dbt Test Docs CLI
| Page Metadata | |
|---|---|
| Knowledge Sources | repo: DataTalksClub/data-engineering-zoomcamp, dbt docs: dbt data tests, dbt docs generate |
| Domains | Analytics Engineering, Data Testing, Documentation Generation, CLI Tools |
| Last Updated | 2026-02-09 14:00 GMT |
Overview
Concrete external tool configuration for data quality testing, model contract enforcement, and auto-generated documentation in the dbt analytics transformation pipeline, using dbt test, dbt docs generate, and dbt deps CLI commands.
Description
The taxi_rides_ny project implements a comprehensive testing and documentation strategy across all transformation layers:
- Column-level tests: not_null, unique, and accepted_values tests are declared in schema YAML files for key columns across staging, intermediate, and marts models.
- Referential integrity tests: relationships tests verify that pickup_location_id and dropoff_location_id in fct_trips reference valid entries in dim_zones.
- Composite uniqueness tests: dbt_utils.unique_combination_of_columns validates that the combination of (pickup_zone, revenue_month, service_type) is unique in the reporting table.
- Model contract enforcement: fct_trips declares a model contract with enforced: true and explicit data_type for every column, causing builds to fail if the output schema does not match.
- Documentation: Every model and column has a description field in its schema YAML, which feeds into the auto-generated documentation site.
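The generic tests listed above share one idea: each is a query that selects the rows violating a rule, and zero returned rows means the test passes. The following is a minimal Python sketch of that idea using the standard-library sqlite3 module. The miniature tables, their rows, and the simplified SQL are assumptions made for illustration; they are not the SQL that dbt compiles for this project.

```python
import sqlite3

# Hypothetical miniature tables; the real fct_trips and dim_zones have many more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_zones (location_id INTEGER);
    CREATE TABLE fct_trips (trip_id TEXT, service_type TEXT, pickup_location_id INTEGER);
    INSERT INTO dim_zones VALUES (1), (2);
    INSERT INTO fct_trips VALUES
        ('a', 'Green', 1),
        ('b', 'Yellow', 2),
        ('b', 'Purple', 99),   -- duplicate id, bad service type, unknown zone
        (NULL, 'Green', 1);    -- missing id
""")

# Each query selects the rows that violate one rule; zero rows means the test passes.
FAILING_ROWS = {
    "not_null": "SELECT * FROM fct_trips WHERE trip_id IS NULL",
    "unique": """
        SELECT trip_id FROM fct_trips
        WHERE trip_id IS NOT NULL
        GROUP BY trip_id HAVING COUNT(*) > 1
    """,
    "accepted_values": """
        SELECT * FROM fct_trips
        WHERE service_type NOT IN ('Green', 'Yellow')
    """,
    "relationships": """
        SELECT t.pickup_location_id FROM fct_trips AS t
        LEFT JOIN dim_zones AS z ON t.pickup_location_id = z.location_id
        WHERE t.pickup_location_id IS NOT NULL AND z.location_id IS NULL
    """,
}


def run_checks(connection):
    """Return {test_name: number_of_failing_rows} for every rule."""
    return {
        name: len(connection.execute(sql).fetchall())
        for name, sql in FAILING_ROWS.items()
    }


results = run_checks(conn)
for name, failures in results.items():
    status = "PASS" if failures == 0 else "FAIL"
    print(f"{name}: {status} ({failures} failing rows)")
```

With the sample rows, every rule reports one failing row, which is how a single bad record can trip several tests at once.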
Usage
This configuration is used when:
- Running dbt test to validate data quality after a dbt run.
- Running dbt docs generate followed by dbt docs serve to browse the data catalog.
- Running dbt deps to install required packages (dbt_utils, codegen).
- Integrating tests into CI/CD pipelines to gate deployments on data quality.
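The CI/CD use case above can be sketched as a small gate script: run each dbt command in order and stop at the first non-zero exit code. This is only an illustration under stated assumptions. The step list, the function name run_gate, and the injectable runner parameter are hypothetical and do not come from the repository; a real pipeline may choose dbt run followed by dbt test instead of dbt build.

```python
import subprocess
import sys

# Install packages first, then build models and run their tests.
# The exact steps are an assumption; adjust them to the pipeline in use.
STEPS = [
    ["dbt", "deps"],
    ["dbt", "build"],
]


def run_gate(steps=STEPS, runner=subprocess.run):
    """Run each step in order; return the first non-zero exit code, or 0."""
    for cmd in steps:
        completed = runner(cmd)
        if completed.returncode != 0:
            print(f"step failed: {' '.join(cmd)}", file=sys.stderr)
            return completed.returncode
    return 0


# In a CI job the script would end with:
#     sys.exit(run_gate())
# so that a failing test marks the job as failed and blocks the deployment.
```

The runner parameter exists so the gate logic can be exercised without dbt installed, for example by passing a stub that returns an object with a returncode attribute.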
Code Reference
Source Location
- 04-analytics-engineering/taxi_rides_ny/packages.yml (Lines 1-5)
- 04-analytics-engineering/taxi_rides_ny/models/staging/schema.yml (Lines 1-95)
- 04-analytics-engineering/taxi_rides_ny/models/intermediate/schema.yml (Lines 1-107)
- 04-analytics-engineering/taxi_rides_ny/models/marts/schema.yml (Lines 1-138)
- 04-analytics-engineering/taxi_rides_ny/models/marts/reporting/schema.yml (Lines 1-35)
- 04-analytics-engineering/taxi_rides_ny/seeds/seeds_properties.yml (Lines 1-20)
Signature: packages.yml (dependency declarations)
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.3.0", "<2.0.0"]
  - package: dbt-labs/codegen
    version: [">=0.14.0", "<1.0.0"]
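Each version entry above is a list of constraints that must all hold for a release to be installable. The sketch below shows how such a range can be evaluated for plain X.Y.Z versions. It is a simplified illustration, not dbt's resolver: the helper names parse_version and satisfies are hypothetical, and pre-release or build suffixes are not handled.

```python
import operator

# Comparison operators that may prefix a constraint such as ">=1.3.0".
OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
}


def parse_version(text):
    """Turn 'X.Y.Z' into a tuple of integers so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, specs):
    """Return True when the version meets every constraint in specs."""
    candidate = parse_version(version)
    for spec in specs:
        # Try two-character operators before one-character ones.
        for symbol in sorted(OPERATORS, key=len, reverse=True):
            if spec.startswith(symbol):
                bound = parse_version(spec[len(symbol):])
                if not OPERATORS[symbol](candidate, bound):
                    return False
                break
        else:
            raise ValueError(f"unrecognized constraint: {spec}")
    return True


DBT_UTILS_RANGE = [">=1.3.0", "<2.0.0"]
print(satisfies("1.3.0", DBT_UTILS_RANGE))
print(satisfies("2.0.0", DBT_UTILS_RANGE))
```

Under this range any 1.x release from 1.3.0 onward qualifies, while the next major version is excluded.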
Signature: Model contract on fct_trips (from marts/schema.yml)
models:
  - name: fct_trips
    description: Fact table with all taxi trips including trip and payment details
    config:
      contract:
        enforced: true
    columns:
      - name: trip_id
        description: Unique trip identifier
        data_type: string
        data_tests:
          - unique
          - not_null
      - name: vendor_id
        description: Taxi technology provider
        data_type: integer
        data_tests:
          - not_null
      - name: service_type
        description: Type of taxi service (Green or Yellow)
        data_type: string
        data_tests:
          - accepted_values:
              arguments:
                values: ['Green', 'Yellow']
          - not_null
      - name: pickup_location_id
        description: TLC Taxi Zone where trip started
        data_type: integer
        data_tests:
          - relationships:
              arguments:
                to: ref('dim_zones')
                field: location_id
      - name: dropoff_location_id
        description: TLC Taxi Zone where trip ended
        data_type: integer
        data_tests:
          - relationships:
              arguments:
                to: ref('dim_zones')
                field: location_id
      - name: total_amount
        description: Total amount charged
        data_type: numeric
        data_tests:
          - not_null
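With enforced: true, the declared column names and data types act as a contract that the model's output must match. The sketch below illustrates that comparison in plain Python. It is a conceptual model only: the function contract_violations and the sample actual_schema are hypothetical, and dbt performs its own check against the warehouse at build time.

```python
# Declared contract: column name -> data_type, copied from the YAML excerpt above.
CONTRACT = {
    "trip_id": "string",
    "vendor_id": "integer",
    "service_type": "string",
    "pickup_location_id": "integer",
    "dropoff_location_id": "integer",
    "total_amount": "numeric",
}


def contract_violations(contract, actual):
    """Compare declared columns with the columns a model actually produced."""
    problems = []
    for name, declared in contract.items():
        if name not in actual:
            problems.append(f"missing column: {name}")
        elif actual[name] != declared:
            problems.append(
                f"type mismatch on {name}: declared {declared}, got {actual[name]}"
            )
    for name in actual:
        if name not in contract:
            problems.append(f"undeclared column: {name}")
    return problems


# Hypothetical model output in which total_amount came back as a float.
actual_schema = dict(CONTRACT, total_amount="float")
violations = contract_violations(CONTRACT, actual_schema)
print(violations)
```

An empty list corresponds to a build that proceeds; any entry corresponds to the build failure described for contract mismatches.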
Signature: Composite uniqueness test (from reporting/schema.yml)
models:
  - name: fct_monthly_zone_revenue
    description: Monthly revenue aggregation by pickup zone and service type
    data_tests:
      - dbt_utils.unique_combination_of_columns:
          arguments:
            combination_of_columns:
              - pickup_zone
              - revenue_month
              - service_type
    columns:
      - name: pickup_zone
        data_tests:
          - not_null
      - name: revenue_month
        data_tests:
          - not_null
      - name: service_type
        data_tests:
          - not_null
          - accepted_values:
              arguments:
                values: ['Green', 'Yellow']
      - name: revenue_monthly_total_amount
        data_tests:
          - not_null
      - name: total_monthly_trips
        data_tests:
          - not_null
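The model-level test above asserts that no two rows share the same (pickup_zone, revenue_month, service_type) combination. A minimal Python sketch of that check follows. The function duplicate_combinations and the sample rows are hypothetical, and the zone name and trip counts are invented for illustration; dbt_utils implements the test in SQL.

```python
from collections import Counter

# The column combination that must be unique, as listed in the YAML above.
KEY_COLUMNS = ("pickup_zone", "revenue_month", "service_type")


def duplicate_combinations(rows, key_columns=KEY_COLUMNS):
    """Return the key tuples that appear in more than one row."""
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Hypothetical rows: the first and third share the same key and should be flagged.
rows = [
    {"pickup_zone": "Astoria", "revenue_month": "2020-01-01",
     "service_type": "Green", "total_monthly_trips": 10},
    {"pickup_zone": "Astoria", "revenue_month": "2020-01-01",
     "service_type": "Yellow", "total_monthly_trips": 7},
    {"pickup_zone": "Astoria", "revenue_month": "2020-01-01",
     "service_type": "Green", "total_monthly_trips": 3},
]
duplicates = duplicate_combinations(rows)
print(duplicates)
```

Each individual column may repeat freely; only a repeated combination counts as a failure, which is why this test is declared on the model and not on a single column.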
Signature: Staging layer tests (from staging/schema.yml)
models:
  - name: stg_green_tripdata
    description: >
      Staging model for green taxi trip data. This model standardizes column names
      and data types from the raw green_tripdata source.
    columns:
      - name: vendor_id
        description: Taxi technology provider (1 = Creative Mobile Technologies, 2 = VeriFone Inc.)
        data_tests:
          - not_null
      - name: pickup_datetime
        description: Date and time when the meter was engaged
        data_tests:
          - not_null
Signature: Intermediate layer tests (from intermediate/schema.yml)
models:
  - name: int_trips
    description: Cleaned, enriched, and deduplicated trip data ready for marts
    columns:
      - name: trip_id
        description: Unique trip identifier (surrogate key)
        data_tests:
          - unique
          - not_null
      - name: vendor_id
        data_tests:
          - not_null
      - name: service_type
        description: Type of taxi service (Green or Yellow)
        data_tests:
          - not_null
          - accepted_values:
              arguments:
                values: ['Green', 'Yellow']
      - name: total_amount
        data_tests:
          - not_null
Import
# Install required packages (dbt_utils provides generate_surrogate_key and unique_combination_of_columns)
dbt deps
# Output:
# Installing dbt-labs/dbt_utils
# Installing dbt-labs/codegen
# Updated lock file
I/O Contract
Inputs
| Input | Type | Description |
|---|---|---|
| packages.yml | YAML config | Declares dbt_utils (>=1.3.0,<2.0.0) and codegen (>=0.14.0,<1.0.0) dependencies |
| schema.yml files | YAML config | Test and documentation declarations for all models across all layers |
| seeds_properties.yml | YAML config | Test declarations for seed tables (unique, not_null on payment_type) |
| Built models | Database tables/views | The actual data produced by dbt run, against which tests are executed |
Outputs
| Output | Type | Description |
|---|---|---|
| Test results | Pass/Fail/Warn | Results of all data_tests across all schema.yml files |
| target/catalog.json | JSON | Generated catalog metadata for all models, sources, and columns |
| target/manifest.json | JSON | Project manifest including lineage graph, compiled SQL, and test definitions |
| Documentation site | Static HTML | Browsable data catalog served by dbt docs serve on localhost:8080 |
| dbt_packages/ | Installed packages | Downloaded dbt_utils and codegen packages after dbt deps |
Usage Examples
Full test suite execution
# Run all tests across all layers
dbt test
# Output example:
# 14:45:01 1 of 22 PASS not_null_stg_green_tripdata_vendor_id ............. [PASS in 0.12s]
# 14:45:01 2 of 22 PASS not_null_stg_green_tripdata_pickup_datetime ....... [PASS in 0.11s]
# 14:45:02 3 of 22 PASS not_null_stg_yellow_tripdata_vendor_id ............ [PASS in 0.10s]
# 14:45:02 4 of 22 PASS unique_int_trips_trip_id .......................... [PASS in 0.23s]
# 14:45:02 5 of 22 PASS not_null_int_trips_trip_id ....................... [PASS in 0.09s]
# 14:45:03 6 of 22 PASS accepted_values_int_trips_service_type ........... [PASS in 0.08s]
# ...
# 14:45:05 22 of 22 PASS unique_combination_fct_monthly_zone_revenue ...... [PASS in 0.31s]
# Finished running 22 data tests in 0 hours 0 minutes and 4.12 seconds
Selective test execution
# Test only the marts layer
dbt test --select marts
# Test a specific model
dbt test --select fct_trips
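# Run only the relationships tests (the test_name method selects by generic test name;
# test_type selects by category, such as generic or singular)
dbt test --select test_name:relationships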
# Build a specific model and run its tests in one step
dbt build --select fct_trips
Documentation generation and serving
# Generate the documentation catalog
dbt docs generate
# Serve the documentation site locally
dbt docs serve --port 8080
# The site includes:
# - Model lineage DAG (visual graph of all model dependencies)
# - Column-level descriptions for every model
# - Test definitions and results
# - Source freshness status
# - Compiled SQL for each model
Complete test inventory across all layers
LAYER | MODEL | COLUMN | TEST
---------------|----------------------------|-----------------------|-------------------------
staging | stg_green_tripdata | vendor_id | not_null
staging | stg_green_tripdata | pickup_datetime | not_null
staging | stg_yellow_tripdata | vendor_id | not_null
staging | stg_yellow_tripdata | pickup_datetime | not_null
intermediate | int_trips | trip_id | unique
intermediate | int_trips | trip_id | not_null
intermediate | int_trips | vendor_id | not_null
intermediate | int_trips | service_type | not_null
intermediate | int_trips | service_type | accepted_values
intermediate | int_trips | pickup_datetime | not_null
intermediate | int_trips | total_amount | not_null
marts | fct_trips | trip_id | unique
marts | fct_trips | trip_id | not_null
marts | fct_trips | vendor_id | not_null
marts | fct_trips | service_type | accepted_values
marts | fct_trips | service_type | not_null
marts | fct_trips | pickup_location_id | relationships (dim_zones)
marts | fct_trips | dropoff_location_id | relationships (dim_zones)
marts | fct_trips | pickup_datetime | not_null
marts | fct_trips | total_amount | not_null
marts | dim_zones | location_id | unique
marts | dim_zones | location_id | not_null
marts | dim_vendors | vendor_id | unique
marts | dim_vendors | vendor_id | not_null
reporting | fct_monthly_zone_revenue | (model-level) | unique_combination_of_columns
reporting | fct_monthly_zone_revenue | pickup_zone | not_null
reporting | fct_monthly_zone_revenue | revenue_month | not_null
reporting | fct_monthly_zone_revenue | service_type | not_null
reporting | fct_monthly_zone_revenue | service_type | accepted_values
reporting      | fct_monthly_zone_revenue   | revenue_monthly_total_amount | not_null
reporting | fct_monthly_zone_revenue | total_monthly_trips | not_null
seeds | payment_type_lookup | payment_type | unique
seeds | payment_type_lookup | payment_type | not_null
Related Pages
- Principle:DataTalksClub_Data_engineering_zoomcamp_Dbt_Testing_And_Documentation
- Implementation:DataTalksClub_Data_engineering_zoomcamp_Dbt_Marts_Models
- Implementation:DataTalksClub_Data_engineering_zoomcamp_Dbt_Project_Yml_Config
- Environment:DataTalksClub_Data_engineering_zoomcamp_Dbt_DuckDB_Environment