Environment:DataTalksClub Data engineering zoomcamp Dbt DuckDB Environment
| Knowledge Sources | |
|---|---|
| Domains | Analytics_Engineering, Data_Transformation |
| Last Updated | 2026-02-09 07:00 GMT |
04-analytics-engineering/taxi_rides_ny/dbt_project.yml 04-analytics-engineering/taxi_rides_ny/packages.yml
Overview
Local analytics engineering environment with dbt Core (>=1.7.0, <2.0.0), DuckDB, and the dbt-duckdb adapter for transforming NYC taxi data.
Description
This environment provides a local-first analytics engineering setup using dbt Core with DuckDB as the in-process database engine. DuckDB runs inside the dbt process itself (no separate server), which makes it well suited to local development but means its memory use must be capped explicitly. The project uses dbt packages (dbt_utils, codegen) and processes tens of millions of NYC taxi trip records across staging, intermediate, and marts layers.
Usage
Use this environment for any analytics transformation workflow using dbt. It is the mandatory prerequisite for running the Dbt_Project_Yml_Config, Dbt_Sources_Yml, Dbt_Staging_Models, Dbt_Intermediate_Models, Dbt_Marts_Models, and Dbt_Test_Docs_CLI implementations.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux, macOS, or Windows | No Docker required for local setup |
| RAM | 8GB minimum (16GB recommended) | DuckDB is an in-process database; 4GB will likely cause OOM |
| Disk | ~5GB free (SSD recommended) | DuckDB spills to disk; SSD is much faster for spill operations |
| Python | Python 3.8+ | Required for dbt-core and dbt-duckdb |
| Network | Internet access | Downloads NYC taxi data from GitHub releases |
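The requirements in the table above can be checked before installing anything. The following is a minimal sketch, not part of the course repository: the function names `total_ram_gb` and `check_requirements` are hypothetical, the thresholds are copied from the table, and RAM detection relies on `os.sysconf`, which is assumed to be unavailable on Windows (the check is then skipped).

```python
import os
import shutil
import sys

# Thresholds copied from the System Requirements table.
MIN_PYTHON = (3, 8)
MIN_FREE_DISK_GB = 5
MIN_RAM_GB = 8


def total_ram_gb():
    """Return total physical RAM in GiB, or None if it cannot be determined."""
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing on Windows; unknown names raise ValueError.
        return None
    return pages * page_size / 1024**3


def check_requirements(python_version, free_disk_gb, ram_gb):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append("Python 3.8 or newer is required")
    if free_disk_gb < MIN_FREE_DISK_GB:
        problems.append("Less than 5GB of free disk space")
    # Round because an "8GB" machine usually reports slightly under 8 GiB.
    if ram_gb is not None and round(ram_gb) < MIN_RAM_GB:
        problems.append("Less than 8GB of RAM; DuckDB may run out of memory")
    return problems


if __name__ == "__main__":
    free_gb = shutil.disk_usage(".").free / 1024**3
    for problem in check_requirements(sys.version_info, free_gb, total_ram_gb()):
        print("WARNING:", problem)
```

Passing the measured values in as arguments keeps the decision logic separate from the platform-specific probing, so the check can be exercised with made-up values.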
Dependencies
Python Packages
- `dbt-duckdb` (installs both dbt-core and the DuckDB adapter)
dbt Version Constraint
From `dbt_project.yml`:
- `dbt-core` >= 1.7.0 and < 2.0.0
dbt Packages
From `packages.yml`:
- `dbt-labs/dbt_utils` >= 1.3.0 and < 2.0.0 (locked at 1.3.3)
- `dbt-labs/codegen` >= 0.14.0 and < 1.0.0 (locked at 0.14.0)
Optional Tools
- DuckDB CLI (for direct database inspection)
- VS Code with dbt Power User extension (for enhanced IDE support)
Credentials
No external credentials required for the local DuckDB setup. The database is a local file (`taxi_rides_ny.duckdb`).
For the alternative Cloud Setup (BigQuery) path:
- `GOOGLE_APPLICATION_CREDENTIALS`: Path to GCP service account JSON file
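For the BigQuery path, a missing or mistyped credentials variable is easy to detect up front. This is an illustrative sketch only: the helper name `bigquery_credentials_problem` is hypothetical, and it checks only that the variable is set and points at an existing file, not that the key is valid.

```python
import os


def bigquery_credentials_problem(environ=None):
    """Return None if the credentials path looks usable, else a description of the problem."""
    env = os.environ if environ is None else environ
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not os.path.isfile(path):
        return f"GOOGLE_APPLICATION_CREDENTIALS does not point to an existing file: {path}"
    return None


if __name__ == "__main__":
    problem = bigquery_credentials_problem()
    print(problem or "Credentials file found")
```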
Quick Install
# Install dbt with DuckDB adapter
pip install dbt-duckdb
# Install dbt packages
cd 04-analytics-engineering/taxi_rides_ny
dbt deps
# Verify connection
dbt debug
Code Evidence
dbt version constraint from `dbt_project.yml:5`:
require-dbt-version: [">=1.7.0", "<2.0.0"]
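To illustrate what this range admits, the sketch below tests a version string against the same bounds. It is a simplification, not dbt's own version-matching logic: the function names are hypothetical, and pre-release suffixes such as `b1` or `rc1` are ignored, so only the leading `X.Y.Z` is compared.

```python
import re


def parse_version(text):
    """Parse the leading 'X.Y.Z' of a version string into a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"Not an X.Y.Z version: {text!r}")
    return tuple(int(part) for part in match.groups())


def satisfies_dbt_constraint(version, lower="1.7.0", upper="2.0.0"):
    """Return True if lower <= version < upper, the range in dbt_project.yml."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


if __name__ == "__main__":
    for candidate in ["1.6.9", "1.7.0", "1.9.4", "2.0.0"]:
        print(candidate, satisfies_dbt_constraint(candidate))
```

Comparing integer tuples rather than raw strings avoids the usual pitfall where `"1.10.0"` sorts before `"1.7.0"`.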
Package dependencies from `packages.yml:1-5`:
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.3.0", "<2.0.0"]
  - package: dbt-labs/codegen
    version: [">=0.14.0", "<1.0.0"]
Recommended DuckDB profile settings from `local_setup.md:48-80`:
taxi_rides_ny:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: taxi_rides_ny.duckdb
      schema: dev
      threads: 1
      extensions:
        - parquet
      settings:
        memory_limit: '2GB'
        preserve_insertion_order: false
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `Out of Memory` during `dbt build` | DuckDB exceeds available RAM | Set `memory_limit` to 50% of total RAM in `profiles.yml`; reduce `threads` to 1 |
| `dbt version check failed` | Installed dbt-core version is outside the >=1.7.0, <2.0.0 range | Install a compatible dbt-core alongside the adapter: `pip install "dbt-core>=1.7.0,<2.0.0" dbt-duckdb` |
| `Package not found: dbt_utils` | dbt packages not installed | Run `dbt deps` to install packages |
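The 50% rule from the Out of Memory row can be turned into a concrete `memory_limit` value. The sketch below is an assumption-laden illustration: the helper name `suggested_memory_limit` is hypothetical, it rounds down to whole gigabytes, and it never suggests less than 1GB.

```python
def suggested_memory_limit(total_ram_gb, fraction=0.5):
    """Return a memory_limit string such as '8GB' for the given total RAM."""
    if total_ram_gb <= 0:
        raise ValueError("total_ram_gb must be positive")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Round down to whole gigabytes, with a floor of 1GB.
    limit_gb = max(1, int(total_ram_gb * fraction))
    return f"{limit_gb}GB"


if __name__ == "__main__":
    for ram in (8, 16, 32):
        print(f"{ram}GB RAM -> memory_limit: '{suggested_memory_limit(ram)}'")
```

The resulting string goes under `settings:` in `profiles.yml`, in place of the `'2GB'` value shown in the recommended profile.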
Compatibility Notes
- DuckDB is in-process: Unlike PostgreSQL or BigQuery, DuckDB has no separate server; it runs inside the dbt process and uses the host machine's RAM. Machines with less than 8GB RAM may experience OOM errors on large models.
- SSD recommended: DuckDB spills to disk when memory is exhausted. An SSD makes spill operations significantly faster than an HDD.
- Docker overhead: Running dbt+DuckDB inside Docker containers adds memory overhead. If possible, run directly on the host to maximize available RAM for DuckDB.
- Alternative cloud path: If local RAM is insufficient, use the BigQuery cloud setup instead, which offloads computation to Google Cloud.
Related Pages
- Implementation:DataTalksClub_Data_engineering_zoomcamp_Dbt_Project_Yml_Config
- Implementation:DataTalksClub_Data_engineering_zoomcamp_Dbt_Sources_Yml
- Implementation:DataTalksClub_Data_engineering_zoomcamp_Dbt_Staging_Models
- Implementation:DataTalksClub_Data_engineering_zoomcamp_Dbt_Intermediate_Models
- Implementation:DataTalksClub_Data_engineering_zoomcamp_Dbt_Marts_Models
- Implementation:DataTalksClub_Data_engineering_zoomcamp_Dbt_Test_Docs_CLI