
Environment:DataTalksClub Data engineering zoomcamp Dbt DuckDB Environment

From Leeroopedia


Knowledge Sources

Domains: Analytics_Engineering, Data_Transformation
Last Updated: 2026-02-09 07:00 GMT

  • `04-analytics-engineering/taxi_rides_ny/dbt_project.yml`
  • `04-analytics-engineering/taxi_rides_ny/packages.yml`

Overview

Local analytics engineering environment with dbt Core (>=1.7.0, <2.0.0), DuckDB, and the dbt-duckdb adapter for transforming NYC taxi data.

Description

This environment provides a local-first analytics engineering setup using dbt Core with DuckDB as the in-process database engine. DuckDB runs inside the host process memory (no separate server), making it ideal for local development but requiring careful memory management. The project uses dbt packages (dbt_utils, codegen) and processes tens of millions of NYC taxi trip records across staging, intermediate, and marts layers.
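
As a hedged illustration of the staging layer, a dbt staging model typically selects from a declared source, renames columns, and casts types. The model, source, and column names below are assumptions for illustration, not taken from the project:

```sql
-- Hypothetical staging model (e.g. stg_green_tripdata.sql); the real
-- models and columns in the course repo may differ.
with source as (
    select * from {{ source('raw', 'green_tripdata') }}
)

select
    cast(vendorid as integer)                as vendor_id,
    cast(lpep_pickup_datetime as timestamp)  as pickup_datetime,
    cast(lpep_dropoff_datetime as timestamp) as dropoff_datetime,
    cast(total_amount as double)             as total_amount
from source
```

Intermediate and marts models then build on these staging views via `ref()`, which is what lets dbt order the tens of millions of rows of transformations correctly.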

Usage

Use this environment for any analytics transformation workflow using dbt. It is the mandatory prerequisite for running the Dbt_Project_Yml_Config, Dbt_Sources_Yml, Dbt_Staging_Models, Dbt_Intermediate_Models, Dbt_Marts_Models, and Dbt_Test_Docs_CLI implementations.

System Requirements

| Category | Requirement | Notes |
|---|---|---|
| OS | Linux, macOS, or Windows | No Docker required for local setup |
| RAM | 8GB minimum (16GB recommended) | DuckDB is an in-process database; 4GB will likely cause OOM |
| Disk | ~5GB free (SSD recommended) | DuckDB spills to disk; SSD is much faster for spill operations |
| Python | Python 3.8+ | Required for dbt-core and dbt-duckdb |
| Network | Internet access | Downloads NYC taxi data from GitHub releases |

Dependencies

Python Packages

  • `dbt-duckdb` (installs both dbt-core and the DuckDB adapter)

dbt Version Constraint

From `dbt_project.yml`:

  • `dbt-core` >= 1.7.0 and < 2.0.0

dbt Packages

From `packages.yml`:

  • `dbt-labs/dbt_utils` >= 1.3.0 and < 2.0.0 (locked at 1.3.3)
  • `dbt-labs/codegen` >= 0.14.0 and < 1.0.0 (locked at 0.14.0)

Optional Tools

  • DuckDB CLI (for direct database inspection)
  • VS Code with dbt Power User extension (for enhanced IDE support)

Credentials

No external credentials required for the local DuckDB setup. The database is a local file (`taxi_rides_ny.duckdb`).

For the alternative Cloud Setup (BigQuery) path:

  • `GOOGLE_APPLICATION_CREDENTIALS`: Path to GCP service account JSON file

Quick Install

# Install dbt with DuckDB adapter
pip install dbt-duckdb

# Install dbt packages
cd 04-analytics-engineering/taxi_rides_ny
dbt deps

# Verify connection
dbt debug

Code Evidence

dbt version constraint from `dbt_project.yml:5`:

require-dbt-version: [">=1.7.0", "<2.0.0"]

Package dependencies from `packages.yml:1-5`:

packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.3.0", "<2.0.0"]
  - package: dbt-labs/codegen
    version: [">=0.14.0", "<1.0.0"]

Recommended DuckDB profile settings from `local_setup.md:48-80`:

taxi_rides_ny:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: taxi_rides_ny.duckdb
      schema: dev
      threads: 1
      extensions:
        - parquet
      settings:
        memory_limit: '2GB'
        preserve_insertion_order: false

Common Errors

| Error Message | Cause | Solution |
|---|---|---|
| `Out of Memory` during `dbt build` | DuckDB exceeds available RAM | Set `memory_limit` to 50% of total RAM in `profiles.yml`; reduce `threads` to 1 |
| `dbt version check failed` | dbt version outside the >=1.7.0, <2.0.0 range | Install a compatible version: `pip install "dbt-duckdb>=1.7,<2.0"` |
| `Package not found: dbt_utils` | dbt packages not installed | Run `dbt deps` to install packages |
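
The 50%-of-RAM rule of thumb above can be computed mechanically. The sketch below is a stdlib-only Python snippet (not part of the project) that derives a `memory_limit` string from total physical RAM; it uses POSIX `sysconf`, so it assumes a Linux or macOS host:

```python
import os

def suggested_memory_limit(fraction: float = 0.5) -> str:
    """Return a DuckDB memory_limit string set to a fraction of total RAM.

    Assumes a POSIX host (Linux/macOS) where os.sysconf exposes
    SC_PAGE_SIZE and SC_PHYS_PAGES.
    """
    page_size = os.sysconf("SC_PAGE_SIZE")   # bytes per memory page
    num_pages = os.sysconf("SC_PHYS_PAGES")  # total physical pages
    total_gb = page_size * num_pages / (1024 ** 3)
    # Round down, but never suggest less than 1GB
    return f"{max(1, int(total_gb * fraction))}GB"
```

The returned string (e.g. `'8GB'` on a 16GB machine) can be pasted into the `settings.memory_limit` key of `profiles.yml`.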

Compatibility Notes

  • DuckDB is in-process: Unlike PostgreSQL or BigQuery, DuckDB runs inside the calling process rather than as a separate server, so its memory use counts directly against the host machine's RAM. Machines with less than 8GB RAM may experience OOM errors on large models.
  • SSD recommended: DuckDB spills to disk when memory is exhausted. An SSD makes spill operations significantly faster than an HDD.
  • Docker overhead: Running dbt+DuckDB inside Docker containers adds memory overhead. If possible, run directly on the host to maximize available RAM for DuckDB.
  • Alternative cloud path: If local RAM is insufficient, use the BigQuery cloud setup instead, which offloads computation to Google servers.
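
For the BigQuery alternative mentioned above, a `profiles.yml` target might look like the following sketch. The project ID is a placeholder and the dataset name is an assumption, not values taken from the course repo:

```yaml
taxi_rides_ny:
  target: prod
  outputs:
    prod:
      type: bigquery
      method: service-account
      # Reads the path exported in GOOGLE_APPLICATION_CREDENTIALS
      keyfile: "{{ env_var('GOOGLE_APPLICATION_CREDENTIALS') }}"
      project: my-gcp-project   # placeholder GCP project ID
      dataset: taxi_rides_ny    # assumed dataset name
      threads: 4
```

Switching `target` between a local `dev` (DuckDB) output and a `prod` (BigQuery) output lets the same dbt project run against either backend.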
