
Principle:DataTalksClub Data engineering zoomcamp Dlt Resource Definition

From Leeroopedia


Page Metadata
Knowledge Sources dlt docs: dlt Documentation, Python generators: Python Functional Programming HOWTO
Domains Data_Engineering, Data_Ingestion
Last Updated 2026-02-09 14:00 GMT

Overview

Declarative data resource definition is the practice of using decorator-annotated generator functions to define data extraction logic, allowing the framework to handle schema inference, batching, state management, and destination-specific serialization automatically.

Description

In modern data loading frameworks, a resource is the fundamental unit of data extraction. Rather than writing imperative code that manually manages connections, batching, serialization, and error handling, the developer declares a generator function that yields data records. A decorator on the function provides metadata -- such as the resource name and write disposition -- that tells the framework how to manage the data lifecycle.

This declarative approach offers several benefits:

  • Separation of extraction and loading -- The generator function focuses solely on producing data. The framework takes responsibility for consuming that data, inferring schemas, creating destination tables, and managing write modes.
  • Write disposition control -- The decorator specifies whether the resource should replace the destination table entirely, append new records, or perform a merge based on primary keys. This is declared once and applied consistently.
  • Schema inference -- The framework inspects the yielded records (whether dictionaries, dataclass instances, or columnar structures like PyArrow Tables) and automatically infers the destination schema, including column names, data types, and nullability.
  • Lazy evaluation -- Because the function is a generator, data is produced on demand rather than loaded entirely into memory. This enables processing datasets that are larger than available RAM.
  • Composability -- Resources can be composed using pipe operators to chain transformations. For example, a filesystem reader can be piped into a Parquet parser to create a compound resource.
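The decorator-based pattern described above can be sketched in plain Python. The decorator name and arguments below mirror dlt's `@dlt.resource`, but this stand-in "framework" is a toy written only to illustrate how metadata attaches to a generator function; it is not dlt's implementation.

```python
# Minimal sketch of declarative resource definition. The decorator only
# records metadata; a real framework would also consume the generator,
# infer a schema, and load the data.

def resource(name, write_disposition="append"):
    """Attach resource metadata to a generator function."""
    def wrap(gen_func):
        gen_func.resource_name = name
        gen_func.write_disposition = write_disposition
        return gen_func
    return wrap

@resource(name="users", write_disposition="replace")
def users():
    # The generator yields plain dicts; the framework would infer the
    # destination schema (id: int, name: str) from these records.
    yield {"id": 1, "name": "Ada"}
    yield {"id": 2, "name": "Grace"}

rows = list(users())
print(users.resource_name, users.write_disposition, len(rows))
```

The extraction logic (the generator body) never mentions a destination; the write disposition lives entirely in the decorator call, which is the separation the bullet list above describes.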

The generator can yield data in multiple formats depending on the source:

  • Row-by-row -- Yielding individual dictionaries or objects, suitable for API responses or row-oriented sources
  • Columnar batches -- Yielding PyArrow Tables or similar columnar structures, suitable for file-based sources where data is naturally batched
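Both yield styles can be shown side by side. Here a plain dict of column lists stands in for a PyArrow Table, to keep the sketch dependency-free; the shape of the generators is what matters.

```python
def rows_source():
    # Row-by-row: one dict per record, as an API client might yield.
    for i in range(3):
        yield {"id": i, "value": i * 10}

def columnar_source():
    # Columnar batch: one mapping of column name -> values per batch.
    # A dict of lists stands in here for a PyArrow Table.
    yield {"id": [0, 1, 2], "value": [0, 10, 20]}

rows = list(rows_source())
batch = next(columnar_source())
print(len(rows), list(batch))
```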

Usage

Use declarative resource definition when:

  • Building data extraction logic that should be decoupled from destination-specific loading details
  • The framework should automatically manage schema creation and evolution
  • Data is naturally produced as a stream of records or batches
  • Write disposition (replace, append, merge) should be configured declaratively rather than coded imperatively
  • The extraction logic needs to be composable with other transformations

Theoretical Basis

The declarative resource pattern follows this conceptual structure:

DECORATOR resource(name, write_disposition):
    -- Registers the function as a named data source
    -- Configures how the destination table is managed

FUNCTION extract_data(sources):
    FOR EACH source IN sources:
        data = fetch_from_source(source)
        IF data is valid:
            YIELD data
    -- Framework automatically:
    --   1. Infers schema from first yielded batch
    --   2. Creates or updates destination table
    --   3. Applies write disposition (replace/append/merge)
    --   4. Tracks load state for incremental loading

The decorator pattern transforms a plain generator function into a framework-managed resource. The framework wraps the generator with additional behavior: it intercepts each yielded value, inspects its structure for schema inference, batches records for efficient loading, and applies the declared write disposition when writing to the destination.

The yield keyword is central to this pattern. Unlike return, which produces a single value and terminates the function, yield produces a value and suspends the function, preserving its local state so the framework can process each batch before requesting the next one. This creates a pull-based data flow in which the framework controls the pace of extraction.
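The pull-based interleaving is easy to observe with an event log: the producer only resumes when the consumer asks for the next value.

```python
# Demonstrates pull-based flow: the consumer controls when the generator
# resumes, so production and processing alternate one record at a time.

events = []

def producer():
    for i in range(2):
        events.append(f"produce {i}")
        yield i  # suspend here until the consumer requests the next value

for batch in producer():
    events.append(f"process {batch}")

print(events)  # → ['produce 0', 'process 0', 'produce 1', 'process 1']
```

If producer used return (or built a full list first), every "produce" event would precede every "process" event, and the whole dataset would sit in memory at once.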

When resources are composed (e.g., using pipe operators), the output of one generator becomes the input of another, forming a transformation pipeline within the resource definition itself. This is distinct from the outer pipeline that manages loading -- it is a chain of transformations applied during extraction.
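The pipe-style composition can be sketched with a small wrapper class. The `Piped` class and its `|` operator below are illustrative stand-ins for how dlt chains a resource into a transformer, not dlt's actual implementation; the filesystem-reader-into-Parquet-parser example is mimicked with toy data.

```python
# Sketch of resource composition: the output of one generator feeds the
# next, forming a transformation chain inside the extraction step.

class Piped:
    def __init__(self, gen_func):
        self.gen_func = gen_func

    def __or__(self, transformer):
        # Chain: feed each item from this stage into the next generator.
        def chained():
            for item in self.gen_func():
                yield from transformer(item)
        return Piped(chained)

    def __iter__(self):
        return self.gen_func()

@Piped
def list_files():
    yield "a.parquet"
    yield "notes.txt"

def parse_parquet(path):
    # Toy "parser": only handle Parquet files, yield parsed rows.
    if path.endswith(".parquet"):
        yield {"file": path, "rows": 2}

compound = list_files | parse_parquet
print(list(compound))  # → [{'file': 'a.parquet', 'rows': 2}]
```

Note that the non-Parquet file is simply filtered out inside the chain: the transformation happens during extraction, before the outer loading pipeline ever sees the data.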
