Principle:DataTalksClub Data engineering zoomcamp Kestra Table Creation

From Leeroopedia


Metadata
Knowledge Sources Kestra Flowable Tasks Documentation, PostgreSQL CREATE TABLE Documentation, DDL Reference (Wikipedia)
Domains Data Modeling, DDL, Schema Management, Conditional Logic
Last Updated 2026-02-09 14:00 GMT

Overview

Dynamic DDL generation creates database tables with schemas that adapt based on input parameters, using conditional branching to determine which schema definition to apply.

Description

Data pipelines often ingest multiple dataset variants that share a common structure but differ in specific columns. Rather than maintaining separate pipelines for each variant, dynamic DDL generation uses conditional branching within the orchestration flow to select the appropriate schema at runtime.

The pattern involves three key elements:

  • Conditional branching -- an If/Else construct evaluates an input parameter and directs execution to the branch containing the correct schema definition. For example, a taxi type input of "yellow" routes to a 20-column schema while "green" routes to a 22-column schema (which includes additional fields such as ehail_fee and trip_type).
  • Idempotent DDL -- the CREATE TABLE IF NOT EXISTS statement ensures that table creation is safe to re-run. If the table already exists, the statement is a no-op, making the pipeline idempotent from the schema perspective.
  • Staging table pattern -- for each production table, a corresponding staging table with an identical schema is created. The staging table serves as a temporary landing zone for incoming data before it is validated and merged into the production table.

Template expressions in the DDL statements dynamically resolve table names based on input parameters, so a single flow definition can manage tables for any supported dataset variant.
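As a sketch, a Kestra flow might express this branching with the core `If` task and the PostgreSQL `Queries` task. The flow id, namespace, connection URL, and column lists below are illustrative assumptions, not the zoomcamp's actual flow definition:

```yaml
id: taxi_table_creation
namespace: example.zoomcamp        # hypothetical namespace

inputs:
  - id: taxi
    type: SELECT
    values: [yellow, green]

tasks:
  - id: if_green
    type: io.kestra.plugin.core.flow.If
    condition: "{{ inputs.taxi == 'green' }}"
    then:
      - id: green_create_table
        type: io.kestra.plugin.jdbc.postgresql.Queries
        url: jdbc:postgresql://localhost:5432/db   # placeholder connection
        sql: |
          CREATE TABLE IF NOT EXISTS public.{{ inputs.taxi }}_tripdata (
            unique_row_id text,
            filename      text,
            ehail_fee     double precision,
            trip_type     integer
            -- ... remaining green-schema columns
          );
    else:
      - id: yellow_create_table
        type: io.kestra.plugin.jdbc.postgresql.Queries
        url: jdbc:postgresql://localhost:5432/db   # placeholder connection
        sql: |
          CREATE TABLE IF NOT EXISTS public.{{ inputs.taxi }}_tripdata (
            unique_row_id text,
            filename      text
            -- ... yellow-schema columns
          );
```

The `{{ inputs.taxi }}` template expression resolves the table name at runtime, so the same flow manages `yellow_tripdata` and `green_tripdata` from a single definition.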

Usage

Use dynamic DDL generation when:

  • A pipeline supports multiple dataset variants that share a base structure but differ in specific columns.
  • Table creation must be idempotent to support pipeline re-execution without errors.
  • A staging-and-production table pattern is needed for safe data loading with deduplication.
  • Schema definitions should be co-located with the pipeline logic rather than managed externally.
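The staging-and-production pattern with deduplicated loading can be sketched against any SQL engine. The snippet below uses SQLite purely for illustration; the table names, columns, and row values are invented, and the merge syntax (`INSERT OR IGNORE`) varies by database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Production and staging tables share an identical schema; the
# primary-key constraint on unique_row_id enables idempotent merges.
for target in ("green_tripdata", "green_tripdata_staging"):
    conn.execute(f"""
        CREATE TABLE IF NOT EXISTS {target} (
            unique_row_id TEXT PRIMARY KEY,
            filename      TEXT,
            fare_amount   REAL
        )
    """)

# Load a batch into the staging landing zone (normally the pipeline's copy step).
conn.executemany(
    "INSERT INTO green_tripdata_staging VALUES (?, ?, ?)",
    [("row-1", "2021-01.csv", 12.5), ("row-2", "2021-01.csv", 9.0)],
)

# Merge staging into production; re-running is a no-op for existing keys.
conn.execute("""
    INSERT OR IGNORE INTO green_tripdata
    SELECT unique_row_id, filename, fare_amount
    FROM green_tripdata_staging
""")

# Clean landing zone for the next file (SQLite has no TRUNCATE; DELETE is equivalent here).
conn.execute("DELETE FROM green_tripdata_staging")
```

Because the merge keys on `unique_row_id`, re-executing the whole sequence against the same input file leaves the production table unchanged.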

Theoretical Basis

GIVEN input: data_type (one of variant_A, variant_B)

IF data_type == variant_A THEN
    schema = [col1 type, col2 type, ..., colN type]   -- variant A columns
ELSE IF data_type == variant_B THEN
    schema = [col1 type, col2 type, ..., colM type]   -- variant B columns (M != N)

FOR EACH target IN [production_table, staging_table]:
    EXECUTE:
        CREATE TABLE IF NOT EXISTS {target} (
            unique_row_id  text,       -- deduplication key (populated later)
            filename       text,       -- lineage tracking (populated later)
            {schema}
        )

-- Tables now exist with the correct schema for the selected data variant
-- Staging table is truncated before each load to ensure a clean landing zone
EXECUTE:
    TRUNCATE TABLE {staging_table}

The unique_row_id and filename columns are present in all schema variants as infrastructure columns: they support deduplication and data lineage tracking respectively, and are populated in later pipeline stages rather than during the initial data load.
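The pseudocode above can be sketched as a small DDL generator. The variant names and column lists here are placeholders, not the real NYC taxi schemas:

```python
# Hypothetical column lists standing in for the real variant schemas.
SCHEMAS = {
    "variant_A": ["col1 text", "col2 integer"],
    "variant_B": ["col1 text", "col2 integer", "col3 real"],  # extra column
}

# Infrastructure columns present in every variant, populated downstream.
INFRA_COLUMNS = ["unique_row_id  text", "filename       text"]

def build_ddl(data_type: str, base_table: str) -> list[str]:
    """Return idempotent CREATE TABLE statements for the production and
    staging tables of the selected variant, plus a staging TRUNCATE."""
    schema = SCHEMAS[data_type]  # conditional branch: pick the variant schema
    columns = ",\n    ".join(INFRA_COLUMNS + schema)
    statements = [
        f"CREATE TABLE IF NOT EXISTS {target} (\n    {columns}\n)"
        for target in (base_table, f"{base_table}_staging")
    ]
    statements.append(f"TRUNCATE TABLE {base_table}_staging")
    return statements

ddl = build_ddl("variant_B", "tripdata")
```

In a real flow the generated statements would be executed by the orchestrator's database task; here the point is only that one function covers every supported variant.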
