Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Workflow:TobikoData Sqlmesh Incremental model development

From Leeroopedia


Knowledge Sources
Domains Data_Engineering, SQL_Transformation, Incremental_Processing
Last Updated 2026-02-07 21:00 GMT

Overview

End-to-end process for building, testing, and deploying time-based incremental SQL models that process only new or changed data intervals, minimizing compute costs for large datasets.

Description

This workflow covers the development of incremental models in SQLMesh, focusing on the INCREMENTAL_BY_TIME_RANGE and INCREMENTAL_BY_UNIQUE_KEY model kinds. Incremental models are the primary mechanism for handling large datasets efficiently by only processing new time intervals or changed records rather than rebuilding entire tables. The workflow covers model definition with proper time column configuration, interval tracking, forward-only changes, and the automatic detection of missing data intervals.

Key outcomes:

  • An incremental model that processes only new data on each run
  • Automatic interval tracking to detect and fill gaps
  • Forward-only change support for non-breaking modifications
  • Efficient backfill with configurable date ranges

Usage

Execute this workflow when working with large time-series or event data where full table rebuilds would be prohibitively expensive. This applies to datasets with append-only or slowly-changing patterns where processing can be bounded by time ranges.

Execution Steps

Step 1: Incremental Model Definition

Define a SQL model with an incremental kind specification. For time-range models, specify the time_column that partitions the data and the cron schedule for automatic execution. The model query uses the @start_ds and @end_ds macro variables to filter input data to the current processing interval.

Key considerations:

  • Choose INCREMENTAL_BY_TIME_RANGE for time-partitioned data
  • Choose INCREMENTAL_BY_UNIQUE_KEY for upsert patterns
  • The time_column must be present in the model output
  • Use @start_ds and @end_ds macros to bound the query

Step 2: Interval Configuration

Configure the model's scheduling and interval behavior including the cron expression, lookback window, and batch size. The cron defines how frequently new intervals are created. The lookback adds extra intervals behind the current one for late-arriving data. Batch size controls how many intervals are processed in a single query execution.

Key considerations:

  • Cron expressions define interval granularity (hourly, daily, weekly)
  • Lookback handles late-arriving data by reprocessing recent intervals
  • Batch size trades off between query complexity and execution frequency
  • Auto-restatement can be configured to periodically refresh recent data

Step 3: Development Environment Testing

Create a development environment with a limited date range to test the incremental model with a subset of data. This validates the model logic without processing the full historical range. The development plan shows which intervals will be backfilled.

Key considerations:

  • Limit the backfill date range to reduce development iteration time
  • Verify that the time column filter correctly bounds the data
  • Check that the model produces expected results for the test interval
  • Validate that upstream dependencies are properly resolved

Step 4: Forward Only Change Handling

When modifying an existing incremental model with non-breaking changes, use forward-only mode to apply changes only to future intervals without reprocessing historical data. This is appropriate for additive changes like adding new columns or adjusting non-critical logic.

Key considerations:

  • Forward-only changes do not trigger historical backfill
  • The change must be categorized as non-breaking
  • Historical data retains the old logic; new intervals use the updated logic
  • Breaking changes require full backfill regardless of the forward-only flag

Step 5: Production Deployment with Backfill

Deploy the incremental model to production with the appropriate backfill strategy. For new models, specify the start date for historical backfill. For modified models, the system automatically determines which intervals need reprocessing based on the change type and interval tracking.

Key considerations:

  • New models require an explicit start date for initial backfill
  • Modified models only reprocess invalidated intervals
  • Backfill runs in topological order respecting dependencies
  • Progress tracking shows interval-level completion status

Step 6: Ongoing Interval Processing

After initial deployment, the scheduler automatically detects missing intervals based on the cron schedule and triggers processing for new data. The interval tracking system maintains a record of all processed intervals and identifies gaps that need filling.

Key considerations:

  • The scheduler runs periodically to detect new intervals
  • Missing intervals from gaps or failures are automatically detected
  • The run command processes all pending intervals in dependency order
  • Monitoring shows interval completion status per model

Execution Diagram

GitHub URL

Workflow Repository