Principle:TobikoData Sqlmesh Incremental Model Definition

Knowledge Sources	SQLMesh, SQLMesh Docs
Domains	Data_Engineering, Incremental_Processing
Last Updated	2026-02-07 00:00 GMT

Overview

Define data transformation models that process only new or changed time-based intervals rather than reprocessing entire datasets.

Description

Incremental model definition lets data engineers specify transformation logic that operates only on data within specific time windows. Instead of recalculating the entire dataset on each run, the system identifies which time intervals contain new or modified data and processes only those intervals. Because the work per run scales with the amount of changed data rather than with the size of the full history, this approach reduces compute cost and processing time for large datasets where only a small fraction of the data changes between runs.

The incremental by time range strategy partitions data along a time dimension, tracking which intervals have been successfully processed and storing this state for future runs. When new data arrives or historical corrections are needed, only the affected intervals are recomputed.
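To make the idea of partitioning along a time dimension concrete, the following is a minimal Python sketch that maps row timestamps to the half-open intervals containing them. The names `interval_for` and `affected_intervals` are hypothetical helpers invented for illustration, and flooring relative to the Unix epoch in UTC is an assumption, not a description of SQLMesh internals.

```python
from datetime import datetime, timedelta, timezone

# Assumption: intervals are aligned to the Unix epoch in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def interval_for(ts: datetime, size: timedelta) -> tuple[datetime, datetime]:
    """Return the half-open [start, end) interval of width `size` containing `ts`."""
    # Floor the offset from the epoch to a whole number of interval widths.
    steps = (ts - EPOCH) // size
    start = EPOCH + steps * size
    return start, start + size


def affected_intervals(timestamps, size: timedelta):
    """Collect the distinct intervals touched by a batch of row timestamps."""
    return sorted({interval_for(ts, size) for ts in timestamps})


if __name__ == "__main__":
    utc = timezone.utc
    new_rows = [
        datetime(2024, 1, 1, 3, 0, tzinfo=utc),
        datetime(2024, 1, 1, 23, 59, tzinfo=utc),
        datetime(2024, 1, 3, 0, 0, tzinfo=utc),
    ]
    for start, end in affected_intervals(new_rows, timedelta(days=1)):
        print(start.date(), "->", end.date())
```

Only the intervals returned here would need recomputation when these rows arrive; every other interval keeps its previously stored result.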

Usage

Use incremental model definition when working with time-series data, event streams, or any dataset that grows continuously over time. This pattern is essential for ETL pipelines that process logs, sensor readings, financial transactions, user activity, or other temporally organized data where reprocessing the entire history would be prohibitively expensive.

Incremental models are particularly valuable when downstream dependencies need to respond to upstream changes efficiently, as the system can automatically cascade updates through only the affected time intervals.
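The cascading behaviour can be sketched as a graph walk: when an interval changes in one model, the same interval is flagged in every model downstream of it. The sketch below is an illustration under stated assumptions: `models_to_refresh` and the example DAG are hypothetical, and it assumes all models share the same interval size so that an upstream interval maps one-to-one onto downstream intervals.

```python
from collections import deque


def models_to_refresh(downstream: dict[str, list[str]], changed_model: str) -> set[str]:
    """Return the changed model plus every model reachable downstream of it."""
    seen = {changed_model}
    queue = deque([changed_model])
    while queue:
        model = queue.popleft()
        for child in downstream.get(model, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


if __name__ == "__main__":
    # Hypothetical DAG: keys are models, values are their direct dependents.
    dag = {
        "raw_events": ["sessions", "clicks"],
        "sessions": ["daily_report"],
        "users": ["profiles"],
    }
    # A correction to one day of `sessions` only touches that day in these models.
    print(sorted(models_to_refresh(dag, "sessions")))
```

Models outside the returned set, such as `users` and `profiles` in this example, are left alone for the affected interval.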

Theoretical Basis

The core concept relies on maintaining a mapping between time intervals and processing state:

FOR each model in DAG:
  DEFINE time_column as temporal partition key
  DEFINE interval_unit as interval size (hourly, daily, etc.)

  STATE = map of (interval_start, interval_end) -> completion_status

  ON execution:
    missing_intervals = compute_missing_intervals(STATE, target_start, target_end)

    FOR each interval in missing_intervals:
      IF all upstream dependencies have completed this interval THEN
        filtered_data = WHERE time_column >= interval.start
                          AND time_column < interval.end
        result = apply_transformation(filtered_data)
        materialize(result, interval)
        mark_complete(STATE, interval)
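
The pseudocode above can be approximated in runnable Python for a single model. This is a simplified sketch, not the SQLMesh implementation: `run_model` is a hypothetical name, state is an in-memory set rather than a persistent store, rows are plain dictionaries, and the dependency check is omitted.

```python
from datetime import datetime, timedelta


def compute_missing_intervals(done, start, end, size):
    """List the [lo, hi) intervals inside [start, end) that are not yet in `done`."""
    missing = []
    cursor = start
    while cursor + size <= end:
        interval = (cursor, cursor + size)
        if interval not in done:
            missing.append(interval)
        cursor += size
    return missing


def run_model(rows, time_column, transform, done, start, end, size):
    """Process each missing interval and record it as complete in `done`."""
    outputs = {}
    for lo, hi in compute_missing_intervals(done, start, end, size):
        # Half-open filter: lo <= time_column < hi, so no row lands in two intervals.
        batch = [row for row in rows if lo <= row[time_column] < hi]
        outputs[(lo, hi)] = transform(batch)
        done.add((lo, hi))
    return outputs


if __name__ == "__main__":
    rows = [
        {"ts": datetime(2024, 1, 1, 9)},
        {"ts": datetime(2024, 1, 2, 8)},
        {"ts": datetime(2024, 1, 2, 20)},
    ]
    done = {(datetime(2024, 1, 1), datetime(2024, 1, 2))}  # Jan 1 already processed
    result = run_model(rows, "ts", len, done,
                       datetime(2024, 1, 1), datetime(2024, 1, 3), timedelta(days=1))
    print(result)
```

Running `run_model` a second time with the same `done` set processes nothing, which is the property that makes repeated runs cheap.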

The system fingerprints each model definition to detect changes. When the transformation logic changes, the fingerprint changes and a new model version is created; the state mapping for that version then determines which intervals need reprocessing under the new logic.
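
As a rough illustration of how a fingerprint can key the interval state, the sketch below hashes a whitespace-normalized query string and keeps a separate set of completed intervals per fingerprint. Both `fingerprint` and `VersionedState` are hypothetical, and hashing normalized text is an assumption made for brevity; it is not a description of how SQLMesh actually computes fingerprints.

```python
import hashlib


def fingerprint(query: str) -> str:
    """Hash the query text after collapsing whitespace (illustrative assumption)."""
    normalized = " ".join(query.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class VersionedState:
    """Track completed intervals separately for each model version."""

    def __init__(self):
        self._done = {}  # fingerprint -> set of completed intervals

    def completed(self, query: str) -> set:
        """Return the mutable set of completed intervals for this query's version."""
        return self._done.setdefault(fingerprint(query), set())


if __name__ == "__main__":
    state = VersionedState()
    v1 = "SELECT id, ts FROM events"
    v2 = "SELECT id, ts, user_id FROM events"
    state.completed(v1).add("2024-01-01")
    # A logic change yields a new fingerprint, so no intervals are complete yet.
    print(len(state.completed(v1)), len(state.completed(v2)))
```

Under this scheme a purely cosmetic edit (reformatting whitespace) keeps the existing state, while a change to the selected columns starts from an empty interval set.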

Key design principles:

Immutable Intervals: Once an interval is successfully processed, it remains stable unless explicitly restated.

Dependency-Aware Scheduling: Intervals are processed in topological order, respecting dependencies between models.

Lookback Support: Each run can include a configured number of prior intervals in addition to the current one, so late-arriving data for recent intervals is picked up without an explicit restatement.

Forward-Only Mode: Changes can be applied only to future intervals, preserving historical data integrity.
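
Two of the principles above, dependency-aware scheduling and lookback, lend themselves to a short sketch. The example uses Python's standard `graphlib.TopologicalSorter` to order models and a small helper that widens an interval backwards by a number of interval widths. The function names and the example DAG are hypothetical, and treating lookback as a widened time window is one simplified reading of the concept.

```python
from datetime import datetime, timedelta
from graphlib import TopologicalSorter


def execution_order(upstream: dict[str, set[str]]) -> list[str]:
    """Order models so that each one runs after everything it depends on.

    `upstream` maps a model name to the set of models it reads from.
    """
    return list(TopologicalSorter(upstream).static_order())


def with_lookback(interval, size: timedelta, lookback: int):
    """Extend the interval start back by `lookback` interval widths."""
    start, end = interval
    return start - lookback * size, end


if __name__ == "__main__":
    dag = {"sessions": {"raw_events"}, "daily_report": {"sessions"}}
    print(execution_order(dag))

    today = (datetime(2024, 1, 5), datetime(2024, 1, 6))
    print(with_lookback(today, timedelta(days=1), lookback=2))
```

With a lookback of 2 and daily intervals, a run for January 5 would read data from January 3 onward while still ending at the same interval boundary.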

Related Pages

Implemented By

Implementation:TobikoData_Sqlmesh_IncrementalByTimeRangeKind_Init
