Principle:TobikoData Sqlmesh Incremental Model Definition
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Incremental_Processing |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
Define data transformation models that process only new or changed time-based intervals rather than reprocessing entire datasets.
Description
Incremental model definition lets data engineers specify transformation logic that operates only on data within specific time windows. Instead of recalculating the entire dataset on each run, the system identifies which time intervals have not yet been processed (or have been explicitly marked for reprocessing) and processes only those intervals. This can substantially reduce compute cost and processing time for large datasets where only a small fraction of the data changes between runs.
The incremental by time range strategy partitions data along a time dimension, tracking which intervals have been successfully processed and storing this state for future runs. When new data arrives or historical corrections are needed, only the affected intervals are recomputed.
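To make the partitioning concrete, here is a minimal Python sketch, using only the standard library, that splits a target range into fixed-size half-open intervals and selects the rows belonging to each one. The function names and the in-memory `events` rows are hypothetical illustrations, not SQLMesh APIs.

```python
from datetime import date, timedelta


def split_into_intervals(start: date, end: date, unit: timedelta) -> list[tuple[date, date]]:
    """Split the half-open range [start, end) into consecutive intervals of size `unit`."""
    intervals = []
    current = start
    while current < end:
        upper = min(current + unit, end)  # the last interval may be shorter than `unit`
        intervals.append((current, upper))
        current = upper
    return intervals


def rows_in_interval(rows: list[dict], time_column: str, interval: tuple[date, date]) -> list[dict]:
    """Keep rows whose time value falls in [interval_start, interval_end)."""
    lower, upper = interval
    return [row for row in rows if lower <= row[time_column] < upper]


events = [
    {"id": 1, "event_date": date(2024, 1, 1)},
    {"id": 2, "event_date": date(2024, 1, 2)},
    {"id": 3, "event_date": date(2024, 1, 2)},
    {"id": 4, "event_date": date(2024, 1, 3)},
]

# Two daily intervals cover Jan 1 and Jan 2; the Jan 3 row falls outside [Jan 1, Jan 3).
for interval in split_into_intervals(date(2024, 1, 1), date(2024, 1, 3), timedelta(days=1)):
    print(interval, [row["id"] for row in rows_in_interval(events, "event_date", interval)])
```

Using half-open intervals means a row exactly on a boundary belongs to exactly one interval, so adjacent intervals never double-count it.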
Usage
Use incremental models when working with time-series data, event streams, or any dataset that grows continuously over time. This pattern is essential for ETL pipelines processing logs, sensor readings, financial transactions, user activity, or other temporally organized data where reprocessing the entire history would be prohibitively expensive.
Incremental models are particularly valuable when downstream dependencies need to respond to upstream changes efficiently, as the system can automatically cascade updates through only the affected time intervals.
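The cascade can be sketched with the standard library's `graphlib.TopologicalSorter` (Python 3.9+): starting from the intervals restated in one model, walk the DAG in dependency order and carry those intervals to every downstream model. The model names are hypothetical, and the one-to-one mapping of intervals between models is a simplifying assumption; models with different interval sizes would need the ranges widened or narrowed.

```python
from graphlib import TopologicalSorter

# Hypothetical DAG: each model maps to the set of upstream models it reads from.
dag = {
    "raw_events": set(),
    "daily_sessions": {"raw_events"},
    "daily_revenue": {"raw_events"},
    "revenue_per_session": {"daily_sessions", "daily_revenue"},
    "user_dim": set(),
}


def cascade(dag: dict[str, set[str]], changed_model: str, intervals: set) -> dict[str, set]:
    """Return {model: intervals to recompute} for changed_model and everything downstream.

    Assumes all models share one interval unit, so an upstream interval maps
    onto the same downstream interval.
    """
    affected = {changed_model: set(intervals)}
    # static_order() yields each model only after all of its upstream models.
    for model in TopologicalSorter(dag).static_order():
        inherited = set()
        for upstream in dag[model]:
            inherited |= affected.get(upstream, set())
        if inherited:
            affected.setdefault(model, set()).update(inherited)
    return affected


restated = {("2024-01-02", "2024-01-03")}
for model, ivs in cascade(dag, "raw_events", restated).items():
    print(model, sorted(ivs))
```

Models with no path from the restated model (here `user_dim`) are left untouched, which is what keeps the cascade limited to the affected intervals.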
Theoretical Basis
The core concept relies on maintaining a mapping between time intervals and processing state:
FOR each model in DAG, in topological order:
    DEFINE time_column as the temporal partition key
    DEFINE interval_unit as the interval size (hourly, daily, etc.)
    STATE = map of (interval_start, interval_end) -> completion_status
    ON execution:
        missing_intervals = compute_missing_intervals(STATE, interval_unit, target_start, target_end)
        FOR each interval in missing_intervals:
            IF all upstream intervals this interval depends on are complete THEN
                filtered_data = source rows WHERE time_column >= interval.start
                                                AND time_column < interval.end
                result = apply_transformation(filtered_data)
                materialize(result, interval)
                mark_complete(STATE, interval)
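The loop above can be turned into a small runnable Python sketch. Here STATE is simplified to a set of completed intervals, "materialize" writes into an in-memory dict keyed by interval, and upstream readiness is another set; all names (`run_model`, `total_amount`, `upstream_done`) are hypothetical and only illustrate the control flow.

```python
from datetime import date, timedelta


def split_into_intervals(start, end, unit):
    """Half-open intervals of size `unit` covering [start, end)."""
    intervals, current = [], start
    while current < end:
        upper = min(current + unit, end)
        intervals.append((current, upper))
        current = upper
    return intervals


def compute_missing_intervals(state, start, end, unit):
    """Intervals in [start, end) that STATE does not yet record as complete."""
    return [iv for iv in split_into_intervals(start, end, unit) if iv not in state]


def run_model(state, upstream_state, source_rows, time_column, transform, target, start, end, unit):
    """Process only missing intervals whose upstream data is ready."""
    for interval in compute_missing_intervals(state, start, end, unit):
        if interval not in upstream_state:
            continue  # dependency not met; the interval stays missing for a later run
        lower, upper = interval
        filtered = [r for r in source_rows if lower <= r[time_column] < upper]
        # materialize: overwrite this interval's slot, so re-running an interval is idempotent
        target[interval] = transform(filtered)
        state.add(interval)  # mark_complete


def total_amount(rows):
    return sum(r["amount"] for r in rows)


day = timedelta(days=1)
events = [
    {"ts": date(2024, 1, 1), "amount": 10},
    {"ts": date(2024, 1, 1), "amount": 5},
    {"ts": date(2024, 1, 2), "amount": 7},
    {"ts": date(2024, 1, 3), "amount": 1},
]
# Upstream has only loaded Jan 1 and Jan 2 so far.
upstream_done = {(date(2024, 1, 1), date(2024, 1, 2)), (date(2024, 1, 2), date(2024, 1, 3))}
state, table = set(), {}

run_model(state, upstream_done, events, "ts", total_amount, table, date(2024, 1, 1), date(2024, 1, 4), day)
print(table)  # Jan 1 -> 15, Jan 2 -> 7; Jan 3 is skipped

upstream_done.add((date(2024, 1, 3), date(2024, 1, 4)))
run_model(state, upstream_done, events, "ts", total_amount, table, date(2024, 1, 1), date(2024, 1, 4), day)
print(table)  # only the Jan 3 interval was computed on this run
```

On the second run, Jan 1 and Jan 2 are already in the state, so only the newly unblocked Jan 3 interval is processed.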
The system uses fingerprinting to track model definition changes. When transformation logic changes, a new version is created, and the state mapping determines which intervals need reprocessing under the new logic.
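A minimal sketch of the versioning idea, assuming a simple scheme: hash the model's query together with result-affecting configuration, and key interval state by that hash. The hashing scheme, whitespace normalization, and names are hypothetical and are not SQLMesh's actual fingerprinting.

```python
import hashlib
import json


def fingerprint(query: str, config: dict) -> str:
    """Hash the transformation logic plus any config that affects its output."""
    normalized = " ".join(query.split())  # whitespace-only edits keep the same fingerprint
    payload = json.dumps({"query": normalized, "config": config}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


# Completed intervals are tracked per (model name, fingerprint), i.e. per version.
state: dict[tuple[str, str], set[tuple[str, str]]] = {}

config = {"time_column": "ts", "interval_unit": "day"}
v1 = fingerprint("SELECT id, amount FROM events", config)
state[("revenue", v1)] = {("2024-01-01", "2024-01-02"), ("2024-01-02", "2024-01-03")}

v1_reformatted = fingerprint("SELECT id,  amount\n  FROM events", config)
v2 = fingerprint("SELECT id, amount * 1.1 AS amount FROM events", config)

print(v1 == v1_reformatted)  # True: same version, existing intervals are reused
print(state.get(("revenue", v2), set()))  # set(): the new version starts with nothing complete
```

Because the logic change yields a new fingerprint with an empty interval set, every interval in the new version's range is reported as missing and reprocessed, while the old version's state is left intact.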
Key design principles:
- Immutable Intervals: Once an interval is successfully processed, it remains stable unless explicitly restated.
- Dependency-Aware Scheduling: Intervals are processed in topological order, respecting dependencies between models.
- Lookback Support: Models can access data from prior intervals to support windowed operations (moving averages, cumulative calculations).
- Forward-Only Mode: Changes can be applied only to future intervals, preserving historical data integrity.
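Lookback support can be sketched as widening only the read window. The model reads a few extra units of history before the interval start but still emits values only for the interval being processed. The `lookback` parameter name and the trailing moving average are illustrative assumptions, not a description of how SQLMesh implements lookback.

```python
from datetime import date, timedelta


def moving_average_for_interval(rows, interval, unit, lookback):
    """Trailing moving average for each time unit in `interval`.

    Reads source data from `lookback` units before the interval start, but only
    emits values for time points inside [start, end), so output stays confined
    to the interval being processed.
    """
    start, end = interval
    read_start = start - lookback * unit  # widen the read window backwards
    window = {r["day"]: r["value"] for r in rows if read_start <= r["day"] < end}
    output = {}
    current = start
    while current < end:
        points = [current - i * unit for i in range(lookback + 1)]
        values = [window[p] for p in points if p in window]
        if values:  # skip time points with no data at all
            output[current] = sum(values) / len(values)
        current += unit
    return output


day = timedelta(days=1)
rows = [{"day": date(2024, 1, d), "value": 10 * d} for d in range(1, 5)]

# Processing only the Jan 4 interval still sees Jan 2 and Jan 3 through the lookback.
print(moving_average_for_interval(rows, (date(2024, 1, 4), date(2024, 1, 5)), day, lookback=2))
# {datetime.date(2024, 1, 4): 30.0}
print(moving_average_for_interval(rows, (date(2024, 1, 4), date(2024, 1, 5)), day, lookback=0))
# {datetime.date(2024, 1, 4): 40.0}
```

Without the lookback, the Jan 4 interval would only see its own row and the "moving average" would collapse to the single value 40.0.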