
Principle:TobikoData Sqlmesh Forward Only Change Handling

From Leeroopedia


Knowledge Sources
Domains Data_Engineering, Incremental_Processing
Last Updated 2026-02-07 00:00 GMT

Overview

Manage incremental model changes that apply only to future time intervals without reprocessing historical data, preventing breaking changes to established datasets.

Description

Forward-only change handling enables data engineers to modify incremental model logic while preserving historical data integrity. When changes are marked as forward-only, the new transformation logic applies exclusively to future intervals, leaving already-processed historical intervals unchanged. This approach is critical for maintaining stable historical reporting while evolving data pipelines.

The system distinguishes between breaking changes (which alter data semantics or schema) and non-breaking changes (which optimize performance or add optional columns). For forward-only models, breaking changes are typically prevented or require explicit approval, while non-breaking changes can be applied automatically.

Forward-only mode creates a temporal boundary at the effective_from date. Intervals before this date continue using the previous model version, while subsequent intervals use the new logic. This allows gradual migration strategies and supports scenarios where historical reprocessing is legally prohibited, computationally infeasible, or would violate audit requirements.
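The temporal boundary can be illustrated with a minimal Python sketch; the function and the (start, end) interval representation are illustrative, not a SQLMesh API:

```python
from datetime import date

def partition_intervals(intervals, effective_from):
    """Split (start, end) intervals at the effective_from boundary.

    Intervals that end on or before the boundary keep the previous
    model version; intervals that start on or after it use the new
    logic. Assumes interval edges align with the boundary.
    """
    historical = [iv for iv in intervals if iv[1] <= effective_from]
    future = [iv for iv in intervals if iv[0] >= effective_from]
    return historical, future

# Three daily intervals with the boundary at 2024-01-03:
daily = [(date(2024, 1, 1), date(2024, 1, 2)),
         (date(2024, 1, 2), date(2024, 1, 3)),
         (date(2024, 1, 3), date(2024, 1, 4))]
hist, fut = partition_intervals(daily, date(2024, 1, 3))
# hist keeps the first two intervals; fut holds only the third.
```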

Usage

Use forward-only change handling when modifying incremental models in production environments where historical data must remain stable. This is essential for financial reporting systems subject to audit requirements, compliance scenarios where historical calculations cannot be retroactively changed, or performance optimizations that should not trigger expensive backfills.

Apply forward-only mode when adding new columns or refining logic that doesn't affect existing downstream consumers. Specify the effective_from date to control exactly when the new logic begins applying, enabling coordinated deployments with downstream systems.

Configure allow_destructive_models and allow_additive_models lists to explicitly permit specific models to have schema changes, providing escape hatches for scenarios where controlled schema evolution is acceptable.

Theoretical Basis

Forward-only change handling implements temporal model versioning:

MODEL_VERSION_HISTORY:
  versions = list of (fingerprint, effective_from_date, effective_to_date)

CHANGE_CLASSIFICATION:
  old_model = load_model_version(current_fingerprint)
  new_model = proposed_model_changes

  diff = compute_diff(old_model, new_model)

  change_type = classify(diff):
    CASE schema_incompatible(diff):
      RETURN "DESTRUCTIVE"  // Column removed, type changed incompatibly
    CASE schema_extended(diff):
      RETURN "ADDITIVE"     // Column added, nullable constraints relaxed
    CASE logic_only(diff):
      RETURN "NON_BREAKING" // Query logic changed, schema identical
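The classification step above can be sketched as a schema comparison in Python; representing schemas as plain column-to-type dicts is a deliberate simplification of real model fingerprinting:

```python
def classify_change(old_schema, new_schema):
    """Classify a model diff as in the pseudocode above.

    Schemas are {column_name: type} dicts; query-logic diffs are
    assumed to be detected separately.
    """
    removed = set(old_schema) - set(new_schema)
    retyped = {c for c in set(old_schema) & set(new_schema)
               if old_schema[c] != new_schema[c]}
    added = set(new_schema) - set(old_schema)
    if removed or retyped:
        return "DESTRUCTIVE"   # column dropped or type changed
    if added:
        return "ADDITIVE"      # column added, schema extended
    return "NON_BREAKING"      # schema identical; only logic changed
```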

FORWARD_ONLY_VALIDATION:
  IF forward_only_mode AND change_type == "DESTRUCTIVE" THEN
    IF model NOT IN allow_destructive_models THEN
      RAISE error("Destructive change not allowed in forward-only mode")

  IF forward_only_mode AND change_type == "ADDITIVE" THEN
    IF model NOT IN allow_additive_models THEN
      WARN("Additive change detected, may affect downstream consumers")
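A minimal sketch of this validation gate, assuming the allow-lists are simple collections of model names (the return values "apply" and "warn" are illustrative):

```python
def validate_forward_only(model, change_type,
                          allow_destructive, allow_additive):
    """Gate a classified change under forward-only mode.

    Destructive changes to models outside the allow-list are rejected;
    additive changes outside their allow-list only produce a warning.
    """
    if change_type == "DESTRUCTIVE" and model not in allow_destructive:
        raise ValueError(
            f"Destructive change to {model} not allowed in forward-only mode")
    if change_type == "ADDITIVE" and model not in allow_additive:
        return "warn"   # additive change may affect downstream consumers
    return "apply"
```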

PLAN_GENERATION with forward_only:
  effective_boundary = effective_from OR plan_execution_time

  historical_intervals = intervals WHERE end <= effective_boundary
  future_intervals = intervals WHERE start >= effective_boundary
  // start >= boundary (not >) so boundary-aligned intervals are not skipped

  FOR interval in historical_intervals:
    use_model_version(old_fingerprint)
    IF interval NOT in completion_state THEN
      process_with_old_logic(interval)

  FOR interval in future_intervals:
    use_model_version(new_fingerprint)
    process_with_new_logic(interval)
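The plan-generation loop can be sketched as a function that maps each interval to the version that should process it; `done` stands in for the completion state, and the fingerprints are opaque version labels:

```python
def assign_versions(intervals, boundary, old_fp, new_fp, done):
    """Assign a model version to each (start, end) interval.

    Historical intervals already in `done` are skipped rather than
    reprocessed, preserving historical immutability.
    """
    plan = []
    for start, end in intervals:
        if end <= boundary:
            if (start, end) not in done:
                plan.append(((start, end), old_fp))  # backfill with old logic
        else:
            plan.append(((start, end), new_fp))      # new logic going forward
    return plan

# Day 1-2 is already processed; day 2-3 needs an old-logic backfill;
# day 3-4 lies past the boundary and gets the new logic.
plan = assign_versions([(1, 2), (2, 3), (3, 4)], 3,
                       "old_fp", "new_fp", done={(1, 2)})
```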

QUERY_ROUTING at runtime:
  requested_interval = user_query.time_range

  IF requested_interval overlaps effective_boundary THEN
    historical_data = query(old_physical_table,
                           interval.start, effective_boundary)
    future_data = query(new_physical_table,
                       effective_boundary, interval.end)
    RETURN union(historical_data, future_data)
  ELSE
    SELECT appropriate_version based on requested_interval
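The runtime routing step can be sketched as a function returning the table segments to union; this is a toy model of version routing, not a real SQLMesh API:

```python
def route_query(start, end, boundary, old_table, new_table):
    """Return (table, start, end) segments covering a queried range.

    Ranges entirely on one side of the boundary hit a single physical
    table; ranges that straddle it are split and later unioned.
    """
    if end <= boundary:
        return [(old_table, start, end)]
    if start >= boundary:
        return [(new_table, start, end)]
    return [(old_table, start, boundary),   # historical portion
            (new_table, boundary, end)]     # forward-only portion
```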

The system maintains multiple physical versions of the same logical model, using temporal predicates to route queries to the correct version based on time range. This ensures that queries spanning the effective_from boundary receive consistent results while allowing independent evolution of future data processing.

Key guarantees:

Historical Immutability: Data processed before effective_from never changes due to model updates.

Downstream Compatibility: Queries against historical data return identical results regardless of forward-only changes.

Gradual Migration: Teams can validate new logic on recent data before deciding whether to reprocess history.

Audit Compliance: Satisfies requirements for maintaining historical calculations unchanged.

Related Pages

Implemented By
