Heuristic:TobikoData Sqlmesh Model Change Categorization

Knowledge Sources	TobikoData/sqlmesh
Domains	Plan_Building, Change_Management
Last Updated	2026-02-07 21:00 GMT

Overview

SQLMesh automatically categorizes model changes as breaking or non-breaking by comparing query ASTs rather than raw SQL strings, and uses the result to decide which models must be rebuilt.

Description

When a model changes, SQLMesh must determine whether the change requires rebuilding downstream models (breaking) or only the model itself (non-breaking). The categorize_change function in sqlmesh/core/snapshot/categorizer.py performs this classification by comparing two snapshot versions. It first checks whether the model kind changed (using is_breaking_kind_change), then whether only metadata changed (non-breaking), and finally falls back to AST (Abstract Syntax Tree) comparison via model.is_breaking_change(). If the category cannot be determined automatically, the function returns None and SQLMesh prompts the user for manual classification, unless auto_categorization_enabled is set, as is typical for CI/CD environments.

Usage

This categorization applies when:

Creating a plan after modifying model SQL or configuration
Running in CI/CD where interactive prompts are not possible (use auto_categorization_enabled)
Understanding why SQLMesh requires rebuilding certain models
Debugging unexpected breaking change classifications
Configuring categorizer settings per model source type for specific project needs

The Insight (Rule of Thumb)

Action: Rely on AST-based comparison for semantic change detection; configure auto_categorize_changes for CI/CD; classify manually when the AST diff is ambiguous
Value: auto_categorization_enabled flag (PlanBuilder), categorizer_config settings (auto_categorize_changes in the project config)
Trade-off: Auto-categorization is conservative: it may classify complex non-breaking changes as breaking to ensure safety

Reasoning

String-based comparison of SQL queries would be too brittle: formatting changes, comment additions, or keyword-case differences would all appear as changes even though they don't affect query semantics. SQLMesh leverages SQLGlot's AST representation to compare the semantic structure of queries instead.
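The difference between string and AST comparison can be shown with a small analogy. The sketch below does not use SQLGlot or SQLMesh; it uses Python's standard-library ast module on Python snippets purely to illustrate the principle, and the helper name same_ast is hypothetical.

```python
import ast


def same_ast(src_a: str, src_b: str) -> bool:
    """Return True when two Python snippets parse to the same syntax tree."""
    # ast.dump omits line/column attributes by default, so layout is ignored.
    return ast.dump(ast.parse(src_a)) == ast.dump(ast.parse(src_b))


original = "total = price * quantity"
reformatted = "total = (price   *   quantity)  # line total"
changed = "total = price + quantity"

# The raw strings differ, so a string diff reports a change.
print(original == reformatted)
# Whitespace, redundant parentheses, and comments vanish in the tree.
print(same_ast(original, reformatted))
# A different operator is a real structural change.
print(same_ast(original, changed))
```

The same idea applies to SQL: a parser discards layout and comments, so only structural edits survive into the comparison.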

The categorization logic follows a decision tree:

1. **Kind Change Check**: If the model kind changes (e.g., FULL to INCREMENTAL_BY_TIME_RANGE), use is_breaking_kind_change to determine if this specific transition is breaking
2. **Metadata-Only Check**: If only metadata (descriptions, tags, etc.) changed but not the query or configuration, classify as NON_BREAKING
3. **AST Comparison**: Compare the query ASTs using SQLGlot to detect semantic differences:

  - Adding a column → typically NON_BREAKING
  - Changing column type → typically BREAKING
  - Removing a column → typically BREAKING
  - Complex transformations → may return None (user prompt)

4. **Fallback**: If AST comparison is inconclusive, return None to prompt user (or apply auto-categorization rules)
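The four steps above can be sketched as a small, self-contained function. This is an illustrative toy, not SQLMesh's implementation: ModelVersion, Category, and categorize are hypothetical names, the query string stands in for a parsed AST, and the kind-change and AST checks are passed in as callables so the control flow is the only thing being shown.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Category(Enum):
    BREAKING = "breaking"
    NON_BREAKING = "non_breaking"


@dataclass(frozen=True)
class ModelVersion:
    kind: str      # e.g. "FULL" or "INCREMENTAL_BY_TIME_RANGE"
    query: str     # stands in for the parsed query AST (assumption)
    metadata: str  # descriptions, tags, and similar


def categorize(
    old: ModelVersion,
    new: ModelVersion,
    is_breaking_kind_change: Callable[[str, str], bool],
    ast_verdict: Callable[[str, str], Optional[bool]],
) -> Optional[Category]:
    # 1. Kind change check: delegate to the transition-specific rule.
    if old.kind != new.kind:
        breaking = is_breaking_kind_change(old.kind, new.kind)
        return Category.BREAKING if breaking else Category.NON_BREAKING
    # 2. Metadata-only check: the query is untouched, only metadata differs.
    if old.query == new.query and old.metadata != new.metadata:
        return Category.NON_BREAKING
    # 3. AST comparison: True, False, or None when inconclusive.
    verdict = ast_verdict(old.query, new.query)
    # 4. Fallback: None means the caller must prompt or apply its own rule.
    if verdict is None:
        return None
    return Category.BREAKING if verdict else Category.NON_BREAKING


old = ModelVersion("FULL", "SELECT a FROM t", "v1")
docs_only = ModelVersion("FULL", "SELECT a FROM t", "v1, with description")
rekinded = ModelVersion("INCREMENTAL_BY_TIME_RANGE", "SELECT a FROM t", "v1")
rewritten = ModelVersion("FULL", "SELECT a FROM t WHERE a > 0", "v1")

always_breaking = lambda before, after: True
inconclusive = lambda before, after: None

print(categorize(old, docs_only, always_breaking, inconclusive))
print(categorize(old, rekinded, always_breaking, inconclusive))
print(categorize(old, rewritten, always_breaking, inconclusive))
```

Passing the two checks in as callables keeps the ordering of the decision tree visible without asserting anything about how SQLMesh evaluates each step internally.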

For CI/CD environments where interactive prompts are impossible, the auto_categorize_changes configuration ensures every change receives an automatic classification, even if a conservative one. The categorizer_config provides fine-grained control over how changes are classified for each model source type (external, sql, python).
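How a categorization mode might resolve an inconclusive comparison can be sketched as follows. The source only shows the value full; the off and semi modes, the Mode enum, and the resolve function are assumptions introduced for illustration and may not match SQLMesh's actual behavior.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    OFF = "off"    # assumption: never auto-categorize, always prompt
    SEMI = "semi"  # assumption: auto-categorize only conclusive results
    FULL = "full"  # always categorize; inconclusive is treated as breaking


def resolve(mode: Mode, ast_verdict: Optional[bool]) -> Optional[str]:
    """Map an AST verdict (True, False, or None) to a category or a prompt."""
    if mode is Mode.OFF:
        return None  # None means the user must be asked
    if ast_verdict is None:
        # Conservative default: only FULL forces a decision here.
        return "breaking" if mode is Mode.FULL else None
    return "breaking" if ast_verdict else "non_breaking"


for mode in Mode:
    print(mode.value, [resolve(mode, v) for v in (True, False, None)])
```

Under these assumptions, only the full mode guarantees that no change is left waiting for a prompt, which is why it suits non-interactive pipelines.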

Code Evidence

# sqlmesh/core/snapshot/categorizer.py:16-70

# categorize_change function compares two snapshots
# Returns: SnapshotChangeCategory or None

def categorize_change(
    current: Snapshot,
    previous: Snapshot,
    config: CategorizerConfig
) -> t.Optional[SnapshotChangeCategory]:
    # 1. Check if kind changed
    if current.model.kind != previous.model.kind:
        if is_breaking_kind_change(previous.model.kind, current.model.kind):
            return SnapshotChangeCategory.BREAKING
        return SnapshotChangeCategory.NON_BREAKING

    # 2. Check if only metadata changed
    if only_metadata_changed(current, previous):
        return SnapshotChangeCategory.NON_BREAKING

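    # 3. Use AST comparison; the result is True, False, or None (inconclusive)
    is_breaking = current.model.is_breaking_change(previous.model)
    if is_breaking is True:
        return SnapshotChangeCategory.BREAKING
    if is_breaking is False:
        return SnapshotChangeCategory.NON_BREAKING

    # 4. Fallback to None (user prompt needed)
    return None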

# sqlmesh/core/plan/builder.py:83

# PlanBuilder configuration
auto_categorization_enabled: bool = False  # Enable for CI/CD
categorizer_config: CategorizerConfig  # Fine-grained control

Example change scenarios:

-- Scenario 1: Adding column (NON_BREAKING)
-- Before:
SELECT user_id, email FROM users

-- After:
SELECT user_id, email, created_at FROM users

-- Scenario 2: Changing column type (BREAKING)
-- Before:
SELECT user_id::int FROM users

-- After:
SELECT user_id::varchar FROM users

-- Scenario 3: Removing column (BREAKING)
-- Before:
SELECT user_id, email, phone FROM users

-- After:
SELECT user_id, email FROM users
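The three scenarios above follow a simple rule over projected columns, sketched below. This is not SQLMesh's algorithm: classify_projection_change is a hypothetical helper, and representing a query as a mapping of column name to declared type is an assumption made to keep the example self-contained.

```python
from typing import Dict, Optional

# Column name -> declared type (None when the query does not cast the column).
Columns = Dict[str, Optional[str]]


def classify_projection_change(before: Columns, after: Columns) -> str:
    """Classify a change using only the projected columns of two queries."""
    removed = before.keys() - after.keys()
    added = after.keys() - before.keys()
    retyped = {c for c in before.keys() & after.keys() if before[c] != after[c]}
    if removed or retyped:
        return "BREAKING"      # downstream readers may lose or misread a column
    if added:
        return "NON_BREAKING"  # purely additive; existing readers are unaffected
    return "UNCHANGED"


# Scenario 1: adding a column
print(classify_projection_change(
    {"user_id": None, "email": None},
    {"user_id": None, "email": None, "created_at": None},
))
# Scenario 2: changing a column type
print(classify_projection_change({"user_id": "int"}, {"user_id": "varchar"}))
# Scenario 3: removing a column
print(classify_projection_change(
    {"user_id": None, "email": None, "phone": None},
    {"user_id": None, "email": None},
))
```

A real categorizer also has to account for changes outside the projection list (filters, joins, aggregations), which is where an inconclusive result becomes likely.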

Configuration for CI/CD:

# config.yaml
plan:
  auto_categorize_changes:
    external: full  # Auto-categorize all external model changes
    sql: full       # Auto-categorize all SQL changes
    python: full    # Auto-categorize all Python model changes
