Heuristic:TobikoData Sqlmesh Model Change Categorization
| Knowledge Sources | |
|---|---|
| Domains | Plan_Building, Change_Management |
| Last Updated | 2026-02-07 21:00 GMT |
Overview
SQLMesh automatically categorizes model changes as breaking vs non-breaking using AST comparison rather than string comparison to determine rebuild requirements.
Description
When a model changes, SQLMesh must determine whether the change requires rebuilding downstream dependencies (breaking) or only the model itself (non-breaking). The categorize_change function in sqlmesh/core/snapshot/categorizer.py performs this classification by comparing two snapshot versions. It first checks whether the model kind changed (using is_breaking_kind_change), then whether only metadata changed (non-breaking), and finally falls back to AST (Abstract Syntax Tree) comparison via model.is_breaking_change(). If the category cannot be determined automatically, the function returns None and the user is prompted to classify the change manually, unless automatic categorization is enabled. Automatic categorization is controlled by the auto_categorization_enabled flag on the plan builder and configured through auto_categorize_changes, which is the usual setup for CI/CD environments.
Usage
This categorization applies when:
- Creating a plan after modifying model SQL or configuration
- Running in CI/CD where interactive prompts are not possible (use auto_categorization_enabled)
- Understanding why SQLMesh requires rebuilding certain models
- Debugging unexpected breaking change classifications
- Configuring how the categorizer behaves for each model type (external, SQL, Python) in a given project
The Insight (Rule of Thumb)
- Action: Use AST-based comparison for semantic change detection; configure auto_categorize_changes for CI/CD; manually classify when AST diff is ambiguous
- Value: the auto_categorization_enabled flag on the plan builder, plus the per-model-type modes held in categorizer_config (set through auto_categorize_changes)
- Trade-off: Auto-categorization is conservative - may classify complex non-breaking changes as breaking to ensure safety
Reasoning
String-based comparison of SQL queries would be too brittle - formatting changes, comment additions, or variable renames would all appear as changes even though they don't affect query semantics. SQLMesh leverages SQLGlot's AST representation to understand the semantic structure of queries.
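The brittleness of string comparison can be seen in a small standard-library sketch. It is only a stand-in: SQLMesh compares SQLGlot ASTs, whereas the hypothetical normalize_sql helper below merely strips comments and whitespace before comparing token lists, and its regex-based comment stripping would mishandle comment markers inside string literals.

```python
from __future__ import annotations

import re


def normalize_sql(sql: str) -> list[str]:
    """Hypothetical helper: drop comments and whitespace, return a token list.

    This is a crude stand-in for a real parser such as SQLGlot; it does not
    understand string literals, quoting, or dialect-specific syntax.
    """
    without_line_comments = re.sub(r"--[^\n]*", " ", sql)
    without_comments = re.sub(r"/\*.*?\*/", " ", without_line_comments, flags=re.DOTALL)
    # Identifiers/keywords, integers, or any single non-space character.
    return re.findall(r"[A-Za-z_][A-Za-z_0-9]*|\d+|\S", without_comments)


before = "SELECT user_id, email FROM users"
reformatted = """
SELECT
    user_id,  -- primary key
    email
FROM users
"""
added_column = "SELECT user_id, email, created_at FROM users"

# Raw strings differ even though the query is the same.
strings_equal = before == reformatted
# The normalized structure is identical, so no change would be reported.
structure_equal = normalize_sql(before) == normalize_sql(reformatted)
# A genuine change (a new projection) still shows up after normalization.
real_change_detected = normalize_sql(before) != normalize_sql(added_column)
```

A real AST goes further than this token list, since it records which tokens are projections, tables, or predicates, which is what makes a breaking versus non-breaking decision possible.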
The categorization logic follows a decision tree:
1. **Kind Change Check**: If the model kind changes (e.g., FULL to INCREMENTAL_BY_TIME_RANGE), use is_breaking_kind_change to determine if this specific transition is breaking
2. **Metadata-Only Check**: If only metadata (descriptions, tags, etc.) changed but not the query or configuration, classify as NON_BREAKING
3. **AST Comparison**: Compare the query ASTs using SQLGlot to detect semantic differences:
   - Adding a column → typically NON_BREAKING
   - Changing column type → typically BREAKING
   - Removing a column → typically BREAKING
   - Complex transformations → may return None (user prompt)
4. **Fallback**: If AST comparison is inconclusive, return None to prompt user (or apply auto-categorization rules)
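The decision tree above can be tried out without a SQLMesh project using a toy model. Everything in this sketch is hypothetical: ToyModel, Category, categorize, and the SAFE_KIND_CHANGES placeholder are illustrative names, a column-to-type mapping stands in for the query AST, and the actual rules live in is_breaking_kind_change and model.is_breaking_change().

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Category(Enum):
    BREAKING = "breaking"
    NON_BREAKING = "non_breaking"


@dataclass
class ToyModel:
    """Hypothetical stand-in for a model: kind, output columns, metadata."""

    kind: str
    columns: Dict[str, Optional[str]]  # column name -> type, None if unknown
    description: str = ""


# Placeholder only: which kind transitions are safe is decided by SQLMesh.
SAFE_KIND_CHANGES = {("VIEW", "FULL")}


def categorize(current: ToyModel, previous: ToyModel) -> Optional[Category]:
    # 1. Kind change: consult the (placeholder) transition table.
    if current.kind != previous.kind:
        if (previous.kind, current.kind) in SAFE_KIND_CHANGES:
            return Category.NON_BREAKING
        return Category.BREAKING

    # 2. Same kind and same columns: only metadata can have changed.
    if current.columns == previous.columns:
        return Category.NON_BREAKING

    # 3. Structural comparison: a removed column is breaking.
    if previous.columns.keys() - current.columns.keys():
        return Category.BREAKING

    inconclusive = False
    for name, old_type in previous.columns.items():
        new_type = current.columns[name]
        if old_type == new_type:
            continue
        if old_type is None or new_type is None:
            inconclusive = True  # a type is unknown, so we cannot decide
        else:
            return Category.BREAKING  # known type changed

    # 4. Fallback: None means "ask the user"; otherwise only columns were added.
    return None if inconclusive else Category.NON_BREAKING


base = ToyModel("FULL", {"user_id": "int", "email": "varchar"})
added = ToyModel("FULL", {**base.columns, "created_at": "timestamp"})
retyped = ToyModel("FULL", {"user_id": "varchar", "email": "varchar"})
dropped = ToyModel("FULL", {"user_id": "int"})
unknown = ToyModel("FULL", {"user_id": None, "email": "varchar"})

results = {
    "added": categorize(added, base),
    "retyped": categorize(retyped, base),
    "dropped": categorize(dropped, base),
    "unknown": categorize(unknown, base),
}
```

Returning None rather than guessing keeps the ambiguous case visible, so the caller can decide whether to prompt or to apply a conservative default.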
For CI/CD environments where interactive prompts are impossible, the auto_categorize_changes configuration ensures that every change receives an automatic classification, even if a conservative one. The resulting categorizer_config controls the categorization mode separately for each model type (external, SQL, and Python).
Code Evidence
# sqlmesh/core/snapshot/categorizer.py:16-70
# categorize_change function compares two snapshots
# Returns: SnapshotChangeCategory or None
def categorize_change(
    current: Snapshot,
    previous: Snapshot,
    config: CategorizerConfig,
) -> t.Optional[SnapshotChangeCategory]:
    # 1. Check if kind changed
    if current.model.kind != previous.model.kind:
        if is_breaking_kind_change(previous.model.kind, current.model.kind):
            return SnapshotChangeCategory.BREAKING
        return SnapshotChangeCategory.NON_BREAKING
    # 2. Check if only metadata changed
    if only_metadata_changed(current, previous):
        return SnapshotChangeCategory.NON_BREAKING
    # 3. Use AST comparison (True, False, or None when inconclusive)
    is_breaking = current.model.is_breaking_change(previous.model)
    if is_breaking is True:
        return SnapshotChangeCategory.BREAKING
    if is_breaking is False:
        return SnapshotChangeCategory.NON_BREAKING
    # 4. Fallback to None (user prompt needed)
    return None
# sqlmesh/core/plan/builder.py:83
# PlanBuilder configuration
auto_categorization_enabled: bool = False # Enable for CI/CD
categorizer_config: CategorizerConfig # Fine-grained control
Example change scenarios:
-- Scenario 1: Adding column (NON_BREAKING)
-- Before:
SELECT user_id, email FROM users
-- After:
SELECT user_id, email, created_at FROM users
-- Scenario 2: Changing column type (BREAKING)
-- Before:
SELECT user_id::int FROM users
-- After:
SELECT user_id::varchar FROM users
-- Scenario 3: Removing column (BREAKING)
-- Before:
SELECT user_id, email, phone FROM users
-- After:
SELECT user_id, email FROM users
Configuration for CI/CD:
# config.yaml
plan:
  auto_categorize_changes:
    external: full  # Auto-categorize all external model changes
    sql: full       # Auto-categorize all SQL changes
    python: full    # Auto-categorize all Python model changes
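How a per-model-type mode might turn an undetermined result into a final decision can be sketched as follows. This is an illustration under assumptions rather than SQLMesh's implementation: Mode, mode_for, and resolve_category are hypothetical names, and the semi and off modes and their semantics come from a reading of the SQLMesh documentation, not from this page, which only shows full.

```python
from __future__ import annotations

from enum import Enum
from typing import Dict, Optional


class Mode(Enum):
    FULL = "full"  # never prompt; fall back to breaking when undetermined
    SEMI = "semi"  # assumed: prompt only when the category is undetermined
    OFF = "off"    # assumed: always prompt


# Mirrors the YAML above as a plain mapping (model source type -> mode).
auto_categorize_changes: Dict[str, str] = {
    "external": "full",
    "sql": "full",
    "python": "full",
}


def mode_for(source_type: str, config: Dict[str, str]) -> Mode:
    """Look up the mode for a model source type, defaulting to OFF."""
    return Mode(config.get(source_type, "off"))


def resolve_category(detected: Optional[str], mode: Mode) -> Optional[str]:
    """Return the final category, or None when a user prompt is still needed."""
    if mode is Mode.OFF:
        return None
    if detected is not None:
        return detected
    # Undetermined: FULL picks the conservative answer, SEMI defers to the user.
    return "breaking" if mode is Mode.FULL else None


sql_mode = mode_for("sql", auto_categorize_changes)
ci_result = resolve_category(None, sql_mode)
```

With every type set to full, no code path returns None, which is why this setup suits non-interactive CI/CD runs at the cost of occasionally rebuilding more than necessary.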