Implementation:TobikoData Sqlmesh Context Table Diff
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Environment_Management |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
Concrete tool for comparing data between environments or tables provided by SQLMesh.
Description
The Context.table_diff() method performs comprehensive comparison between two data sources, which can be either environment names or direct table references. When environments are provided, it compares the corresponding models across those environments. The comparison operates at multiple levels: schema differences (added/removed/changed columns), aggregate statistics (row counts, null counts, distinct values), and row-level differences (specific rows that differ based on grain/primary key).
The method supports flexible configuration including join conditions, column filtering, row limits, floating-point precision control, and grain validation. It can operate across models when select_models is provided, comparing multiple tables in a single operation. Results are returned as TableDiff objects and optionally displayed in the console with formatted output showing schema changes and sample differing rows.
Usage
Use Context.table_diff() when you need to:
- Validate that development changes produce expected results before merging to production
- Compare staging environment to production for deployment verification
- Regression test after refactoring to ensure output hasn't changed
- Debug data discrepancies by identifying specific differing rows
- Build automated data quality checks in CI/CD pipelines
- Verify forward-only model changes produce identical results in preview mode
- Document expected data changes as part of code review
- Troubleshoot issues by comparing known-good and problematic environments
Code Reference
Source Location
- Repository: sqlmesh
- File: sqlmesh/core/context.py:L1855-1891
Signature
def table_diff(
self,
source: str,
target: str,
on: t.Optional[t.List[str] | exp.Condition] = None,
skip_columns: t.Optional[t.List[str]] = None,
select_models: t.Optional[t.Collection[str]] = None,
where: t.Optional[str | exp.Condition] = None,
limit: int = 20,
show: bool = True,
show_sample: bool = True,
decimals: int = 3,
skip_grain_check: bool = False,
warn_grain_check: bool = False,
temp_schema: t.Optional[str] = None,
schema_diff_ignore_case: bool = False,
**kwargs: t.Any,
) -> t.List[TableDiff]
Import
from sqlmesh.core.context import Context
context = Context(paths="path/to/project")
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| source | str | Yes | Source environment name or table name |
| target | str | Yes | Target environment name or table name |
| on | List[str] or Condition | No | Join condition for row comparison. Table aliases must be "s" (source) and "t" (target). If omitted, uses model's grain |
| skip_columns | List[str] | No | Columns to exclude from comparison |
| select_models | Collection[str] | No | Model selection strings when comparing environments (e.g., "model_a", "tag:critical") |
| where | str or Condition | No | Optional filter condition to apply before comparison |
| limit | int | No | Maximum number of sample differing rows to return. Default: 20 |
| show | bool | No | Display results in console. Default: True |
| show_sample | bool | No | Display sample differing rows in console (requires show=True). Default: True |
| decimals | int | No | Number of decimal places for floating-point comparison. Default: 3 |
| skip_grain_check | bool | No | Skip validation for null or duplicate grain values. Default: False |
| warn_grain_check | bool | No | Warn instead of failing on grain validation issues. Default: False |
| temp_schema | str | No | Schema to use for temporary tables during comparison |
| schema_diff_ignore_case | bool | No | Ignore case when comparing column names. Default: False |
Outputs
| Name | Type | Description |
|---|---|---|
| return | List[TableDiff] | List of TableDiff objects, one per compared table/model. Each contains schema_diff, row_diff, and summary statistics |
Usage Examples
Basic Usage
from sqlmesh.core.context import Context
context = Context()
# Compare a single model between dev and prod
diffs = context.table_diff(
source="prod",
target="dev",
select_models=["customers"]
)
# Results automatically displayed if show=True (default)
# Returns list with one TableDiff object
Compare Multiple Models
from sqlmesh.core.context import Context
context = Context()
# Compare multiple models using selection syntax
diffs = context.table_diff(
source="prod",
target="staging",
select_models=["tag:critical", "orders", "customers+"] # orders and downstream
)
# Returns list of TableDiff objects, one per model
for diff in diffs:
print(f"Model: {diff.model_name}")
print(f" Schema changes: {diff.schema_diff}")
print(f" Row differences: {len(diff.row_diff)}")
Custom Join Condition
from sqlmesh.core.context import Context
context = Context()
# Compare with custom join condition
diffs = context.table_diff(
source="prod",
target="dev",
select_models=["orders"],
on=["order_id", "order_date"] # Composite key
)
# Joins on specified columns instead of inferred grain
Filter and Limit Comparison
from sqlmesh.core.context import Context
context = Context()
# Compare subset of data with filters
diffs = context.table_diff(
source="prod",
target="dev",
select_models=["sales"],
where="sale_date >= '2026-01-01'",
limit=50, # Show up to 50 differing rows
skip_columns=["updated_at", "etl_timestamp"] # Ignore metadata columns
)
Floating Point Precision
from sqlmesh.core.context import Context
context = Context()
# Compare with custom decimal precision
diffs = context.table_diff(
source="prod",
target="dev",
select_models=["financial_metrics"],
decimals=2, # Round to 2 decimal places for comparison
)
# Avoids false positives from floating-point rounding
Silent Comparison
from sqlmesh.core.context import Context
context = Context()
# Programmatic comparison without console output
diffs = context.table_diff(
source="prod",
target="dev",
select_models=["users"],
show=False
)
# Check results programmatically
for diff in diffs:
if diff.schema_diff:
print(f"Schema changed in {diff.model_name}")
if diff.row_diff:
print(f"Found {len(diff.row_diff)} differing rows")
Direct Table Comparison
from sqlmesh.core.context import Context
context = Context()
# Compare tables directly (not through environments)
diffs = context.table_diff(
source="db.schema.old_table",
target="db.schema.new_table",
on=["id"],
limit=100
)
# Useful for ad-hoc table validation
CI/CD Validation
from sqlmesh.core.context import Context
context = Context()
# Automated validation in CI pipeline
diffs = context.table_diff(
source="prod",
target="pr_123",
select_models=["tag:tested"],
show=False
)
# Fail CI if unexpected differences found
has_diffs = any(diff.row_diff or diff.schema_diff for diff in diffs)
if has_diffs:
raise ValueError("Data validation failed - unexpected differences detected")
Regression Testing
from sqlmesh.core.context import Context
context = Context()
# Verify refactoring didn't change results
diffs = context.table_diff(
source="prod",
target="refactor_branch",
select_models=["*"], # All models
decimals=6, # High precision
skip_grain_check=False # Strict validation
)
# Should show no differences for pure refactoring