Implementation:TobikoData Sqlmesh Context Table Diff

Knowledge Sources	SQLMesh SQLMesh Docs
Domains	Data_Engineering, Environment_Management
Last Updated	2026-02-07 00:00 GMT

Overview

Concrete tool for comparing data between environments or tables provided by SQLMesh.

Description

The Context.table_diff() method performs comprehensive comparison between two data sources, which can be either environment names or direct table references. When environments are provided, it compares the corresponding models across those environments. The comparison operates at multiple levels: schema differences (added/removed/changed columns), aggregate statistics (row counts, null counts, distinct values), and row-level differences (specific rows that differ based on grain/primary key).

The method supports flexible configuration including join conditions, column filtering, row limits, floating-point precision control, and grain validation. It can operate across models when select_models is provided, comparing multiple tables in a single operation. Results are returned as TableDiff objects and optionally displayed in the console with formatted output showing schema changes and sample differing rows.

Usage

Use Context.table_diff() when you need to:

Validate that development changes produce expected results before merging to production
Compare staging environment to production for deployment verification
Regression test after refactoring to ensure output hasn't changed
Debug data discrepancies by identifying specific differing rows
Build automated data quality checks in CI/CD pipelines
Verify forward-only model changes produce identical results in preview mode
Document expected data changes as part of code review
Troubleshoot issues by comparing known-good and problematic environments

Code Reference

Source Location

Repository: sqlmesh
File: sqlmesh/core/context.py:L1855-1891

Signature

def table_diff(
    self,
    source: str,
    target: str,
    on: t.Optional[t.List[str] | exp.Condition] = None,
    skip_columns: t.Optional[t.List[str]] = None,
    select_models: t.Optional[t.Collection[str]] = None,
    where: t.Optional[str | exp.Condition] = None,
    limit: int = 20,
    show: bool = True,
    show_sample: bool = True,
    decimals: int = 3,
    skip_grain_check: bool = False,
    warn_grain_check: bool = False,
    temp_schema: t.Optional[str] = None,
    schema_diff_ignore_case: bool = False,
    **kwargs: t.Any,
) -> t.List[TableDiff]

Import

from sqlmesh.core.context import Context

context = Context(paths="path/to/project")

I/O Contract

Inputs

Name	Type	Required	Description
source	str	Yes	Source environment name or table name
target	str	Yes	Target environment name or table name
on	List[str] or Condition	No	Join condition for row comparison. Table aliases must be "s" (source) and "t" (target). If omitted, uses model's grain
skip_columns	List[str]	No	Columns to exclude from comparison
select_models	Collection[str]	No	Model selection strings when comparing environments (e.g., "model_a", "tag:critical")
where	str or Condition	No	Optional filter condition to apply before comparison
limit	int	No	Maximum number of sample differing rows to return. Default: 20
show	bool	No	Display results in console. Default: True
show_sample	bool	No	Display sample differing rows in console (requires show=True). Default: True
decimals	int	No	Number of decimal places for floating-point comparison. Default: 3
skip_grain_check	bool	No	Skip validation for null or duplicate grain values. Default: False
warn_grain_check	bool	No	Warn instead of failing on grain validation issues. Default: False
temp_schema	str	No	Schema to use for temporary tables during comparison
schema_diff_ignore_case	bool	No	Ignore case when comparing column names. Default: False

Outputs

Name	Type	Description
return	List[TableDiff]	List of TableDiff objects, one per compared table/model. Each contains schema_diff, row_diff, and summary statistics

Usage Examples

Basic Usage

from sqlmesh.core.context import Context

context = Context()

# Compare a single model between dev and prod
diffs = context.table_diff(
    source="prod",
    target="dev",
    select_models=["customers"]
)

# Results automatically displayed if show=True (default)
# Returns list with one TableDiff object

Compare Multiple Models

from sqlmesh.core.context import Context

context = Context()

# Compare multiple models using selection syntax
diffs = context.table_diff(
    source="prod",
    target="staging",
    select_models=["tag:critical", "orders", "customers+"]  # orders and downstream
)

# Returns list of TableDiff objects, one per model
for diff in diffs:
    print(f"Model: {diff.model_name}")
    print(f"  Schema changes: {diff.schema_diff}")
    print(f"  Row differences: {len(diff.row_diff)}")

Custom Join Condition

from sqlmesh.core.context import Context

context = Context()

# Compare with custom join condition
diffs = context.table_diff(
    source="prod",
    target="dev",
    select_models=["orders"],
    on=["order_id", "order_date"]  # Composite key
)

# Joins on specified columns instead of inferred grain

Filter and Limit Comparison

from sqlmesh.core.context import Context

context = Context()

# Compare subset of data with filters
diffs = context.table_diff(
    source="prod",
    target="dev",
    select_models=["sales"],
    where="sale_date >= '2026-01-01'",
    limit=50,  # Show up to 50 differing rows
    skip_columns=["updated_at", "etl_timestamp"]  # Ignore metadata columns
)

Floating Point Precision

from sqlmesh.core.context import Context

context = Context()

# Compare with custom decimal precision
diffs = context.table_diff(
    source="prod",
    target="dev",
    select_models=["financial_metrics"],
    decimals=2,  # Round to 2 decimal places for comparison
)

# Avoids false positives from floating-point rounding

Silent Comparison

from sqlmesh.core.context import Context

context = Context()

# Programmatic comparison without console output
diffs = context.table_diff(
    source="prod",
    target="dev",
    select_models=["users"],
    show=False
)

# Check results programmatically
for diff in diffs:
    if diff.schema_diff:
        print(f"Schema changed in {diff.model_name}")

    if diff.row_diff:
        print(f"Found {len(diff.row_diff)} differing rows")

Direct Table Comparison

from sqlmesh.core.context import Context

context = Context()

# Compare tables directly (not through environments)
diffs = context.table_diff(
    source="db.schema.old_table",
    target="db.schema.new_table",
    on=["id"],
    limit=100
)

# Useful for ad-hoc table validation

CI/CD Validation

from sqlmesh.core.context import Context

context = Context()

# Automated validation in CI pipeline
diffs = context.table_diff(
    source="prod",
    target="pr_123",
    select_models=["tag:tested"],
    show=False
)

# Fail CI if unexpected differences found
has_diffs = any(diff.row_diff or diff.schema_diff for diff in diffs)

if has_diffs:
    raise ValueError("Data validation failed - unexpected differences detected")

Regression Testing

from sqlmesh.core.context import Context

context = Context()

# Verify refactoring didn't change results
diffs = context.table_diff(
    source="prod",
    target="refactor_branch",
    select_models=["*"],  # All models
    decimals=6,  # High precision
    skip_grain_check=False  # Strict validation
)

# Should show no differences for pure refactoring

Related Pages

Implements Principle

Principle:TobikoData_Sqlmesh_Cross_Environment_Data_Comparison

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment