Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:TobikoData Sqlmesh Context Table Diff

From Leeroopedia


Knowledge Sources
Domains Data_Engineering, Environment_Management
Last Updated 2026-02-07 00:00 GMT

Overview

Concrete tool for comparing data between environments or tables provided by SQLMesh.

Description

The Context.table_diff() method performs comprehensive comparison between two data sources, which can be either environment names or direct table references. When environments are provided, it compares the corresponding models across those environments. The comparison operates at multiple levels: schema differences (added/removed/changed columns), aggregate statistics (row counts, null counts, distinct values), and row-level differences (specific rows that differ based on grain/primary key).

The method supports flexible configuration including join conditions, column filtering, row limits, floating-point precision control, and grain validation. It can operate across models when select_models is provided, comparing multiple tables in a single operation. Results are returned as TableDiff objects and optionally displayed in the console with formatted output showing schema changes and sample differing rows.

Usage

Use Context.table_diff() when you need to:

  • Validate that development changes produce expected results before merging to production
  • Compare staging environment to production for deployment verification
  • Regression test after refactoring to ensure output hasn't changed
  • Debug data discrepancies by identifying specific differing rows
  • Build automated data quality checks in CI/CD pipelines
  • Verify forward-only model changes produce identical results in preview mode
  • Document expected data changes as part of code review
  • Troubleshoot issues by comparing known-good and problematic environments

Code Reference

Source Location

  • Repository: sqlmesh
  • File: sqlmesh/core/context.py:L1855-1891

Signature

def table_diff(
    self,
    source: str,
    target: str,
    on: t.Optional[t.List[str] | exp.Condition] = None,
    skip_columns: t.Optional[t.List[str]] = None,
    select_models: t.Optional[t.Collection[str]] = None,
    where: t.Optional[str | exp.Condition] = None,
    limit: int = 20,
    show: bool = True,
    show_sample: bool = True,
    decimals: int = 3,
    skip_grain_check: bool = False,
    warn_grain_check: bool = False,
    temp_schema: t.Optional[str] = None,
    schema_diff_ignore_case: bool = False,
    **kwargs: t.Any,
) -> t.List[TableDiff]

Import

from sqlmesh.core.context import Context

context = Context(paths="path/to/project")

I/O Contract

Inputs

Name Type Required Description
source str Yes Source environment name or table name
target str Yes Target environment name or table name
on List[str] or Condition No Join condition for row comparison. Table aliases must be "s" (source) and "t" (target). If omitted, uses model's grain
skip_columns List[str] No Columns to exclude from comparison
select_models Collection[str] No Model selection strings when comparing environments (e.g., "model_a", "tag:critical")
where str or Condition No Optional filter condition to apply before comparison
limit int No Maximum number of sample differing rows to return. Default: 20
show bool No Display results in console. Default: True
show_sample bool No Display sample differing rows in console (requires show=True). Default: True
decimals int No Number of decimal places for floating-point comparison. Default: 3
skip_grain_check bool No Skip validation for null or duplicate grain values. Default: False
warn_grain_check bool No Warn instead of failing on grain validation issues. Default: False
temp_schema str No Schema to use for temporary tables during comparison
schema_diff_ignore_case bool No Ignore case when comparing column names. Default: False

Outputs

Name Type Description
return List[TableDiff] List of TableDiff objects, one per compared table/model. Each contains schema_diff, row_diff, and summary statistics

Usage Examples

Basic Usage

from sqlmesh.core.context import Context

context = Context()

# Compare a single model between dev and prod
diffs = context.table_diff(
    source="prod",
    target="dev",
    select_models=["customers"]
)

# Results automatically displayed if show=True (default)
# Returns list with one TableDiff object

Compare Multiple Models

from sqlmesh.core.context import Context

context = Context()

# Compare multiple models using selection syntax
diffs = context.table_diff(
    source="prod",
    target="staging",
    select_models=["tag:critical", "orders", "customers+"]  # orders and downstream
)

# Returns list of TableDiff objects, one per model
for diff in diffs:
    print(f"Model: {diff.model_name}")
    print(f"  Schema changes: {diff.schema_diff}")
    print(f"  Row differences: {len(diff.row_diff)}")

Custom Join Condition

from sqlmesh.core.context import Context

context = Context()

# Compare with custom join condition
diffs = context.table_diff(
    source="prod",
    target="dev",
    select_models=["orders"],
    on=["order_id", "order_date"]  # Composite key
)

# Joins on specified columns instead of inferred grain

Filter and Limit Comparison

from sqlmesh.core.context import Context

context = Context()

# Compare subset of data with filters
diffs = context.table_diff(
    source="prod",
    target="dev",
    select_models=["sales"],
    where="sale_date >= '2026-01-01'",
    limit=50,  # Show up to 50 differing rows
    skip_columns=["updated_at", "etl_timestamp"]  # Ignore metadata columns
)

Floating Point Precision

from sqlmesh.core.context import Context

context = Context()

# Compare with custom decimal precision
diffs = context.table_diff(
    source="prod",
    target="dev",
    select_models=["financial_metrics"],
    decimals=2,  # Round to 2 decimal places for comparison
)

# Avoids false positives from floating-point rounding

Silent Comparison

from sqlmesh.core.context import Context

context = Context()

# Programmatic comparison without console output
diffs = context.table_diff(
    source="prod",
    target="dev",
    select_models=["users"],
    show=False
)

# Check results programmatically
for diff in diffs:
    if diff.schema_diff:
        print(f"Schema changed in {diff.model_name}")

    if diff.row_diff:
        print(f"Found {len(diff.row_diff)} differing rows")

Direct Table Comparison

from sqlmesh.core.context import Context

context = Context()

# Compare tables directly (not through environments)
diffs = context.table_diff(
    source="db.schema.old_table",
    target="db.schema.new_table",
    on=["id"],
    limit=100
)

# Useful for ad-hoc table validation

CI/CD Validation

from sqlmesh.core.context import Context

context = Context()

# Automated validation in CI pipeline
diffs = context.table_diff(
    source="prod",
    target="pr_123",
    select_models=["tag:tested"],
    show=False
)

# Fail CI if unexpected differences found
has_diffs = any(diff.row_diff or diff.schema_diff for diff in diffs)

if has_diffs:
    raise ValueError("Data validation failed - unexpected differences detected")

Regression Testing

from sqlmesh.core.context import Context

context = Context()

# Verify refactoring didn't change results
diffs = context.table_diff(
    source="prod",
    target="refactor_branch",
    select_models=["*"],  # All models
    decimals=6,  # High precision
    skip_grain_check=False  # Strict validation
)

# Should show no differences for pure refactoring

Related Pages

Implements Principle

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment