Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Workflow:TobikoData Sqlmesh Environment management

From Leeroopedia


Knowledge Sources
Domains Data_Engineering, Environment_Management, DevOps
Last Updated 2026-02-07 21:00 GMT

Overview

End-to-end process for creating, managing, comparing, and cleaning up virtual data environments in SQLMesh, enabling isolated development, staging, and production workflows with minimal warehouse resource usage.

Description

This workflow covers the lifecycle management of SQLMesh's virtual data environments. Virtual environments are lightweight branches of the data warehouse that share unchanged physical data between environments while isolating modifications. This allows multiple developers and CI/CD pipelines to work simultaneously without interfering with each other or impacting production. The workflow includes environment creation, data comparison across environments, TTL-based expiration, and cleanup operations.

Key outcomes:

  • Isolated development environments sharing unchanged production data
  • Cross-environment data comparison for validation
  • Automatic environment cleanup via TTL and janitor operations
  • Near-zero storage overhead for unchanged models

Usage

Execute this workflow when managing multiple development branches, staging environments, or CI/CD environments for a SQLMesh project. This is essential for teams with multiple developers or automated pipelines that need concurrent, isolated access to the data warehouse.

Execution Steps

Step 1: Environment Creation

Create a new virtual environment by running a plan against a named environment. SQLMesh creates the environment namespace in the warehouse and establishes view references to existing physical data for unchanged models. Only models that differ from the base environment are physically materialized.

Key considerations:

  • Environment names are sanitized to valid SQL identifiers
  • Environments can be created from any existing environment as a base
  • The virtual layer consists of views pointing to physical snapshot tables
  • Creation is nearly instantaneous for environments with no model changes

Step 2: Environment Listing and Inspection

List all active environments with their creation dates, expiration times, and associated model versions. Inspect individual environments to see which models are materialized, which are shared with production, and the current state of the environment's virtual layer.

Key considerations:

  • Each environment tracks its set of snapshot references
  • Environments show their TTL and expiration status
  • The finalized timestamp indicates when the environment was last updated
  • Catalog name mapping can vary between environments

Step 3: Cross Environment Data Comparison

Compare data between environments using the table diff capability. This performs a row-level comparison of model outputs between a source and target environment, identifying schema differences, added rows, removed rows, and value changes. This is essential for validating that development changes produce expected results.

Key considerations:

  • Comparison requires specifying the join condition or grain
  • Schema differences (added/removed/modified columns) are reported separately
  • Row-level differences show actual data values for debugging
  • Statistical summaries quantify the scope of differences

Step 4: Environment Promotion

Promote changes from a development environment to production by running a plan that targets the production environment. The plan compares the dev environment's model versions against production and generates the necessary backfill operations. This is the standard path for deploying validated changes.

Key considerations:

  • Promotion plans may require additional backfill for full date ranges
  • The production virtual layer is updated atomically
  • Promotion does not affect other development environments
  • Snapshots shared between environments are not re-materialized

Step 5: Environment Invalidation and Cleanup

Mark environments for deletion using the invalidate command. The janitor process handles the actual cleanup by removing expired views, schemas, and orphaned snapshot tables. TTL-based expiration automatically invalidates environments that exceed their configured lifetime.

Key considerations:

  • Invalidation marks the environment for cleanup but does not immediately delete
  • The janitor runs periodically to clean up invalidated environments
  • Expired physical tables are only deleted if no environment references them
  • The prod environment cannot be invalidated

Execution Diagram

GitHub URL

Workflow Repository