Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Principle:TobikoData Sqlmesh Environment Invalidation And Cleanup

From Leeroopedia


Knowledge Sources
Domains Data_Engineering, Environment_Management
Last Updated 2026-02-07 00:00 GMT

Overview

Marking environments for deletion and reclaiming warehouse resources by removing virtual layer views and metadata.

Description

Environment invalidation and cleanup is a lifecycle management mechanism that removes environments when they are no longer needed. This operates in two stages: invalidation (marking an environment for deletion by setting its expiration timestamp), and cleanup (physical removal of database objects and metadata). The separation allows for graceful degradation where environments can be marked for deletion but remain queryable until actual cleanup occurs.

The cleanup process is selective based on the environment's suffix target configuration:

  • Schema-level environments: Drop entire schemas containing all environment views
  • Catalog-level environments: Drop entire catalogs (on engines that support it)
  • Table-level environments: Drop individual view objects

This solves important operational problems:

  • Resource Management: Preventing accumulation of unused environments that consume warehouse resources
  • Cost Control: Removing virtual views and metadata reduces storage and compute costs
  • Namespace Pollution: Keeping environment listings clean and manageable
  • Security: Ensuring stale environments with potentially sensitive data are properly removed
  • Lifecycle Automation: Enabling automatic cleanup of temporary environments (PR branches, test runs)

The cleanup mechanism is also resilient, providing options to warn rather than fail on deletion errors, allowing cleanup to proceed even if some resources are already gone or inaccessible.

Usage

Use environment invalidation and cleanup when:

  • Deleting development environments that are no longer needed
  • Cleaning up PR environments after branches are merged or closed
  • Removing expired temporary environments created for testing
  • Implementing TTL-based automatic environment expiration
  • Recovering from failed environment creation that left partial state
  • Decommissioning old staging environments after promoting to production
  • Implementing environment lifecycle policies (e.g., auto-delete after 7 days)
  • Freeing up database resources when approaching storage limits
  • Removing environments as part of user offboarding or project sunset
  • Cleaning up after CI/CD test runs that create temporary environments

Theoretical Basis

The invalidation and cleanup mechanism follows this conceptual algorithm:

Environment Invalidation:

FUNCTION invalidate_environment(environment_name):
    BEGIN TRANSACTION

    # Mark environment for deletion
    current_time = CURRENT_TIMESTAMP

    UPDATE state_database._environments
        SET expiration_ts = current_time
        WHERE name = environment_name

    COMMIT TRANSACTION

    # Environment still exists and is queryable but marked for cleanup
    RETURN {
        invalidated: True,
        expiration_ts: current_time,
        cleanup_pending: True
    }

Expired Environment Discovery:

FUNCTION find_expired_environments(current_time):
    expired = QUERY state_database._environments:
        SELECT * FROM _environments
        WHERE expiration_ts IS NOT NULL
          AND expiration_ts <= current_time

    RETURN expired

Schema and Catalog Organization:

FUNCTION organize_cleanup_targets(expired_environments):
    catalogs_to_drop = SET()
    schemas_to_drop = SET()
    views_to_drop = SET()

    FOR each environment IN expired_environments:
        FOR each snapshot IN environment.snapshots:
            IF snapshot is not a model OR snapshot is symbolic:
                CONTINUE

            # Determine database adapter based on gateway configuration
            adapter = GET adapter for snapshot.model_gateway IF environment.gateway_managed
                     ELSE default_adapter

            # Compute qualified names for this environment
            view_catalog = snapshot.qualified_view_name.catalog_for_environment(
                environment.naming_info,
                adapter.dialect
            )
            view_schema = snapshot.qualified_view_name.schema_for_environment(
                environment.naming_info,
                adapter.dialect
            )
            view_name = snapshot.qualified_view_name.for_environment(
                environment.naming_info,
                adapter.dialect
            )

            # Organize by suffix target strategy
            IF environment.suffix_target IS CATALOG:
                IF view_catalog:
                    ADD (adapter, view_catalog) to catalogs_to_drop
            ELSE IF environment.suffix_target IS SCHEMA:
                schema_ref = SCHEMA(view_schema, view_catalog)
                ADD (adapter, schema_ref) to schemas_to_drop
            ELSE IF environment.suffix_target IS TABLE:
                ADD (adapter, view_name) to views_to_drop

    RETURN {
        catalogs: catalogs_to_drop,
        schemas: schemas_to_drop,
        views: views_to_drop
    }

Physical Resource Cleanup:

FUNCTION cleanup_database_objects(cleanup_targets, warn_on_failure):
    errors = []

    # Drop individual views
    FOR each (adapter, view_name) IN cleanup_targets.views:
        TRY:
            adapter.drop_view(view_name, ignore_if_not_exists=True)
        CATCH error:
            IF warn_on_failure:
                LOG warning("Failed to drop view {view_name}: {error}")
                ADD error to errors
            ELSE:
                RAISE error

    # Drop schemas with cascade
    FOR each (adapter, schema) IN cleanup_targets.schemas:
        TRY:
            adapter.drop_schema(
                schema,
                ignore_if_not_exists=True,
                cascade=True  # Drops all contained objects
            )
        CATCH error:
            IF warn_on_failure:
                LOG warning("Failed to drop schema {schema}: {error}")
                ADD error to errors
            ELSE:
                RAISE error

    # Drop catalogs (only on engines that support it)
    FOR each (adapter, catalog) IN cleanup_targets.catalogs:
        IF adapter.SUPPORTS_CREATE_DROP_CATALOG:
            TRY:
                adapter.drop_catalog(catalog)
            CATCH error:
                IF warn_on_failure:
                    LOG warning("Failed to drop catalog {catalog}: {error}")
                    ADD error to errors
                ELSE:
                    RAISE error

    RETURN errors

Metadata Cleanup:

FUNCTION cleanup_environment_metadata(expired_environments):
    BEGIN TRANSACTION

    environment_names = [env.name FOR env IN expired_environments]

    # Remove environment records
    DELETE FROM state_database._environments
        WHERE name IN environment_names

    # Note: Snapshots are NOT deleted - they may be referenced by other environments
    # Only the environment's reference to snapshots is removed

    COMMIT TRANSACTION

    RETURN deleted_count

Complete Cleanup Flow:

FUNCTION cleanup_expired_environments(warn_on_failure):
    current_time = CURRENT_TIMESTAMP

    # Find environments to clean up
    expired = find_expired_environments(current_time)

    IF expired is empty:
        RETURN []

    # Organize database objects by cleanup strategy
    cleanup_targets = organize_cleanup_targets(expired)

    # Remove physical database objects
    errors = cleanup_database_objects(cleanup_targets, warn_on_failure)

    # Remove metadata only if physical cleanup succeeded or warn_on_failure is True
    IF errors is empty OR warn_on_failure:
        cleanup_environment_metadata(expired)

    RETURN expired

Synchronous vs Asynchronous Cleanup:

FUNCTION handle_cleanup(environment_name, sync):
    # Invalidate immediately
    invalidate_environment(environment_name)

    IF sync:
        # Blocking cleanup - wait for completion
        cleanup_expired_environments(warn_on_failure=False)
        RETURN "Environment deleted"
    ELSE:
        # Async cleanup - janitor process will handle it
        RETURN "Environment invalidated, cleanup pending"

The design separates invalidation from cleanup to support both immediate synchronous deletion (for interactive use) and deferred asynchronous cleanup (for automated janitor processes), providing flexibility in operational workflows.

Related Pages

Implemented By

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment