Implementation:MaterializeInc Materialize Mzexplore Module

Overview

The mzexplore module is a Python toolkit for exploring, extracting, comparing, and cloning DDL from Materialize catalogs. It enables developers to extract CREATE statements and EXPLAIN plans from running Materialize instances, compare optimizer outputs across versions, and generate clone scripts that replicate catalog objects into different clusters.

The module is located in misc/python/materialize/mzexplore/ with a CLI entry point at misc/python/materialize/cli/mzexplore.py.

Architecture

Module	Purpose
`common.py`	Shared types, enums, and dataclasses for explain plans and catalog items
`extract.py`	Extraction of CREATE statements and EXPLAIN plans from catalogs
`analyze.py`	Comparison and diffing of extracted explain plans
`clone.py`	DDL cloning of catalog objects across clusters
`sql.py`	Database connection wrapper and SQL query helpers
`cli/mzexplore.py`	Click-based CLI entry point

Common Types

The common.py module defines the core type system used throughout mzexplore.

ExplaineeType

A bitmask enum that classifies what kind of object is being explained:

Value	Numeric	Description
`CREATE_STATEMENT`	1	Explain a CREATE statement
`CATALOG_ITEM`	2	Explain an existing catalog item
`REPLAN_ITEM`	4	Re-plan an existing catalog item
`ALL`	7	All of the above

The contains method uses bitwise AND to check membership:

def contains(self, other: "ExplaineeType") -> bool:
    return (self.value & other.value) > 0

ExplainFormat

An enum for EXPLAIN output formats:

Value	File Extension
`TEXT`	`.txt`
`JSON`	`.json`

ExplainStage

A string enum representing optimizer pipeline stages:

Value	SQL Clause
`RAW_PLAN`	`RAW PLAN`
`DECORRELATED_PLAN`	`DECORRELATED PLAN`
`LOCAL_PLAN`	`LOCALLY OPTIMIZED PLAN`
`OPTIMIZED_PLAN`	`OPTIMIZED PLAN`
`PHYSICAL_PLAN`	`PHYSICAL PLAN`
`OPTIMIZER_TRACE`	`OPTIMIZER TRACE`

ExplainOption

A frozen dataclass representing an EXPLAIN feature flag (e.g., enable_foo = true). The affects_pipeline method returns True if the option starts with enable_ or is reoptimize_imported_views, indicating it may alter optimizer output.

@dataclass(frozen=True)
class ExplainOption:
    key: str
    val: str | bool | int | None = None

    def affects_pipeline(self) -> bool:
        return any((
            self.key.lower() == "reoptimize_imported_views",
            self.key.lower().startswith("enable_"),
        ))

ExplainOptionType

A custom Click parameter type that parses key=value strings into ExplainOption instances using a regex pattern.

Extraction

The extract.py module provides the defs function for extracting CREATE statements from a Materialize catalog.

defs Function

def defs(
    target: Path,
    database: str,
    schema: str,
    name: str,
    db_port: int,
    db_host: str,
    db_user: str,
    db_pass: str | None,
    db_require_ssl: bool,
    mzfmt: bool,
) -> None:

This function:

Ensures the target directory exists.
Connects to the Materialize database using the sql.Database wrapper.
Queries catalog items matching the specified database, schema, and name ILIKE patterns.
For each item, writes a file containing the item's id, oid, and create_sql.

Output files are organized using CreateFile dataclasses that encode the database, schema, name, and item type into the file path.

Analysis

The analyze.py module provides the changes function for comparing optimized plans across different extraction runs.

changes Function

def changes(
    out: TextIO,
    target: Path,
    header_name: str,
    base_suffix: str,
    diff_suffix: str,
) -> None:

This function:

Scans the target directory for files matching **/*.optimized_plan.{base_suffix}.txt.
For each base plan file, locates the corresponding diff file with the diff_suffix.
Compares the two plan texts and, if they differ, appends a section to the output with a code --diff command for side-by-side comparison.

The output is formatted as a markdown checklist with TODO(REGRESSION|IMPROVEMENT) labels to guide manual review.

Cloning

The clone.py module provides the defs function for generating DDL scripts that replicate catalog objects into a different cluster.

defs Function

def defs(
    ddl_out: TextIO,
    cmp_out: TextIO,
    database: str,
    schema: str,
    cluster: str,
    object_ids: list[str],
    db_port: int,
    db_host: str,
    db_user: str,
    db_pass: str | None,
    db_require_ssl: bool,
    mzfmt: bool,
) -> None:

This function:

Validates that at least one object_id is provided.
Determines the source cluster from the provided object IDs (must be exactly one unique cluster).
Verifies the source and destination clusters are different.
Traverses clone dependencies using db.clone_dependencies().
Generates DDL that rewrites cluster references (e.g., IN CLUSTER old_cluster to IN CLUSTER new_cluster).
Categorizes references into aliased refs, index-on refs, and simple refs for correct dependency ordering.

SQL Database Wrapper

The sql.py module provides the Database class, a lightweight wrapper around pg8000 for connecting to Materialize.

Database Class

Method	Description
`__init__`	Establishes a connection with optional SSL, sets autocommit
`close`	Closes the database connection
`query_one`	Executes a query and returns a single row as a dictionary
`query_all`	Executes a query and yields all rows as dictionaries via a generator
`execute`	Executes a statement without returning results
`catalog_items`	Queries user or system catalog items with ILIKE pattern filtering
`object_clusters`	Retrieves cluster information for given object IDs

class Database:
    def __init__(self, port, host, user, password, database, require_ssl):
        self.conn = pg8000.connect(
            host=host, port=port, user=user,
            password=password, database=database,
            ssl_context=ssl_context,
        )
        self.conn.autocommit = True

The catalog_items method loads SQL from resource files (catalog/u_items.sql for user items, catalog/s_items.sql for system items) and applies ILIKE filters for database, schema, and name.

CLI Entry Point

The cli/mzexplore.py module provides a Click-based command-line interface that exposes mzexplore functionality.

Structure

The CLI uses a @click.group() with shared argument and option classes:

Class	Parameters	Purpose
`Arg`	`repository`, `output_file`, `base_suffix`, `diff_suffix`	Positional argument definitions
`Opt`	`db_port`, `db_host`, `db_user`, etc.	Database connection options with env var support

Default database connection settings:

Option	Default	Env Var
`db_port`	`6877`	`PGPORT`
`db_host`	`localhost`	`PGHOST`
`db_user`	`mz_support`	(n/a)

Key Source Files

File	Path
Common types	`misc/python/materialize/mzexplore/common.py`
Extraction	`misc/python/materialize/mzexplore/extract.py`
Analysis	`misc/python/materialize/mzexplore/analyze.py`
Cloning	`misc/python/materialize/mzexplore/clone.py`
SQL wrapper	`misc/python/materialize/mzexplore/sql.py`
CLI entry point	`misc/python/materialize/cli/mzexplore.py`

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment