Jump to content

Connect Leeroopedia MCP: Equip your AI agents to search best practices, build plans, verify code, diagnose failures, and look up hyperparameter defaults.

Implementation:MaterializeInc Materialize Mzexplore Module

From Leeroopedia


Overview

The mzexplore module is a Python toolkit for exploring, extracting, comparing, and cloning DDL from Materialize catalogs. It enables developers to extract CREATE statements and EXPLAIN plans from running Materialize instances, compare optimizer outputs across versions, and generate clone scripts that replicate catalog objects into different clusters.

The module is located in misc/python/materialize/mzexplore/ with a CLI entry point at misc/python/materialize/cli/mzexplore.py.

Architecture

Module Purpose
common.py Shared types, enums, and dataclasses for explain plans and catalog items
extract.py Extraction of CREATE statements and EXPLAIN plans from catalogs
analyze.py Comparison and diffing of extracted explain plans
clone.py DDL cloning of catalog objects across clusters
sql.py Database connection wrapper and SQL query helpers
cli/mzexplore.py Click-based CLI entry point

Common Types

The common.py module defines the core type system used throughout mzexplore.

ExplaineeType

A bitmask enum that classifies what kind of object is being explained:

Value Numeric Description
CREATE_STATEMENT 1 Explain a CREATE statement
CATALOG_ITEM 2 Explain an existing catalog item
REPLAN_ITEM 4 Re-plan an existing catalog item
ALL 7 All of the above

The contains method uses bitwise AND to check membership:

def contains(self, other: "ExplaineeType") -> bool:
    return (self.value & other.value) > 0

ExplainFormat

An enum for EXPLAIN output formats:

Value File Extension
TEXT .txt
JSON .json

ExplainStage

A string enum representing optimizer pipeline stages:

Value SQL Clause
RAW_PLAN RAW PLAN
DECORRELATED_PLAN DECORRELATED PLAN
LOCAL_PLAN LOCALLY OPTIMIZED PLAN
OPTIMIZED_PLAN OPTIMIZED PLAN
PHYSICAL_PLAN PHYSICAL PLAN
OPTIMIZER_TRACE OPTIMIZER TRACE

ExplainOption

A frozen dataclass representing an EXPLAIN feature flag (e.g., enable_foo = true). The affects_pipeline method returns True if the option starts with enable_ or is reoptimize_imported_views, indicating it may alter optimizer output.

@dataclass(frozen=True)
class ExplainOption:
    key: str
    val: str | bool | int | None = None

    def affects_pipeline(self) -> bool:
        return any((
            self.key.lower() == "reoptimize_imported_views",
            self.key.lower().startswith("enable_"),
        ))

ExplainOptionType

A custom Click parameter type that parses key=value strings into ExplainOption instances using a regex pattern.

Extraction

The extract.py module provides the defs function for extracting CREATE statements from a Materialize catalog.

defs Function

def defs(
    target: Path,
    database: str,
    schema: str,
    name: str,
    db_port: int,
    db_host: str,
    db_user: str,
    db_pass: str | None,
    db_require_ssl: bool,
    mzfmt: bool,
) -> None:

This function:

  1. Ensures the target directory exists.
  2. Connects to the Materialize database using the sql.Database wrapper.
  3. Queries catalog items matching the specified database, schema, and name ILIKE patterns.
  4. For each item, writes a file containing the item's id, oid, and create_sql.

Output files are organized using CreateFile dataclasses that encode the database, schema, name, and item type into the file path.

Analysis

The analyze.py module provides the changes function for comparing optimized plans across different extraction runs.

changes Function

def changes(
    out: TextIO,
    target: Path,
    header_name: str,
    base_suffix: str,
    diff_suffix: str,
) -> None:

This function:

  1. Scans the target directory for files matching **/*.optimized_plan.{base_suffix}.txt.
  2. For each base plan file, locates the corresponding diff file with the diff_suffix.
  3. Compares the two plan texts and, if they differ, appends a section to the output with a code --diff command for side-by-side comparison.

The output is formatted as a markdown checklist with TODO(REGRESSION|IMPROVEMENT) labels to guide manual review.

Cloning

The clone.py module provides the defs function for generating DDL scripts that replicate catalog objects into a different cluster.

defs Function

def defs(
    ddl_out: TextIO,
    cmp_out: TextIO,
    database: str,
    schema: str,
    cluster: str,
    object_ids: list[str],
    db_port: int,
    db_host: str,
    db_user: str,
    db_pass: str | None,
    db_require_ssl: bool,
    mzfmt: bool,
) -> None:

This function:

  1. Validates that at least one object_id is provided.
  2. Determines the source cluster from the provided object IDs (must be exactly one unique cluster).
  3. Verifies the source and destination clusters are different.
  4. Traverses clone dependencies using db.clone_dependencies().
  5. Generates DDL that rewrites cluster references (e.g., IN CLUSTER old_cluster to IN CLUSTER new_cluster).
  6. Categorizes references into aliased refs, index-on refs, and simple refs for correct dependency ordering.

SQL Database Wrapper

The sql.py module provides the Database class, a lightweight wrapper around pg8000 for connecting to Materialize.

Database Class

Method Description
__init__ Establishes a connection with optional SSL, sets autocommit
close Closes the database connection
query_one Executes a query and returns a single row as a dictionary
query_all Executes a query and yields all rows as dictionaries via a generator
execute Executes a statement without returning results
catalog_items Queries user or system catalog items with ILIKE pattern filtering
object_clusters Retrieves cluster information for given object IDs
class Database:
    def __init__(self, port, host, user, password, database, require_ssl):
        self.conn = pg8000.connect(
            host=host, port=port, user=user,
            password=password, database=database,
            ssl_context=ssl_context,
        )
        self.conn.autocommit = True

The catalog_items method loads SQL from resource files (catalog/u_items.sql for user items, catalog/s_items.sql for system items) and applies ILIKE filters for database, schema, and name.

CLI Entry Point

The cli/mzexplore.py module provides a Click-based command-line interface that exposes mzexplore functionality.

Structure

The CLI uses a @click.group() with shared argument and option classes:

Class Parameters Purpose
Arg repository, output_file, base_suffix, diff_suffix Positional argument definitions
Opt db_port, db_host, db_user, etc. Database connection options with env var support

Default database connection settings:

Option Default Env Var
db_port 6877 PGPORT
db_host localhost PGHOST
db_user mz_support (n/a)

Key Source Files

File Path
Common types misc/python/materialize/mzexplore/common.py
Extraction misc/python/materialize/mzexplore/extract.py
Analysis misc/python/materialize/mzexplore/analyze.py
Cloning misc/python/materialize/mzexplore/clone.py
SQL wrapper misc/python/materialize/mzexplore/sql.py
CLI entry point misc/python/materialize/cli/mzexplore.py

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment