Implementation:MaterializeInc Materialize Mzexplore Module
Overview
The mzexplore module is a Python toolkit for exploring, extracting, comparing, and cloning DDL from Materialize catalogs. It enables developers to extract CREATE statements and EXPLAIN plans from running Materialize instances, compare optimizer outputs across versions, and generate clone scripts that replicate catalog objects into different clusters.
The module is located in misc/python/materialize/mzexplore/ with a CLI entry point at misc/python/materialize/cli/mzexplore.py.
Architecture
| Module | Purpose |
|---|---|
common.py |
Shared types, enums, and dataclasses for explain plans and catalog items |
extract.py |
Extraction of CREATE statements and EXPLAIN plans from catalogs |
analyze.py |
Comparison and diffing of extracted explain plans |
clone.py |
DDL cloning of catalog objects across clusters |
sql.py |
Database connection wrapper and SQL query helpers |
cli/mzexplore.py |
Click-based CLI entry point |
Common Types
The common.py module defines the core type system used throughout mzexplore.
ExplaineeType
A bitmask enum that classifies what kind of object is being explained:
| Value | Numeric | Description |
|---|---|---|
CREATE_STATEMENT |
1 | Explain a CREATE statement |
CATALOG_ITEM |
2 | Explain an existing catalog item |
REPLAN_ITEM |
4 | Re-plan an existing catalog item |
ALL |
7 | All of the above |
The contains method uses bitwise AND to check membership:
def contains(self, other: "ExplaineeType") -> bool:
return (self.value & other.value) > 0
ExplainFormat
An enum for EXPLAIN output formats:
| Value | File Extension |
|---|---|
TEXT |
.txt
|
JSON |
.json
|
ExplainStage
A string enum representing optimizer pipeline stages:
| Value | SQL Clause |
|---|---|
RAW_PLAN |
RAW PLAN
|
DECORRELATED_PLAN |
DECORRELATED PLAN
|
LOCAL_PLAN |
LOCALLY OPTIMIZED PLAN
|
OPTIMIZED_PLAN |
OPTIMIZED PLAN
|
PHYSICAL_PLAN |
PHYSICAL PLAN
|
OPTIMIZER_TRACE |
OPTIMIZER TRACE
|
ExplainOption
A frozen dataclass representing an EXPLAIN feature flag (e.g., enable_foo = true). The affects_pipeline method returns True if the option starts with enable_ or is reoptimize_imported_views, indicating it may alter optimizer output.
@dataclass(frozen=True)
class ExplainOption:
key: str
val: str | bool | int | None = None
def affects_pipeline(self) -> bool:
return any((
self.key.lower() == "reoptimize_imported_views",
self.key.lower().startswith("enable_"),
))
ExplainOptionType
A custom Click parameter type that parses key=value strings into ExplainOption instances using a regex pattern.
Extraction
The extract.py module provides the defs function for extracting CREATE statements from a Materialize catalog.
defs Function
def defs(
target: Path,
database: str,
schema: str,
name: str,
db_port: int,
db_host: str,
db_user: str,
db_pass: str | None,
db_require_ssl: bool,
mzfmt: bool,
) -> None:
This function:
- Ensures the target directory exists.
- Connects to the Materialize database using the
sql.Databasewrapper. - Queries catalog items matching the specified
database,schema, andnameILIKE patterns. - For each item, writes a file containing the item's
id,oid, andcreate_sql.
Output files are organized using CreateFile dataclasses that encode the database, schema, name, and item type into the file path.
Analysis
The analyze.py module provides the changes function for comparing optimized plans across different extraction runs.
changes Function
def changes(
out: TextIO,
target: Path,
header_name: str,
base_suffix: str,
diff_suffix: str,
) -> None:
This function:
- Scans the target directory for files matching
**/*.optimized_plan.{base_suffix}.txt. - For each base plan file, locates the corresponding diff file with the
diff_suffix. - Compares the two plan texts and, if they differ, appends a section to the output with a
code --diffcommand for side-by-side comparison.
The output is formatted as a markdown checklist with TODO(REGRESSION|IMPROVEMENT) labels to guide manual review.
Cloning
The clone.py module provides the defs function for generating DDL scripts that replicate catalog objects into a different cluster.
defs Function
def defs(
ddl_out: TextIO,
cmp_out: TextIO,
database: str,
schema: str,
cluster: str,
object_ids: list[str],
db_port: int,
db_host: str,
db_user: str,
db_pass: str | None,
db_require_ssl: bool,
mzfmt: bool,
) -> None:
This function:
- Validates that at least one
object_idis provided. - Determines the source cluster from the provided object IDs (must be exactly one unique cluster).
- Verifies the source and destination clusters are different.
- Traverses clone dependencies using
db.clone_dependencies(). - Generates DDL that rewrites cluster references (e.g.,
IN CLUSTER old_clustertoIN CLUSTER new_cluster). - Categorizes references into aliased refs, index-on refs, and simple refs for correct dependency ordering.
SQL Database Wrapper
The sql.py module provides the Database class, a lightweight wrapper around pg8000 for connecting to Materialize.
Database Class
| Method | Description |
|---|---|
__init__ |
Establishes a connection with optional SSL, sets autocommit |
close |
Closes the database connection |
query_one |
Executes a query and returns a single row as a dictionary |
query_all |
Executes a query and yields all rows as dictionaries via a generator |
execute |
Executes a statement without returning results |
catalog_items |
Queries user or system catalog items with ILIKE pattern filtering |
object_clusters |
Retrieves cluster information for given object IDs |
class Database:
def __init__(self, port, host, user, password, database, require_ssl):
self.conn = pg8000.connect(
host=host, port=port, user=user,
password=password, database=database,
ssl_context=ssl_context,
)
self.conn.autocommit = True
The catalog_items method loads SQL from resource files (catalog/u_items.sql for user items, catalog/s_items.sql for system items) and applies ILIKE filters for database, schema, and name.
CLI Entry Point
The cli/mzexplore.py module provides a Click-based command-line interface that exposes mzexplore functionality.
Structure
The CLI uses a @click.group() with shared argument and option classes:
| Class | Parameters | Purpose |
|---|---|---|
Arg |
repository, output_file, base_suffix, diff_suffix |
Positional argument definitions |
Opt |
db_port, db_host, db_user, etc. |
Database connection options with env var support |
Default database connection settings:
| Option | Default | Env Var |
|---|---|---|
db_port |
6877 |
PGPORT
|
db_host |
localhost |
PGHOST
|
db_user |
mz_support |
(n/a) |
Key Source Files
| File | Path |
|---|---|
| Common types | misc/python/materialize/mzexplore/common.py
|
| Extraction | misc/python/materialize/mzexplore/extract.py
|
| Analysis | misc/python/materialize/mzexplore/analyze.py
|
| Cloning | misc/python/materialize/mzexplore/clone.py
|
| SQL wrapper | misc/python/materialize/mzexplore/sql.py
|
| CLI entry point | misc/python/materialize/cli/mzexplore.py
|