Principle:MaterializeInc Materialize Catalog Exploration
Overview
The Catalog Exploration principle describes the approach used by the mzexplore module to extract, compare, and clone DDL from Materialize catalogs. This principle enables systematic analysis of optimizer behavior across versions, environments, or configuration changes by treating the catalog as a structured, queryable artifact.
Principle
A Materialize catalog can be fully explored by extracting its DDL and optimizer plans into a local file hierarchy, enabling offline comparison, regression detection, and cross-cluster replication without requiring direct access to the running system.
Motivation
Understanding how the Materialize optimizer transforms queries is critical for performance analysis and regression detection. However, comparing optimizer outputs requires:
- Consistent extraction of
CREATEstatements andEXPLAINplans from live systems. - A structured file layout that enables automated diffing across runs.
- The ability to replicate catalog objects in isolated clusters for A/B testing.
The mzexplore module addresses these needs with a pipeline of extract, analyze, and clone operations.
Extraction
Catalog Querying
The extraction process connects to Materialize via pg8000 and queries the system catalog for items matching ILIKE patterns on database, schema, and name. SQL queries are loaded from resource files:
| Resource File | Purpose |
|---|---|
catalog/u_items.sql |
Queries user-created catalog items |
catalog/s_items.sql |
Queries system catalog items |
DDL Extraction
For each matching catalog item, the extract.defs function writes a file containing:
-- id: <item_id>
-- oid: <item_oid>
<create_sql>
Files are organized using a naming convention derived from the CreateFile dataclass, encoding the database, schema, name, and item type into the path.
EXPLAIN Plan Extraction
The module supports extracting optimizer plans at multiple stages:
| Stage | SQL Clause |
|---|---|
| Raw plan | EXPLAIN RAW PLAN
|
| Decorrelated plan | EXPLAIN DECORRELATED PLAN
|
| Locally optimized plan | EXPLAIN LOCALLY OPTIMIZED PLAN
|
| Optimized plan | EXPLAIN OPTIMIZED PLAN
|
| Physical plan | EXPLAIN PHYSICAL PLAN
|
| Optimizer trace | EXPLAIN OPTIMIZER TRACE
|
Plans can be output in TEXT or JSON format, and ExplainOption feature flags (e.g., enable_*, reoptimize_imported_views) can be passed to alter optimizer behavior during extraction.
Comparison
Plan Diffing
The analyze.changes function compares pairs of extracted optimized plans:
- Scans for files matching
**/*.optimized_plan.{base_suffix}.txt. - Locates the corresponding diff file with the alternate suffix.
- When the plan texts differ, emits a diff entry with a
code --diffcommand.
The output format labels each difference as requiring manual classification:
- TODO(REGRESSION|IMPROVEMENT) in MATERIALIZED VIEW `db.schema.name`
```bash
code --diff \
/path/to/base.optimized_plan.v1.txt \
/path/to/diff.optimized_plan.v2.txt
```
This approach enables systematic review of optimizer changes across version upgrades or configuration adjustments.
Suffixed File Organization
Extraction runs are differentiated by suffix (e.g., v1, v2, before, after), allowing multiple snapshots to coexist in the same directory structure. The ExplainFile and explain_diff helpers manage the mapping between base and diff file paths.
Cloning
Cross-Cluster Replication
The clone.defs function generates DDL scripts that replicate catalog objects from one cluster to another:
- Dependency resolution -- Uses
db.clone_dependencies()to traverse the full dependency graph for the specified object IDs. - Cluster validation -- Ensures all source objects belong to exactly one cluster and that the target cluster is different.
- DDL rewriting -- Replaces
IN CLUSTER source_clusterwithIN CLUSTER target_clusterin all generated statements. - Dependency ordering -- Categorizes references into aliased refs, index-on refs, and simple refs to ensure correct creation order.
This enables A/B testing of optimizer changes by running identical workloads on separate clusters with different configurations.
Database Abstraction
The sql.Database class provides a consistent interface for catalog operations:
| Method | Purpose |
|---|---|
catalog_items |
List catalog items with ILIKE pattern filtering |
object_clusters |
Resolve cluster assignments for object IDs |
clone_dependencies |
Traverse dependency graphs for cloning |
query_one / query_all |
Execute arbitrary queries returning dictionaries |
All connections use autocommit=True and optionally support SSL for secure connections.
Trade-offs
| Benefit | Consideration |
|---|---|
| Offline analysis without live system access | Extracted data is a point-in-time snapshot; live changes require re-extraction |
| Automated regression detection via plan diffing | Diffs require manual classification as regression vs. improvement |
| Cross-cluster cloning for A/B testing | Cloning is limited to objects within a single source cluster |
| ILIKE pattern filtering for targeted extraction | Overly broad patterns can extract large volumes of data |
Related Pages
- Implementation:MaterializeInc_Materialize_Mzexplore_Module
- Namescore CLI -- A complementary tool measuring plan readability