Principle:MaterializeInc Materialize Catalog Exploration

Overview

The Catalog Exploration principle describes the approach used by the mzexplore module to extract, compare, and clone DDL from Materialize catalogs. This principle enables systematic analysis of optimizer behavior across versions, environments, or configuration changes by treating the catalog as a structured, queryable artifact.

Principle

A Materialize catalog can be fully explored by extracting its DDL and optimizer plans into a local file hierarchy, enabling offline comparison, regression detection, and cross-cluster replication without requiring direct access to the running system.

Motivation

Understanding how the Materialize optimizer transforms queries is critical for performance analysis and regression detection. However, comparing optimizer outputs requires:

Consistent extraction of CREATE statements and EXPLAIN plans from live systems.
A structured file layout that enables automated diffing across runs.
The ability to replicate catalog objects in isolated clusters for A/B testing.

The mzexplore module addresses these needs with a pipeline of extract, analyze, and clone operations.

Extraction

Catalog Querying

The extraction process connects to Materialize via pg8000 and queries the system catalog for items matching ILIKE patterns on database, schema, and name. SQL queries are loaded from resource files:

Resource File	Purpose
`catalog/u_items.sql`	Queries user-created catalog items
`catalog/s_items.sql`	Queries system catalog items

DDL Extraction

For each matching catalog item, the extract.defs function writes a file containing:

-- id: <item_id>
-- oid: <item_oid>
<create_sql>

Files are organized using a naming convention derived from the CreateFile dataclass, encoding the database, schema, name, and item type into the path.

EXPLAIN Plan Extraction

The module supports extracting optimizer plans at multiple stages:

Stage	SQL Clause
Raw plan	`EXPLAIN RAW PLAN`
Decorrelated plan	`EXPLAIN DECORRELATED PLAN`
Locally optimized plan	`EXPLAIN LOCALLY OPTIMIZED PLAN`
Optimized plan	`EXPLAIN OPTIMIZED PLAN`
Physical plan	`EXPLAIN PHYSICAL PLAN`
Optimizer trace	`EXPLAIN OPTIMIZER TRACE`

Plans can be output in TEXT or JSON format, and ExplainOption feature flags (e.g., enable_*, reoptimize_imported_views) can be passed to alter optimizer behavior during extraction.

Comparison

Plan Diffing

The analyze.changes function compares pairs of extracted optimized plans:

Scans for files matching **/*.optimized_plan.{base_suffix}.txt.
Locates the corresponding diff file with the alternate suffix.
When the plan texts differ, emits a diff entry with a code --diff command.

The output format labels each difference as requiring manual classification:

- TODO(REGRESSION|IMPROVEMENT) in MATERIALIZED VIEW `db.schema.name`

  ```bash
  code --diff \
      /path/to/base.optimized_plan.v1.txt \
      /path/to/diff.optimized_plan.v2.txt
  ```

This approach enables systematic review of optimizer changes across version upgrades or configuration adjustments.

Suffixed File Organization

Extraction runs are differentiated by suffix (e.g., v1, v2, before, after), allowing multiple snapshots to coexist in the same directory structure. The ExplainFile and explain_diff helpers manage the mapping between base and diff file paths.

Cloning

Cross-Cluster Replication

The clone.defs function generates DDL scripts that replicate catalog objects from one cluster to another:

Dependency resolution -- Uses db.clone_dependencies() to traverse the full dependency graph for the specified object IDs.
Cluster validation -- Ensures all source objects belong to exactly one cluster and that the target cluster is different.
DDL rewriting -- Replaces IN CLUSTER source_cluster with IN CLUSTER target_cluster in all generated statements.
Dependency ordering -- Categorizes references into aliased refs, index-on refs, and simple refs to ensure correct creation order.

This enables A/B testing of optimizer changes by running identical workloads on separate clusters with different configurations.

Database Abstraction

The sql.Database class provides a consistent interface for catalog operations:

Method	Purpose
`catalog_items`	List catalog items with ILIKE pattern filtering
`object_clusters`	Resolve cluster assignments for object IDs
`clone_dependencies`	Traverse dependency graphs for cloning
`query_one` / `query_all`	Execute arbitrary queries returning dictionaries

All connections use autocommit=True and optionally support SSL for secure connections.

Trade-offs

Benefit	Consideration
Offline analysis without live system access	Extracted data is a point-in-time snapshot; live changes require re-extraction
Automated regression detection via plan diffing	Diffs require manual classification as regression vs. improvement
Cross-cluster cloning for A/B testing	Cloning is limited to objects within a single source cluster
ILIKE pattern filtering for targeted extraction	Overly broad patterns can extract large volumes of data

Related Pages

Implementation:MaterializeInc_Materialize_Mzexplore_Module
Namescore CLI -- A complementary tool measuring plan readability

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment