Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Principle:MaterializeInc Materialize Catalog Exploration

From Leeroopedia


Overview

The Catalog Exploration principle describes the approach used by the mzexplore module to extract, compare, and clone DDL from Materialize catalogs. This principle enables systematic analysis of optimizer behavior across versions, environments, or configuration changes by treating the catalog as a structured, queryable artifact.

Principle

A Materialize catalog can be fully explored by extracting its DDL and optimizer plans into a local file hierarchy, enabling offline comparison, regression detection, and cross-cluster replication without requiring direct access to the running system.

Motivation

Understanding how the Materialize optimizer transforms queries is critical for performance analysis and regression detection. However, comparing optimizer outputs requires:

  • Consistent extraction of CREATE statements and EXPLAIN plans from live systems.
  • A structured file layout that enables automated diffing across runs.
  • The ability to replicate catalog objects in isolated clusters for A/B testing.

The mzexplore module addresses these needs with a pipeline of extract, analyze, and clone operations.

Extraction

Catalog Querying

The extraction process connects to Materialize via pg8000 and queries the system catalog for items matching ILIKE patterns on database, schema, and name. SQL queries are loaded from resource files:

Resource File Purpose
catalog/u_items.sql Queries user-created catalog items
catalog/s_items.sql Queries system catalog items

DDL Extraction

For each matching catalog item, the extract.defs function writes a file containing:

-- id: <item_id>
-- oid: <item_oid>
<create_sql>

Files are organized using a naming convention derived from the CreateFile dataclass, encoding the database, schema, name, and item type into the path.

EXPLAIN Plan Extraction

The module supports extracting optimizer plans at multiple stages:

Stage SQL Clause
Raw plan EXPLAIN RAW PLAN
Decorrelated plan EXPLAIN DECORRELATED PLAN
Locally optimized plan EXPLAIN LOCALLY OPTIMIZED PLAN
Optimized plan EXPLAIN OPTIMIZED PLAN
Physical plan EXPLAIN PHYSICAL PLAN
Optimizer trace EXPLAIN OPTIMIZER TRACE

Plans can be output in TEXT or JSON format, and ExplainOption feature flags (e.g., enable_*, reoptimize_imported_views) can be passed to alter optimizer behavior during extraction.

Comparison

Plan Diffing

The analyze.changes function compares pairs of extracted optimized plans:

  1. Scans for files matching **/*.optimized_plan.{base_suffix}.txt.
  2. Locates the corresponding diff file with the alternate suffix.
  3. When the plan texts differ, emits a diff entry with a code --diff command.

The output format labels each difference as requiring manual classification:

- TODO(REGRESSION|IMPROVEMENT) in MATERIALIZED VIEW `db.schema.name`

  ```bash
  code --diff \
      /path/to/base.optimized_plan.v1.txt \
      /path/to/diff.optimized_plan.v2.txt
  ```

This approach enables systematic review of optimizer changes across version upgrades or configuration adjustments.

Suffixed File Organization

Extraction runs are differentiated by suffix (e.g., v1, v2, before, after), allowing multiple snapshots to coexist in the same directory structure. The ExplainFile and explain_diff helpers manage the mapping between base and diff file paths.

Cloning

Cross-Cluster Replication

The clone.defs function generates DDL scripts that replicate catalog objects from one cluster to another:

  1. Dependency resolution -- Uses db.clone_dependencies() to traverse the full dependency graph for the specified object IDs.
  2. Cluster validation -- Ensures all source objects belong to exactly one cluster and that the target cluster is different.
  3. DDL rewriting -- Replaces IN CLUSTER source_cluster with IN CLUSTER target_cluster in all generated statements.
  4. Dependency ordering -- Categorizes references into aliased refs, index-on refs, and simple refs to ensure correct creation order.

This enables A/B testing of optimizer changes by running identical workloads on separate clusters with different configurations.

Database Abstraction

The sql.Database class provides a consistent interface for catalog operations:

Method Purpose
catalog_items List catalog items with ILIKE pattern filtering
object_clusters Resolve cluster assignments for object IDs
clone_dependencies Traverse dependency graphs for cloning
query_one / query_all Execute arbitrary queries returning dictionaries

All connections use autocommit=True and optionally support SSL for secure connections.

Trade-offs

Benefit Consideration
Offline analysis without live system access Extracted data is a point-in-time snapshot; live changes require re-extraction
Automated regression detection via plan diffing Diffs require manual classification as regression vs. improvement
Cross-cluster cloning for A/B testing Cloning is limited to objects within a single source cluster
ILIKE pattern filtering for targeted extraction Overly broad patterns can extract large volumes of data

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment