Implementation:MaterializeInc Materialize Trim Tests Pipeline

From Leeroopedia


Knowledge Sources: Materialize CI pipeline generator, mzbuild dependency resolution
Domains: Continuous Integration, Dependency Analysis, Test Optimization, Python
Last Updated: 2026-02-08

Overview

trim_tests_pipeline is a Python function in Materialize's ci/mkpipeline.py that performs incremental CI test trimming: it analyzes pipeline step dependencies and git diffs, then removes test steps whose inputs are unchanged from the Buildkite pipeline.

Description

The trim_tests_pipeline function is the core of Materialize's CI optimization strategy. It takes a parsed Buildkite pipeline (as a Python dictionary) and removes steps whose inputs have not changed relative to the main branch. The function is supported by two helper functions: steps() for iterating over pipeline steps (including those nested inside groups), and get_imported_files() for discovering Python module dependencies of test compositions.

The trimming algorithm proceeds in several phases:

  1. Dependency Resolution: Creates an mzbuild.Repository and resolves all image dependencies.
  2. Composition Discovery: Scans all pipeline steps for mzcompose plugin usage, collecting the set of composition paths.
  3. Import Analysis: Uses get_imported_files() to discover Python files imported by each composition, running these analyses in parallel via a ThreadPoolExecutor.
  4. Step Construction: Converts each pipeline step configuration into a PipelineStep object, tracking explicit inputs (globs), image dependencies, and step-to-step dependencies.
  5. Change Detection: For each step, computes its full input set and uses have_paths_changed() (which delegates to git diff) to check for changes.
  6. Dependency Propagation: Starting from each changed step, recursively visits the steps it depends on and adds them to the needed set.
  7. Pipeline Restriction: Filters the pipeline to retain only needed steps. All wait barriers are kept, and a group is kept if it contains at least one needed step.
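
Phases 5 and 6 can be illustrated with a toy model. The sketch below is not the code in ci/mkpipeline.py: the ToyStep class, the helper names, and the changed_files set (standing in for the git diff that have_paths_changed() consults) are all hypothetical, and mzbuild image inputs are left out.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class ToyStep:
    """Hypothetical stand-in for a pipeline step; not the real PipelineStep."""

    id: str
    inputs: set[str] = field(default_factory=set)  # explicit input globs
    depends_on: set[str] = field(default_factory=set)  # ids of prerequisite steps


def has_changed(step: ToyStep, changed_files: set[str]) -> bool:
    # Phase 5: a step has changed if any changed file matches one of its globs.
    return any(fnmatch(path, glob) for path in changed_files for glob in step.inputs)


def needed_steps(steps: dict[str, ToyStep], changed_files: set[str]) -> set[str]:
    needed: set[str] = set()

    def visit(step_id: str) -> None:
        # Phase 6: keep a step and, recursively, everything it depends on.
        if step_id in needed:
            return
        needed.add(step_id)
        for dep in steps[step_id].depends_on:
            visit(dep)

    for step in steps.values():
        if has_changed(step, changed_files):
            visit(step.id)
    return needed


toy_steps = {
    "build": ToyStep("build", inputs={"src/*"}),
    "sql-tests": ToyStep("sql-tests", inputs={"test/sql/*"}, depends_on={"build"}),
    "docs-lint": ToyStep("docs-lint", inputs={"doc/*"}),
}

# Only a SQL test file changed, so "sql-tests" is kept together with the
# "build" step it depends on, while "docs-lint" would be trimmed.
needed = needed_steps(toy_steps, {"test/sql/joins.slt"})
print(sorted(needed))  # ['build', 'sql-tests']
```

Note that an unchanged step such as "build" survives only because a changed step depends on it, which matches condition b) in the function's docstring.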

Usage

This function is called from main() in ci/mkpipeline.py during pipeline generation for the test pipeline on non-main, non-tag branches where coverage and sanitizer modes are not active. It is also invoked as a dry run when CI glue code has changed (to exercise the trimming logic without actually trimming).

Code Reference

Source Location

  • ci/mkpipeline.py, lines 658-819 (trim_tests_pipeline)
  • ci/mkpipeline.py, lines 65-69 (steps)
  • ci/mkpipeline.py, lines 72-73 (get_imported_files)

Signature

def trim_tests_pipeline(
    pipeline: Any,
    coverage: bool,
    sanitizer: Sanitizer,
    lto: bool,
) -> None:
    """Trim pipeline steps whose inputs have not changed in this branch.

    Steps are assigned inputs in two ways:

      1. An explicit glob in the `inputs` key.
      2. An implicit dependency on any number of mzbuild images via the
         mzcompose plugin. Any steps which use the mzcompose plugin will
         have inputs autodiscovered based on the images used in that
         mzcompose configuration.

    A step is trimmed if a) none of its inputs have changed, and b) there are
    no other untrimmed steps that depend on it.
    """


def steps(pipeline: Any) -> Iterator[dict[str, Any]]:
    for step in pipeline["steps"]:
        yield step
        if "group" in step:
            yield from step.get("steps", [])


def get_imported_files(composition: str) -> list[str]:
    return spawn.capture(["bin/ci-python-imports", composition]).splitlines()

Import

from ci.mkpipeline import trim_tests_pipeline, steps, get_imported_files

I/O Contract

Inputs

For trim_tests_pipeline():

Name | Type | Description
pipeline | Any (dict) | Parsed YAML pipeline dictionary with a "steps" key. Modified in place to remove trimmed steps.
coverage | bool | Whether this is a coverage build. Affects mzbuild profile selection.
sanitizer | Sanitizer | The sanitizer mode (e.g., Sanitizer.none, Sanitizer.address). Affects mzbuild profile.
lto | bool | Whether link-time optimization is enabled. Selects RELEASE vs OPTIMIZED mzbuild profile.

For steps():

Name | Type | Description
pipeline | Any (dict) | Parsed YAML pipeline dictionary with a "steps" key.

For get_imported_files():

Name | Type | Description
composition | str | Path to a composition directory (e.g., test/pg-cdc).

Outputs

Name | Type | Description
(return) | None | trim_tests_pipeline modifies the pipeline dictionary in place, removing steps that are not needed.
(return from steps) | Iterator[dict[str, Any]] | Yields each step dict, including steps nested inside groups.
(return from get_imported_files) | list[str] | List of Python file paths transitively imported by the composition.
(side effect) | stdout | Prints each step with a checkmark or cross indicating whether it was kept or trimmed, along with its dependencies.

Usage Examples

Trimming the test pipeline in a PR build:

from pathlib import Path

import yaml

from ci.mkpipeline import trim_tests_pipeline
from materialize.rustc_flags import Sanitizer

# Load the test pipeline template into a plain dictionary
with open(Path("ci/test/pipeline.template.yml")) as f:
    pipeline = yaml.safe_load(f)

# Trim steps whose inputs haven't changed vs main
trim_tests_pipeline(
    pipeline,
    coverage=False,
    sanitizer=Sanitizer.none,
    lto=False,
)

# pipeline is now modified in place with only needed steps
print(yaml.dump(pipeline))

Dry-run trimming when CI glue code has changed:

import copy

# pipeline, Sanitizer, and trim_tests_pipeline are defined as in the previous example.
# Trim a deep copy so the original pipeline stays unmodified.
trim_tests_pipeline(
    copy.deepcopy(pipeline),
    coverage=False,
    sanitizer=Sanitizer.none,
    lto=False,
)
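
To see why a deep copy (rather than a shallow one) is used for the dry run, here is a self-contained sketch. The toy_trim function is hypothetical and much simpler than trim_tests_pipeline; the only property it shares with the real function is that it mutates its argument in place, including nested group dictionaries.

```python
import copy
from typing import Any


def toy_trim(pipeline: dict[str, Any], needed: set[str]) -> None:
    """Hypothetical in-place trimmer: drops steps whose id is not needed."""
    for step in pipeline["steps"]:
        if "group" in step:
            # Mutates the nested group dict, which a shallow copy would share.
            step["steps"] = [s for s in step["steps"] if s.get("id") in needed]
    pipeline["steps"] = [
        s
        for s in pipeline["steps"]
        if s.get("id") in needed or ("group" in s and s["steps"])
    ]


original = {
    "steps": [
        {"id": "build"},
        {"group": "tests", "steps": [{"id": "sql-tests"}, {"id": "docs-lint"}]},
    ]
}

# Dry run: trim a deep copy, leaving the original untouched.
scratch = copy.deepcopy(original)
toy_trim(scratch, needed={"build", "sql-tests"})

print(len(scratch["steps"][1]["steps"]))   # 1
print(len(original["steps"][1]["steps"]))  # 2
```

With copy.copy() instead of copy.deepcopy(), the group dictionary would be shared between both pipelines and the original would lose its "docs-lint" step as well.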

Iterating over all steps including grouped steps:

for step in steps(pipeline):
    if "id" in step:
        print(f"Step: {step['id']}")
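
The loop above can be tried without a real pipeline file. In this sketch the steps() body is copied from the Signature section, while toy_pipeline and its step ids are made up for illustration.

```python
from collections.abc import Iterator
from typing import Any


def steps(pipeline: Any) -> Iterator[dict[str, Any]]:
    # Same body as the steps() helper shown in the Signature section.
    for step in pipeline["steps"]:
        yield step
        if "group" in step:
            yield from step.get("steps", [])


# Hypothetical pipeline: one plain step, one wait barrier, one group.
toy_pipeline = {
    "steps": [
        {"id": "build", "command": "make"},
        {"wait": None},
        {"group": "tests", "steps": [{"id": "sql-tests"}, {"id": "unit-tests"}]},
    ]
}

ids = [step["id"] for step in steps(toy_pipeline) if "id" in step]
print(ids)  # ['build', 'sql-tests', 'unit-tests']

# The group entry itself is yielded too, followed by its nested steps.
total = sum(1 for _ in steps(toy_pipeline))
print(total)  # 5
```

Because wait barriers and group entries are yielded alongside ordinary steps, callers need a guard such as the "id" in step check before reading step-specific keys.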

Discovering Python imports for a composition:

files = get_imported_files("test/pg-cdc")
print(f"Composition imports {len(files)} Python files")
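
The Description notes that import analysis runs in parallel via a ThreadPoolExecutor. A minimal sketch of that pattern follows. Because the real get_imported_files() shells out to bin/ci-python-imports, the sketch substitutes a hypothetical fake_get_imported_files, and the second composition path and the returned file names are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_get_imported_files(composition: str) -> list[str]:
    """Hypothetical stand-in; the real helper runs bin/ci-python-imports."""
    return [f"{composition}/mzcompose.py", f"{composition}/helpers.py"]


# "test/pg-cdc" appears in the examples above; the second path is made up.
compositions = ["test/pg-cdc", "test/hypothetical-composition"]

with ThreadPoolExecutor() as executor:
    # executor.map preserves input order, so results line up with compositions.
    results = list(executor.map(fake_get_imported_files, compositions))

imported = dict(zip(compositions, results))
for composition, files in imported.items():
    print(f"{composition}: {len(files)} files")
```

Threads suit this workload because each analysis would spend its time waiting on a subprocess rather than running Python code.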

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
