Implementation:MaterializeInc Materialize Trim Tests Pipeline
| Knowledge Sources | Materialize CI pipeline generator, mzbuild dependency resolution |
|---|---|
| Domains | Continuous Integration, Dependency Analysis, Test Optimization, Python |
| Last Updated | 2026-02-08 |
Overview
Concrete Python function for incremental CI test trimming, provided by Materialize's ci/mkpipeline.py. It analyzes pipeline step dependencies and git diffs, then removes test steps whose inputs are unchanged from the Buildkite pipeline.
Description
The trim_tests_pipeline function is the core of Materialize's CI optimization strategy. It takes a parsed Buildkite pipeline (as a Python dictionary) and removes steps whose inputs have not changed relative to the main branch. The function is supported by two helper functions: steps() for iterating over pipeline steps (including those nested inside groups), and get_imported_files() for discovering Python module dependencies of test compositions.
The trimming algorithm proceeds in several phases:
- Dependency Resolution: Creates an mzbuild.Repository and resolves all image dependencies.
- Composition Discovery: Scans all pipeline steps for mzcompose plugin usage, collecting the set of composition paths.
- Import Analysis: Uses get_imported_files() to discover Python files imported by each composition, running these analyses in parallel via a ThreadPoolExecutor.
- Step Construction: Converts each pipeline step configuration into a PipelineStep object, tracking explicit inputs (globs), image dependencies, and step-to-step dependencies.
- Change Detection: For each step, computes its full input set and uses have_paths_changed() (which delegates to git diff) to check for changes.
- Dependency Propagation: Starting from changed steps, recursively visits all step dependencies to build the needed set.
- Pipeline Restriction: Filters the pipeline to retain only needed steps, preserving groups that contain at least one needed step and all wait barriers.
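The change-detection and dependency-propagation phases can be illustrated with a minimal sketch. The `Step` dataclass, the `compute_needed` function, and the `has_changed` callback are hypothetical stand-ins, not the actual PipelineStep class or have_paths_changed() from the Materialize repository. The callback replaces the real git diff check so that the example runs without a repository.

```python
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class Step:
    """Hypothetical, simplified stand-in for a pipeline step."""

    id: str
    inputs: set[str] = field(default_factory=set)
    depends_on: set[str] = field(default_factory=set)


def compute_needed(
    steps: dict[str, Step],
    has_changed: Callable[[set[str]], bool],
) -> set[str]:
    """Return the IDs of changed steps plus every step they depend on."""
    needed: set[str] = set()

    def visit(step: Step) -> None:
        if step.id in needed:
            return  # already visited; this also guards against cycles
        needed.add(step.id)
        for dep_id in step.depends_on:
            visit(steps[dep_id])

    for step in steps.values():
        if has_changed(step.inputs):
            visit(step)
    return needed


# Toy pipeline (assumed for illustration): "sqllogictest" depends on
# "build", while "docs" stands alone.
toy_steps = {
    "build": Step("build", inputs={"src/**"}),
    "sqllogictest": Step(
        "sqllogictest",
        inputs={"test/sqllogictest/**"},
        depends_on={"build"},
    ),
    "docs": Step("docs", inputs={"doc/**"}),
}

# Pretend that only files matching test/sqllogictest/** changed.
changed_globs = {"test/sqllogictest/**"}
needed = compute_needed(toy_steps, lambda inputs: bool(inputs & changed_globs))
print(sorted(needed))  # ['build', 'sqllogictest']
```

In this toy run, "build" is kept even though its own inputs are unchanged, because a changed step depends on it; "docs" is neither changed nor depended upon, so it would be trimmed.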
Usage
This function is called from main() in ci/mkpipeline.py during pipeline generation for the test pipeline on non-main, non-tag branches where coverage and sanitizer modes are not active. It is also invoked as a dry run when CI glue code has changed (to exercise the trimming logic without actually trimming).
Code Reference
Source Location
- ci/mkpipeline.py, lines 658-819 (trim_tests_pipeline)
- ci/mkpipeline.py, lines 65-69 (steps)
- ci/mkpipeline.py, lines 72-73 (get_imported_files)
Signature
def trim_tests_pipeline(
    pipeline: Any,
    coverage: bool,
    sanitizer: Sanitizer,
    lto: bool,
) -> None:
    """Trim pipeline steps whose inputs have not changed in this branch.

    Steps are assigned inputs in two ways:

      1. An explicit glob in the `inputs` key.
      2. An implicit dependency on any number of mzbuild images via the
         mzcompose plugin. Any steps which use the mzcompose plugin will
         have inputs autodiscovered based on the images used in that
         mzcompose configuration.

    A step is trimmed if a) none of its inputs have changed, and b) there are
    no other untrimmed steps that depend on it.
    """

def steps(pipeline: Any) -> Iterator[dict[str, Any]]:
    for step in pipeline["steps"]:
        yield step
        if "group" in step:
            yield from step.get("steps", [])

def get_imported_files(composition: str) -> list[str]:
    return spawn.capture(["bin/ci-python-imports", composition]).splitlines()
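The import-analysis phase runs get_imported_files() for many compositions concurrently. The following is a minimal sketch of that pattern, not the code in ci/mkpipeline.py: `fake_get_imported_files`, `collect_imports`, the `example/*.py` paths, and the `test/example-other` composition are all hypothetical, and the stand-in function returns canned data instead of spawning bin/ci-python-imports.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_get_imported_files(composition: str) -> list[str]:
    """Hypothetical stand-in for get_imported_files(); runs no subprocess."""
    canned = {
        "test/pg-cdc": ["example/a.py", "example/b.py"],
        "test/example-other": ["example/a.py"],
    }
    return canned.get(composition, [])


def collect_imports(compositions: list[str]) -> dict[str, list[str]]:
    """Analyze every composition in a thread pool and gather the results."""
    with ThreadPoolExecutor() as executor:
        # Submit all analyses first so that they can overlap in time.
        futures = {
            composition: executor.submit(fake_get_imported_files, composition)
            for composition in compositions
        }
        # result() blocks until the corresponding analysis has finished.
        return {name: future.result() for name, future in futures.items()}


imports = collect_imports(["test/pg-cdc", "test/example-other"])
all_files = sorted({path for files in imports.values() for path in files})
print(all_files)  # ['example/a.py', 'example/b.py']
```

Threads suit this workload because each real analysis waits on an external process, so the work is dominated by I/O rather than by Python bytecode execution.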
Import
from ci.mkpipeline import trim_tests_pipeline, steps, get_imported_files
I/O Contract
Inputs
| Name | Type | Description |
|---|---|---|
| pipeline | Any (dict) | Parsed YAML pipeline dictionary with a "steps" key. Modified in place to remove trimmed steps. |
| coverage | bool | Whether this is a coverage build. Affects mzbuild profile selection. |
| sanitizer | Sanitizer | The sanitizer mode (e.g., Sanitizer.none, Sanitizer.address). Affects mzbuild profile. |
| lto | bool | Whether link-time optimization is enabled. Selects RELEASE vs OPTIMIZED mzbuild profile. |
For steps():
| Name | Type | Description |
|---|---|---|
| pipeline | Any (dict) | Parsed YAML pipeline dictionary with a "steps" key. |
For get_imported_files():
| Name | Type | Description |
|---|---|---|
| composition | str | Path to a composition directory (e.g., test/pg-cdc). |
Outputs
| Name | Type | Description |
|---|---|---|
| (return) | None | trim_tests_pipeline modifies the pipeline dictionary in place, removing steps that are not needed. |
| (return from steps) | Iterator[dict[str, Any]] | Yields each step dict, including steps nested inside groups. |
| (return from get_imported_files) | list[str] | List of Python file paths transitively imported by the composition. |
| (side effect) | stdout | Prints each step with a checkmark or cross indicating whether it was kept or trimmed, along with its dependencies. |
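The in-place modification described above can be sketched as follows, assuming the set of needed step IDs has already been computed. `restrict_pipeline` and the toy pipeline are hypothetical illustrations of the restriction rule (keep needed steps, keep groups that still contain a needed step, keep all wait barriers) and are not copied from ci/mkpipeline.py.

```python
from typing import Any


def restrict_pipeline(pipeline: dict[str, Any], needed: set[str]) -> None:
    """Keep only needed steps, mutating `pipeline` in place."""
    # First prune the steps nested inside each group.
    for step in pipeline["steps"]:
        if "group" in step:
            step["steps"] = [
                inner for inner in step.get("steps", [])
                if inner.get("id") in needed
            ]
    # Then prune the top level: wait barriers always survive, groups
    # survive only if they still contain a step, and plain steps
    # survive only if they are needed.
    pipeline["steps"] = [
        step
        for step in pipeline["steps"]
        if "wait" in step
        or ("group" in step and step["steps"])
        or step.get("id") in needed
    ]


# Toy pipeline (assumed for illustration).
pipeline = {
    "steps": [
        {"id": "build", "command": "echo build"},
        {"wait": None},
        {"group": "Tests", "steps": [{"id": "sqllogictest"}, {"id": "testdrive"}]},
        {"group": "Docs", "steps": [{"id": "docs"}]},
    ]
}

restrict_pipeline(pipeline, needed={"build", "sqllogictest"})
print([step.get("id") or step.get("group") or "wait" for step in pipeline["steps"]])
# ['build', 'wait', 'Tests']
```

Because the function returns None and rewrites `pipeline["steps"]` directly, callers that need the untrimmed pipeline afterwards have to pass in a copy.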
Usage Examples
Trimming the test pipeline in a PR build:
from pathlib import Path

import yaml

from ci.mkpipeline import trim_tests_pipeline
from materialize.rustc_flags import Sanitizer

# Load the pipeline template as a Python dictionary
with open(Path("ci/test/pipeline.template.yml")) as f:
    pipeline = yaml.safe_load(f)

# Trim steps whose inputs haven't changed vs main
trim_tests_pipeline(
    pipeline,
    coverage=False,
    sanitizer=Sanitizer.none,
    lto=False,
)

# pipeline is now modified in place with only needed steps
print(yaml.dump(pipeline))
Dry-run trimming when CI glue code has changed:
import copy

# Trim a deep copy so that the original pipeline is left unmodified
trim_tests_pipeline(
    copy.deepcopy(pipeline),
    coverage=False,
    sanitizer=Sanitizer.none,
    lto=False,
)
Iterating over all steps including grouped steps:
for step in steps(pipeline):
    if "id" in step:
        print(f"Step: {step['id']}")
Discovering Python imports for a composition:
files = get_imported_files("test/pg-cdc")
print(f"Composition imports {len(files)} Python files")