Principle:MaterializeInc Materialize CI Pipeline Registration

From Leeroopedia


Knowledge Sources: ci/mkpipeline.py, ci/test/pipeline.template.yml
Domains: Continuous Integration, Pipeline Configuration, Test Scheduling
Last Updated: 2026-02-08

Overview

CI Pipeline Registration declares integration tests as steps in CI pipelines so that code changes trigger them automatically. Input-based trimming skips any test whose dependencies have not changed.

Description

The CI Pipeline Registration principle defines how integration tests are declared as steps in Materialize's CI pipelines and how the pipeline system determines which tests to run for a given code change. This principle bridges the gap between test authoring (writing mzcompose.py workflows) and test execution in CI (running those workflows on Buildkite).

The pipeline system has two key components:

Pipeline template files (ci/test/pipeline.template.yml): These YAML files define the full set of CI steps, organized into groups. Each step that runs an integration test uses the ./ci/plugins/mzcompose plugin, whose configuration names the composition to run (the name of the directory under test/ that contains the mzcompose.py file). A step can also declare explicit inputs (glob patterns for files whose changes should trigger the step), depends_on (other steps that must complete first), timeout_in_minutes, parallelism, and agent queue requirements.
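To make the step fields concrete, here is a minimal sketch of one step as the Python dict a YAML parser would produce for it. Every value is hypothetical (the id, globs, queue name, and the step it depends on), and the composition key inside the plugin configuration is an assumption about the plugin's schema rather than something confirmed by the sources above. The helper composition_of is likewise illustrative and not part of ci/mkpipeline.py.

```python
from typing import Optional

MZCOMPOSE_PLUGIN = "./ci/plugins/mzcompose"

# Hypothetical step, shown as parsed YAML. None of these values are copied
# from the real template; the "composition" key name is an assumption.
step = {
    "id": "example-test",
    "label": "Example test",
    "depends_on": ["build-example"],
    "inputs": ["test/example/**"],
    "timeout_in_minutes": 30,
    "parallelism": 2,
    "agents": {"queue": "example-queue"},
    "plugins": [
        {MZCOMPOSE_PLUGIN: {"composition": "example"}},
    ],
}


def composition_of(step: dict) -> Optional[str]:
    """Return the composition name if the step uses the mzcompose plugin."""
    for plugin in step.get("plugins", []):
        config = plugin.get(MZCOMPOSE_PLUGIN)
        if config is not None:
            return config.get("composition")
    return None
```

Under these assumptions, composition_of(step) yields the directory name under test/ whose mzcompose.py the step runs, and returns None for steps that do not use the plugin.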

Pipeline trimming (ci/mkpipeline.py, trim_tests_pipeline()): This function processes the template to determine which steps need to run for the current code change. For each step that uses the mzcompose plugin, it automatically discovers input dependencies by:

  • Resolving the mzbuild image dependency graph for the composition.
  • Adding the composition's mzcompose.py file itself as an input.
  • Adding all transitively imported Python modules as inputs.
  • Combining these with any explicitly declared inputs globs.
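The four discovery steps above can be sketched as follows. This is an illustration under simplifying assumptions, not the code in ci/mkpipeline.py: the image dependency graph is taken as an already-resolved set of paths (image_inputs), local module sources are passed in as a dict instead of being read from disk, and a module name is mapped to a path by replacing dots with slashes. The function names are hypothetical.

```python
import ast


def imported_modules(source: str) -> set[str]:
    """Collect module names from import statements in Python source text."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module)
    return names


def transitive_imports(source: str, local_sources: dict[str, str]) -> set[str]:
    """Follow imports transitively, keeping only modules in local_sources."""
    seen: set[str] = set()
    pending = [source]
    while pending:
        for name in imported_modules(pending.pop()):
            if name in local_sources and name not in seen:
                seen.add(name)
                pending.append(local_sources[name])
    return seen


def step_inputs(
    composition: str,
    mzcompose_source: str,
    local_sources: dict[str, str],
    image_inputs: set[str],
    explicit_globs: list[str],
) -> set[str]:
    """Combine the four input sources for one mzcompose step."""
    inputs = set(image_inputs)                      # image dependency graph (pre-resolved here)
    inputs.add(f"test/{composition}/mzcompose.py")  # the composition file itself
    for module in transitive_imports(mzcompose_source, local_sources):
        inputs.add(module.replace(".", "/") + ".py")  # assumed module-to-path mapping
    inputs.update(explicit_globs)                   # explicitly declared globs
    return inputs
```

Standard-library and third-party imports drop out because they are absent from local_sources, and the seen set keeps import cycles from looping.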

The trimming algorithm then uses git diff to determine which inputs have changed relative to the main branch. A step is included in the pipeline if any of its inputs have changed, or if another step that needs to run depends on it. Steps whose inputs have not changed are skipped, which shortens CI runs for changes that touch only a small part of the repository.
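The inclusion rule can be sketched as a small function. This is a simplified model, not trim_tests_pipeline() itself: it assumes each step is a dict with an id, a list of input globs, and depends_on given as a list of step ids, and it matches globs with fnmatchcase from the standard library, where * also matches path separators. The function name trim is hypothetical.

```python
from fnmatch import fnmatchcase


def trim(steps: list[dict], changed_files: set[str]) -> list[dict]:
    """Keep steps whose inputs changed, plus every step they depend on."""
    by_id = {step["id"]: step for step in steps}
    needed: set[str] = set()

    def visit(step_id: str) -> None:
        # Depth-first walk over depends_on, so prerequisites of a needed
        # step are kept as well.
        if step_id in needed:
            return
        needed.add(step_id)
        for dep in by_id[step_id].get("depends_on", []):
            visit(dep)

    for step in steps:
        globs = step.get("inputs", [])
        if any(fnmatchcase(path, glob) for path in changed_files for glob in globs):
            visit(step["id"])

    # Preserve the original step order from the template.
    return [step for step in steps if step["id"] in needed]
```

In this model, a change under one test's directory keeps that test and the build step it depends on, while an unrelated test with unchanged inputs is dropped.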

Usage

Use this principle when:

  • Adding a new integration test and registering it in the CI pipeline.
  • Understanding why a particular test did or did not run in a CI build.
  • Adding explicit input globs to a step to ensure it runs when specific non-code files change.
  • Debugging pipeline trimming behavior by examining the trimming output.

Theoretical Basis

The principle draws on several concepts from build systems and CI/CD:

  1. Input-based invalidation: Like build systems (Make, Bazel, Buck), the pipeline trimming system determines whether a step needs to run based on whether its inputs have changed. The "inputs" are a combination of file globs and image dependency graphs. This is analogous to the cache invalidation strategy used by content-addressable build systems.
  2. Dependency graph resolution: The trim_tests_pipeline() function builds a directed graph of pipeline steps (via depends_on) and image dependencies (via mzbuild). When a step is marked as "needed," all steps it depends on are also marked as needed through a depth-first traversal. This ensures that build steps are not skipped when their downstream test steps need to run.
  3. Automatic dependency discovery: Rather than requiring test authors to manually enumerate every source file that a test depends on, the system automatically discovers dependencies through the mzbuild image dependency graph and Python module import analysis. This reduces the maintenance burden and the risk of under-specifying inputs (which would cause tests to be incorrectly skipped).
  4. Declarative step specification: Steps are declared in YAML templates with a fixed schema (id, label, depends_on, inputs, plugins, agents, timeout_in_minutes, parallelism). This declarative approach makes the pipeline structure visible and auditable, and enables tooling to process it programmatically.
  5. Change-based triggering: The pipeline uses git diff against the main branch to determine what has changed. This approach is simple and reliable: it does not require maintaining a separate dependency database or cache, and it correctly handles all forms of code changes including new files, deleted files, and renamed files.
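As an illustration of change-based triggering, the set of changed files can be obtained from git in a few lines. This is a sketch under assumptions: the exact git invocation used by ci/mkpipeline.py is not given in the sources above, so the base...HEAD form (changes on the current branch since it diverged from the base) is a choice made for this example, and both function names are hypothetical.

```python
import subprocess


def parse_name_only(output: str) -> set[str]:
    """Turn `git diff --name-only` output into a set of paths."""
    return {line.strip() for line in output.splitlines() if line.strip()}


def changed_files(base: str = "main") -> set[str]:
    """List files changed on HEAD since it diverged from `base`.

    Assumes the current directory is inside a git checkout that has
    the base branch available locally.
    """
    result = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_name_only(result.stdout)
```

The resulting set is what a trimming function would compare against each step's inputs; no dependency database or cache is involved.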
