

Workflow:MaterializeInc Materialize CI Pipeline Generation

From Leeroopedia


Knowledge Sources
Domains: CI_CD, Build_Automation, DevOps
Last Updated: 2026-02-08 21:00 GMT

Overview

End-to-end process for generating, optimizing, and executing the Buildkite CI pipeline from code changes through test execution and result reporting.

Description

This workflow covers how the Materialize CI system generates a Buildkite pipeline from a YAML template, applies test trimming based on changed files, and orchestrates parallel test execution across self-managed AWS build agents. The pipeline generator (mkpipeline.py) reads a template pipeline, identifies which tests are affected by code changes relative to main, and emits an optimized pipeline that skips unaffected test compositions. Each pipeline step maps to an mzcompose composition, which defines Docker services and test workflows. Results are aggregated as JUnit XML and annotated in Buildkite.

Usage

This workflow runs when a developer pushes code to a pull request branch, or when a scheduled nightly or release pipeline triggers. The CI system automatically invokes mkpipeline.sh, which bootstraps the Python environment and runs mkpipeline.py to generate the test pipeline. Understanding this workflow is essential for debugging CI failures, adding new test compositions to CI, and optimizing pipeline performance.

Execution Steps

Step 1: Pipeline Bootstrap

The Buildkite agent triggers mkpipeline.sh, which sets up the Python environment and invokes the pipeline generator. The shell script ensures all required dependencies are available and passes configuration flags (coverage mode, sanitizer mode, priority) to the Python generator.

Key considerations:

  • The bootstrap script runs early in the Buildkite lifecycle, before any test steps
  • Environment variables from Buildkite (branch, commit SHA, PR metadata) are available
  • The script handles both PR-triggered and scheduled pipeline runs
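
As an illustration of the considerations above, the sketch below reads the Buildkite-provided environment and distinguishes PR builds from scheduled ones. BUILDKITE_BRANCH, BUILDKITE_COMMIT, BUILDKITE_PULL_REQUEST, and BUILDKITE_SOURCE are standard Buildkite variables; the BuildContext class and read_build_context function are hypothetical names chosen for this example and are not taken from mkpipeline.py.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class BuildContext:
    """Hypothetical container for the build metadata the generator needs."""
    branch: str
    commit: str
    pull_request: Optional[str]  # PR number as a string, None for non-PR builds
    scheduled: bool


def read_build_context(env: Mapping[str, str] = os.environ) -> BuildContext:
    # Buildkite sets BUILDKITE_PULL_REQUEST to the literal string "false"
    # when the build was not triggered by a pull request.
    pr = env.get("BUILDKITE_PULL_REQUEST", "false")
    return BuildContext(
        branch=env.get("BUILDKITE_BRANCH", ""),
        commit=env.get("BUILDKITE_COMMIT", ""),
        pull_request=None if pr == "false" else pr,
        # BUILDKITE_SOURCE is "schedule" for builds started by a schedule.
        scheduled=env.get("BUILDKITE_SOURCE") == "schedule",
    )
```

Passing the environment as a parameter keeps the function easy to exercise outside of a real Buildkite agent.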

Step 2: Change Detection and Test Trimming

The pipeline generator compares the current branch against main to find which files have changed. It then resolves the dependency graph of mzcompose compositions to identify which test steps those files affect. Unaffected steps are skipped to reduce CI time and resource usage.

Key considerations:

  • Files matching CI_GLUE_GLOBS (core infrastructure) always trigger full test runs
  • Each mzcompose composition declares its imported files for dependency tracking
  • The trimming logic is conservative: if dependency resolution fails, the step runs anyway
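
The trimming rules above can be sketched as a small pure function. This is a simplified model, not the logic in mkpipeline.py: the glob values in CI_GLUE_GLOBS, the steps_to_run function, and the step_inputs mapping are assumptions made for illustration. The list of changed files is taken as an input; one way to obtain it would be `git diff --name-only` against main.

```python
from fnmatch import fnmatch

# Hypothetical glob values; the actual contents of CI_GLUE_GLOBS are not
# given in this document.
CI_GLUE_GLOBS = ["bin/*", "ci/*"]


def steps_to_run(changed_files, step_inputs):
    """Return the set of step names that must run.

    step_inputs maps each step name to the glob patterns of the files it
    depends on, or to None when dependency resolution failed for that step.
    """
    # A change to CI glue always triggers the full test run.
    if any(fnmatch(f, g) for f in changed_files for g in CI_GLUE_GLOBS):
        return set(step_inputs)

    keep = set()
    for name, globs in step_inputs.items():
        if globs is None:
            # Conservative fallback: unknown dependencies mean the step runs.
            keep.add(name)
        elif any(fnmatch(f, g) for f in changed_files for g in globs):
            keep.add(name)
    return keep
```

Note that fnmatch lets `*` match path separators, so a pattern such as `ci/*` also covers files in nested directories.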

Step 3: Pipeline YAML Generation

The generator reads the pipeline template YAML and processes each step. Steps that survive trimming are emitted into the final pipeline. Grouped steps (parallel blocks) are handled by iterating their sub-steps. The output is a valid Buildkite pipeline YAML.

Key considerations:

  • Steps can be conditionally included based on coverage or sanitizer mode
  • Priority flags control step ordering in the Buildkite queue
  • The pipeline supports both individual steps and grouped parallel step blocks
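
A minimal sketch of the generation pass follows. It assumes the template has already been parsed into a dictionary and that every step is a mapping carrying an `id` field; both are assumptions for this example. The `group`, `steps`, and `priority` keys are standard Buildkite step attributes, while trim_pipeline and its parameters are hypothetical names. The result is serialized with the standard json module, which yields valid YAML because YAML 1.2 is a superset of JSON.

```python
import json


def trim_pipeline(template, keep, priority=None):
    """Return a new pipeline containing only the steps whose id is in keep.

    The template is not modified. When priority is given, it is set on
    every emitted command step.
    """
    def emit(step):
        return step if priority is None else {**step, "priority": priority}

    out = []
    for step in template["steps"]:
        if "group" in step:
            # Grouped block: filter its sub-steps, drop the group if empty.
            sub = [emit(s) for s in step["steps"] if s.get("id") in keep]
            if sub:
                out.append({**step, "steps": sub})
        elif step.get("id") in keep:
            out.append(emit(step))
    return {**template, "steps": out}


def render(pipeline):
    # JSON output is also valid YAML 1.2.
    return json.dumps(pipeline, indent=2)
```

Dropping a group whose sub-steps were all trimmed avoids emitting an empty block into the final pipeline.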

Step 4: Composition Execution

Each pipeline step invokes bin/mzcompose --find COMPOSITION run WORKFLOW on a build agent. The mzcompose CLI discovers the composition file, builds required Docker images via mzbuild, starts services, and executes the test workflow function defined in the mzcompose.py file.

Key considerations:

  • Build agents are AWS EC2 c5.2xlarge instances with Docker isolation
  • sccache on shared S3 accelerates Rust compilation across agents
  • Each composition runs in its own Docker network for isolation
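
To make the invocation pattern concrete, the sketch below builds the bin/mzcompose --find COMPOSITION run WORKFLOW command described above and wraps it in a step dictionary. Only the command shape comes from this document; mzcompose_argv, command_step, and the extra_args parameter are hypothetical helpers for illustration, and the sketch only constructs the command rather than running it.

```python
import shlex


def mzcompose_argv(composition, workflow, extra_args=()):
    """Build the argument vector for one composition run."""
    return ["bin/mzcompose", "--find", composition, "run", workflow, *extra_args]


def command_step(step_id, composition, workflow):
    """Build a step mapping whose command runs the given workflow."""
    argv = mzcompose_argv(composition, workflow)
    # shlex.join quotes arguments so the string is safe to hand to a shell.
    return {"id": step_id, "command": shlex.join(argv)}
```

Keeping the command as a list until the last moment avoids quoting mistakes when a composition or workflow name contains unusual characters.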

Step 5: Test Result Aggregation

Test results from each composition are collected as JUnit XML files. The ci_annotate_errors.py tool scans results for failures and creates Buildkite annotations with error details, linking to known GitHub issues when matched.

Key considerations:

  • Results are uploaded as Buildkite artifacts
  • Error annotations include stack traces and relevant log snippets
  • Known flaky tests can be matched against a database of GitHub issues
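
The aggregation step can be approximated with the standard library alone. The sketch below is not the code in ci_annotate_errors.py: collect_failures, build_annotation, and the known_issues structure (a list of compiled regex and issue reference pairs) are assumptions for illustration. It parses a JUnit XML document, gathers failed test cases, and produces annotation text that could then be passed to a tool such as `buildkite-agent annotate`.

```python
import re
import xml.etree.ElementTree as ET


def collect_failures(junit_xml):
    """Return one record per test case with a <failure> or <error> child."""
    root = ET.fromstring(junit_xml)
    failures = []
    for case in root.iter("testcase"):
        for kind in ("failure", "error"):
            node = case.find(kind)
            if node is not None:
                failures.append({
                    "name": case.get("name", ""),
                    "message": node.get("message", ""),
                    "details": (node.text or "").strip(),
                })
                break  # record each test case at most once
    return failures


def build_annotation(failures, known_issues):
    """Format failures as a bullet list, tagging matches to known issues.

    known_issues is a list of (compiled regex, issue reference) pairs.
    """
    lines = []
    for f in failures:
        text = f["message"] + "\n" + f["details"]
        ref = next((r for pattern, r in known_issues if pattern.search(text)), None)
        suffix = f" (known issue: {ref})" if ref else " (new failure)"
        lines.append(f"- `{f['name']}`: {f['message']}{suffix}")
    return "\n".join(lines)
```

Matching against both the message and the details lets a pattern hit text that only appears in the stack trace.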

Execution Diagram

GitHub URL

Workflow Repository