Workflow:MaterializeInc Materialize Docker Image Build

Knowledge Sources	Materialize
Domains	Build_System, Docker, Content_Addressable_Storage
Last Updated	2026-02-08 21:00 GMT

Overview

End-to-end process for building, fingerprinting, and publishing content-addressed Docker images using the mzbuild system.

Description

This workflow describes how the mzbuild system compiles Rust binaries and packages them into Docker images, using content-based fingerprints for efficient caching. Each image is defined by an mzbuild.yml manifest that declares its dependencies, Rust binaries, and build configuration. The system computes a SHA-1 fingerprint of all inputs (source files, Dockerfile, dependencies) and, before building, checks the GitHub Container Registry (GHCR) for a pre-built image matching that fingerprint. Unchanged images are therefore reused across CI runs and developer machines rather than rebuilt.

Usage

Execute this workflow when building Materialize Docker images for testing, CI, or release. The mzbuild system is invoked automatically by mzcompose when starting services, by the CI pipeline for release artifact building, and by developers via the mzimage CLI tool. Understanding this workflow is essential for debugging image build failures, adding new Docker images, or optimizing build times.

Execution Steps

Step 1: Resolve Image Dependencies

The mzbuild Repository class scans the repository for directories containing an mzbuild.yml manifest to discover image definitions. It then resolves the dependency graph between images (e.g., materialized depends on ubuntu-base) to determine the correct build order. Dependencies are declared in each manifest and together form a directed acyclic graph (DAG).

Key considerations:

Image dependencies create a DAG that must be built in topological order
Each mzbuild.yml manifest declares the image name, Dockerfile, and dependencies
The Repository class provides methods for querying and resolving images by name
Cross-compilation support is available for multi-platform builds (x86_64, aarch64)
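
As an illustration of the ordering requirement, here is a minimal sketch that derives a build order from a dependency mapping using Python's standard graphlib module. It is not the Repository class's actual code: the `dependencies` dictionary, the `example-tool` image, and the `build_order` helper are hypothetical stand-ins for data that would be read from the mzbuild.yml manifests.

```python
from graphlib import TopologicalSorter

# Hypothetical manifest data: image name -> names of the images it depends on.
# In a real run this mapping would come from parsing each mzbuild.yml.
dependencies = {
    "ubuntu-base": [],
    "materialized": ["ubuntu-base"],
    "example-tool": ["materialized"],
}


def build_order(deps: dict[str, list[str]]) -> list[str]:
    """Return image names so that every image comes after its dependencies."""
    # TopologicalSorter expects node -> predecessors and raises
    # graphlib.CycleError if the graph is not acyclic.
    return list(TopologicalSorter(deps).static_order())


order = build_order(dependencies)
print(order)
```

A cycle between manifests would surface as a `CycleError` here, which is one way to enforce the requirement that the graph stay acyclic.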

Step 2: Compute Content Fingerprints

For each image, the system computes a SHA-1 fingerprint over all inputs: source files tracked by the build, Dockerfile contents, dependency fingerprints, Cargo manifest hashes, and the build profile. The resulting digest is base32-encoded, and the encoded string identifies one specific version of the image.

Key considerations:

Fingerprints are deterministic: the same inputs always produce the same fingerprint
Changes to any dependency propagate through the fingerprint chain
Build profiles (Release, Optimized, Dev) affect the fingerprint
The fingerprinting system handles Rust crate dependency resolution via Cargo metadata
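
The properties listed above can be sketched with hashlib and base64 from the standard library. This is a simplified illustration, not mzbuild's actual algorithm: the `fingerprint` function, its parameters, and the way inputs are ordered and separated are assumptions chosen to show determinism and propagation through dependencies.

```python
import base64
import hashlib


def fingerprint(inputs: dict[str, bytes], dep_fingerprints: list[str], profile: str) -> str:
    """Combine file contents, dependency fingerprints, and profile into a base32 SHA-1."""
    h = hashlib.sha1()
    # Sort paths so the result does not depend on dictionary iteration order.
    for path in sorted(inputs):
        h.update(path.encode())
        h.update(b"\0")
        h.update(hashlib.sha1(inputs[path]).digest())
    # Folding in dependency fingerprints makes upstream changes propagate.
    for dep in sorted(dep_fingerprints):
        h.update(dep.encode())
        h.update(b"\0")
    h.update(profile.encode())
    return base64.b32encode(h.digest()).decode()


# Hypothetical inputs: a base image and an image that depends on it.
base = fingerprint({"Dockerfile": b"FROM ubuntu\n"}, [], "release")
app = fingerprint({"Dockerfile": b"FROM base\n", "src/main.rs": b"fn main() {}\n"}, [base], "release")
print(base, app)
```

Because the dependent image hashes the base image's fingerprint rather than its files, a change anywhere upstream alters every downstream fingerprint without rehashing the upstream sources.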

Step 3: Check Registry for Cached Images

Before building, the system checks the GitHub Container Registry (GHCR) for an image matching the computed fingerprint. If a matching image exists, it is pulled instead of built, saving compilation time. The registry URL follows the pattern ghcr.io/materializeinc/{image_name}:{fingerprint}.

Key considerations:

Registry checks use the Docker manifest inspection API, which confirms that a tag exists without downloading any image layers
Cache hits are the common case in CI when only a few files change
The pull-before-build strategy is critical for CI performance
If the registry cannot be reached, the system falls back to building the image locally
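
The pull-before-build strategy can be sketched as follows. This is an illustrative outline under stated assumptions, not mzbuild's code: the helper names (`image_ref`, `exists_in_registry`, `acquire`) are hypothetical, and using the `docker manifest inspect` CLI command is one possible way to perform the manifest check. The pull and build actions are passed in as callables so the control flow can be exercised without Docker.

```python
import subprocess
from typing import Callable


def image_ref(image_name: str, fingerprint: str) -> str:
    """Build the registry reference using the pattern described above."""
    return f"ghcr.io/materializeinc/{image_name}:{fingerprint}"


def exists_in_registry(ref: str) -> bool:
    """Return True if `docker manifest inspect` exits successfully for ref."""
    result = subprocess.run(
        ["docker", "manifest", "inspect", ref],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # A non-zero exit covers both "tag not found" and "registry unreachable".
    return result.returncode == 0


def acquire(
    ref: str,
    exists: Callable[[str], bool],
    pull: Callable[[str], None],
    build: Callable[[str], None],
) -> str:
    """Pull the image on a cache hit; otherwise build it locally."""
    try:
        if exists(ref):
            pull(ref)
            return "pulled"
    except OSError:
        # Covers a missing docker binary and connection errors raised by
        # the injected callables; fall through to a local build.
        pass
    build(ref)
    return "built"
```

In real use, `exists_in_registry` would be passed as the `exists` argument, with `pull` and `build` wrapping the corresponding Docker and Cargo invocations.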

Step 4: Build Docker Images

For images without a cache hit, the system executes the Rust compilation and Docker build process. Cargo builds the required binaries with the appropriate profile and feature flags. The binaries are then packaged into Docker images using the declared Dockerfile. The build system monitors for Rust internal compiler errors (ICE) and handles them specially.

Key considerations:

Rust compilation uses sccache for shared compilation caching on S3
Build profiles control optimization levels (Release for production, Dev for testing)
Multi-platform builds use cross-compilation toolchains for aarch64 targets
Internal compiler errors trigger special error handling and reporting
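
One plausible way to single out internal compiler errors is to scan the failed build's stderr for the phrase rustc prints when it crashes. The sketch below assumes that approach; the `InternalCompilerError` exception, the `contains_ice` and `run_build` helpers, and the marker-matching strategy are hypothetical, and the source does not say how mzbuild actually reports an ICE.

```python
import subprocess
import sys

# rustc reports crashes with a message containing this phrase.
ICE_MARKER = "internal compiler error"


class InternalCompilerError(RuntimeError):
    """Raised when a build fails because the compiler itself crashed."""


def contains_ice(output: str) -> bool:
    """Return True if compiler output mentions an internal compiler error."""
    return ICE_MARKER in output.lower()


def run_build(cmd: list[str]) -> str:
    """Run a build command, distinguishing ICEs from ordinary failures."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        if contains_ice(proc.stderr):
            raise InternalCompilerError(f"compiler crashed while running {cmd[0]}")
        raise subprocess.CalledProcessError(proc.returncode, cmd, proc.stdout, proc.stderr)
    return proc.stdout


# Demonstration with a stand-in command; a real caller would pass a cargo invocation.
print(run_build([sys.executable, "-c", "print('ok')"]).strip())
```

Separating the two failure types lets a caller retry or file a report for compiler crashes while still failing fast on ordinary compile errors.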

Step 5: Publish to Container Registry

After successful builds, images are tagged with their content fingerprint and pushed to GHCR. This makes them available for cache reuse by subsequent builds and CI runs. The publishing step is typically performed by CI after successful test runs.

Key considerations:

Images are tagged with the content fingerprint, not the git SHA
Published images are immutable: a fingerprint always maps to the same content
The publish step is idempotent: re-publishing an existing tag is a no-op
Multi-platform manifests are created for images that support multiple architectures
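
The immutability and idempotence properties can be illustrated with a small in-memory model. The `InMemoryRegistry` class below is a hypothetical stand-in for a container registry, intended only to show the intended semantics; it does not reflect how GHCR or mzbuild implement them.

```python
class InMemoryRegistry:
    """Hypothetical registry model keyed by image reference."""

    def __init__(self) -> None:
        self.tags: dict[str, bytes] = {}
        self.pushes = 0  # counts uploads that actually happened

    def publish(self, ref: str, content: bytes) -> bool:
        """Store content under ref; return True only if an upload occurred."""
        existing = self.tags.get(ref)
        if existing is not None:
            if existing != content:
                # A fingerprint tag must never point at different content.
                raise ValueError(f"{ref} already maps to different content")
            return False  # re-publishing the same tag is a no-op
        self.tags[ref] = content
        self.pushes += 1
        return True


registry = InMemoryRegistry()
ref = "ghcr.io/materializeinc/materialized:EXAMPLEFINGERPRINT"
first = registry.publish(ref, b"image-bytes")
second = registry.publish(ref, b"image-bytes")
print(first, second, registry.pushes)
```

Because the tag is derived from the content, the conflicting-content branch should be unreachable in practice; the check is there to make the invariant explicit.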
