Workflow:MaterializeInc Materialize Docker Image Build
| Knowledge Sources | |
|---|---|
| Domains | Build_System, Docker, Content_Addressable_Storage |
| Last Updated | 2026-02-08 21:00 GMT |
Overview
End-to-end process for building, fingerprinting, and publishing content-addressed Docker images using the mzbuild system.
Description
This workflow describes how the mzbuild system compiles Rust binaries and packages them into Docker images, using content-based fingerprints for efficient caching. Each image is defined by an mzbuild.yml manifest that declares its dependencies, Rust binaries, and build configuration. Before building, the system computes a SHA-1 fingerprint of all inputs (source files, Dockerfile, dependencies) and checks the container registry (GHCR) for a pre-built image matching that fingerprint. Unchanged images are therefore reused across CI runs and developer machines instead of being rebuilt.
Usage
Execute this workflow when building Materialize Docker images for testing, CI, or release. The mzbuild system is invoked automatically by mzcompose when starting services, by the CI pipeline when building release artifacts, and by developers via the mzimage CLI tool. Understanding this workflow is essential for debugging image build failures, adding new Docker images, and optimizing build times.
Execution Steps
Step 1: Resolve Image Dependencies
The mzbuild Repository class scans the repository for directories that contain an mzbuild.yml manifest; each such directory defines one image. It then resolves the dependency graph between images (e.g., materialized depends on ubuntu-base) to determine the correct build order. Dependencies are declared in the manifest and form a directed acyclic graph (DAG).
Key considerations:
- Image dependencies form a DAG, so images must be built in topological order, with each dependency built before the images that depend on it
- Each mzbuild.yml manifest declares the image name, Dockerfile, and dependencies
- The Repository class provides methods for querying and resolving images by name
- Cross-compilation support is available for multi-platform builds (x86_64, aarch64)
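The ordering requirement can be illustrated with a minimal sketch. It is not the Repository class itself: the dependency map below is a hypothetical stand-in for what would be parsed from the mzbuild.yml manifests, `example-image` is an invented name, and the sort uses Python's standard `graphlib` rather than whatever mzbuild does internally.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: image name -> names of the images it depends on.
# In practice this information would come from the mzbuild.yml manifests.
dependencies = {
    "ubuntu-base": set(),
    "materialized": {"ubuntu-base"},
    "example-image": {"materialized"},  # invented image, for illustration only
}


def build_order(deps: dict[str, set[str]]) -> list[str]:
    """Return image names so that every image comes after its dependencies."""
    # TopologicalSorter expects node -> predecessors and raises
    # graphlib.CycleError if the graph is not acyclic.
    return list(TopologicalSorter(deps).static_order())


order = build_order(dependencies)
print(order)
```

A cycle in the manifests would surface here as a `CycleError`, which is one reason the acyclic property matters.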
Step 2: Compute Content Fingerprints
For each image, the system computes a SHA-1 fingerprint over all of its inputs: source files tracked by the build, Dockerfile contents, dependency fingerprints, Cargo manifest hashes, and the build profile. The fingerprint is rendered as a base32 string and identifies one specific version of the image.
Key considerations:
- Fingerprints are deterministic: the same inputs always produce the same fingerprint
- Changes to any dependency propagate through the fingerprint chain
- Build profiles (Release, Optimized, Dev) affect the fingerprint
- The fingerprinting system handles Rust crate dependency resolution via Cargo metadata
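The properties listed above (determinism, propagation through dependencies, sensitivity to the build profile) can be sketched as follows. This is an illustration under assumptions, not the mzbuild algorithm: the function name, the input layout, and the order in which fields are hashed are all hypothetical. Only the use of SHA-1 and base32 comes from the description above.

```python
import base64
import hashlib


def fingerprint(inputs: dict[str, bytes], dep_fingerprints: list[str], profile: str) -> str:
    """Hash file contents, dependency fingerprints, and the build profile.

    Hypothetical layout: `inputs` maps a relative path to that file's bytes.
    """
    h = hashlib.sha1()
    # Sort paths so the result does not depend on dict insertion order.
    for path in sorted(inputs):
        h.update(path.encode())
        h.update(b"\0")
        h.update(hashlib.sha1(inputs[path]).digest())
    # Folding in dependency fingerprints makes changes propagate downstream.
    for dep in sorted(dep_fingerprints):
        h.update(dep.encode())
        h.update(b"\0")
    h.update(profile.encode())
    # A 20-byte SHA-1 digest encodes to 32 base32 characters with no padding.
    return base64.b32encode(h.digest()).decode()


base = fingerprint({"Dockerfile": b"FROM scratch\n"}, [], "release")
app = fingerprint({"Dockerfile": b"FROM base\n"}, [base], "release")
print(base, app)
```

Because `app` hashes the fingerprint of `base`, editing the base image's Dockerfile changes both values even though the dependent image's own files are untouched.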
Step 3: Check Registry for Cached Images
Before building, the system checks the GitHub Container Registry (GHCR) for an image matching the computed fingerprint. If a matching image exists, it is pulled instead of built, saving compilation time. The registry URL follows the pattern ghcr.io/materializeinc/{image_name}:{fingerprint}.
Key considerations:
- Registry checks use the Docker manifest inspection API for efficiency
- Cache hits are the common case in CI when only a few files change
- The pull-before-build strategy is critical for CI performance
- If the registry cannot be reached, the system falls back to building the image locally
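A rough sketch of the pull-before-build decision, assuming the check is done by shelling out to the `docker manifest inspect` and `docker pull` commands. The function names, the injectable `run` parameter, and the `build` callback are hypothetical and exist only to keep the example self-contained; the image reference follows the pattern given above.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def _run(cmd: Sequence[str]) -> int:
    """Run a command quietly and return its exit code."""
    return subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    ).returncode


def image_spec(name: str, fingerprint: str) -> str:
    # Pattern taken from the text above.
    return f"ghcr.io/materializeinc/{name}:{fingerprint}"


def acquire(name: str, fingerprint: str, build: Callable[[str], None],
            run: Runner = _run) -> str:
    """Pull the image if the registry has it, otherwise build it locally."""
    spec = image_spec(name, fingerprint)
    try:
        # A non-zero exit covers both "tag not found" and network errors.
        if run(["docker", "manifest", "inspect", spec]) == 0:
            if run(["docker", "pull", spec]) == 0:
                return "pulled"
    except OSError:
        # For example, the docker binary is missing; fall through to a build.
        pass
    build(spec)
    return "built"
```

Treating every failure of the inspect or pull step as a cache miss keeps the fallback simple: the worst case is an unnecessary local build, never a failed one.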
Step 4: Build Docker Images
For images without a cache hit, the system runs the Rust compilation and the Docker build. Cargo builds the required binaries with the appropriate profile and feature flags, and the resulting binaries are packaged into a Docker image using the declared Dockerfile. The build system also watches for Rust internal compiler errors (ICEs) and handles them separately from ordinary build failures.
Key considerations:
- Rust compilation uses sccache for shared compilation caching on S3
- Build profiles control optimization levels (Release for production, Dev for testing)
- Multi-platform builds use cross-compilation toolchains for aarch64 targets
- Internal compiler errors trigger special error handling and reporting
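As an illustration of how a profile and target architecture might translate into a Cargo invocation, and how an ICE might be told apart from an ordinary failure, consider the sketch below. The helper names, the lowercase profile strings, the `example-bin` binary, the Linux GNU target triples, and the marker string being searched for are all assumptions for the example; mzbuild's actual command line and detection logic may differ.

```python
# Assumed marker: rustc reports ICEs with a message containing this phrase.
ICE_MARKER = "internal compiler error"


def cargo_command(bins: list[str], profile: str, arch: str) -> list[str]:
    """Assemble a `cargo build` command for the given binaries."""
    cmd = ["cargo", "build", "--target", f"{arch}-unknown-linux-gnu"]
    if profile == "release":
        cmd.append("--release")
    elif profile != "dev":
        # Any other profile is passed by name, e.g. a custom "optimized" one.
        cmd += ["--profile", profile]
    for name in bins:
        cmd += ["--bin", name]
    return cmd


def is_ice(compiler_output: str) -> bool:
    """Report whether captured compiler output looks like an ICE."""
    return ICE_MARKER in compiler_output


print(cargo_command(["example-bin"], "optimized", "aarch64"))
```

Separating ICE detection from the exit code lets a caller report compiler bugs differently from errors in the code being built.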
Step 5: Publish to Container Registry
After successful builds, images are tagged with their content fingerprint and pushed to GHCR. This makes them available for cache reuse by subsequent builds and CI runs. The publishing step is typically performed by CI after successful test runs.
Key considerations:
- Images are tagged with the content fingerprint, not the git SHA
- Published images are immutable: a fingerprint always maps to the same content
- The publish step is idempotent: re-publishing an existing tag is a no-op
- Multi-platform manifests are created for images that support multiple architectures
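The idempotence property can be sketched as a publish loop that skips tags already present in the registry. The `exists` and `push` callbacks are hypothetical placeholders for the real registry operations, and the in-memory set stands in for GHCR.

```python
from typing import Callable, Iterable


def publish(specs: Iterable[str],
            exists: Callable[[str], bool],
            push: Callable[[str], None]) -> list[str]:
    """Push only the images whose fingerprint tag is not yet in the registry."""
    pushed = []
    for spec in specs:
        if exists(spec):
            # Same fingerprint means same content, so there is nothing to do.
            continue
        push(spec)
        pushed.append(spec)
    return pushed


# In-memory stand-in for the registry, for illustration only.
registry: set[str] = set()
specs = ["ghcr.io/materializeinc/materialized:EXAMPLEFINGERPRINT"]

first = publish(specs, registry.__contains__, registry.add)
second = publish(specs, registry.__contains__, registry.add)
print(first, second)
```

The second call pushes nothing, which is what makes it safe for several CI jobs to attempt publishing the same image.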