Implementation:MaterializeInc Materialize ResolvedImage Fingerprint

From Leeroopedia


Knowledge Sources: misc/python/materialize/mzbuild.py (Fingerprint class, ResolvedImage.fingerprint)
Domains: Build Systems, Caching, Cryptographic Hashing, Container Infrastructure
Last Updated: 2026-02-08

Overview

Concrete implementation of content-addressed image fingerprinting in Materialize's mzbuild system, provided by the Fingerprint class and the memoized ResolvedImage.fingerprint method.

Description

The fingerprinting implementation consists of two components:

1. Fingerprint class (lines 135-144): A subclass of bytes that holds a SHA-1 hash of build inputs. Its __str__ method returns a base32-encoded representation, which visually distinguishes mzbuild fingerprints from Git's hex-encoded SHA-1 hashes while remaining URL-safe for Docker image tags.

2. ResolvedImage.fingerprint (lines 1084-1136): A method memoized with @cache that computes the content-addressed fingerprint by:

  1. Hashing all input files -- Iterates over all non-gitignored files in the image's mzbuild context (expanded via git.expand_globs), hashing each file's normalized mode and content.
  2. Including pre-image extras -- Incorporates additional hash material from pre-image actions (e.g., Cargo build configuration strings).
  3. Including build configuration -- Hashes the build profile, target architecture, coverage flag, and sanitizer setting.
  4. Including a mirror marker -- Adds a mirror=ghcr sentinel to invalidate all pre-GHCR-era hashes.
  5. Folding in dependency fingerprints -- Recursively incorporates the fingerprints of all resolved dependencies, creating a Merkle tree structure.

The result is a Fingerprint object that changes if any input to the image or its transitive dependencies changes.
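Steps 1 through 4 above can be sketched with nothing but hashlib. This is a minimal illustration, not the mzbuild code: the function name self_hash_sketch, its parameters, and the exact byte layout fed to the hash are all assumptions made for the example. Only the ingredients (file mode, path, and content; pre-image extras; build configuration; the mirror=ghcr marker) come from the description.

```python
import hashlib


def self_hash_sketch(files, extras, config):
    """Hash local inputs only; the byte layout here is illustrative.

    files:  mapping of relative path -> (normalized mode, content bytes)
    extras: strings contributed by pre-image actions
    config: mapping of build setting name -> value
    """
    h = hashlib.sha1()
    # Step 1: visit files in sorted order so the result is deterministic.
    for rel_path in sorted(files):
        mode, content = files[rel_path]
        h.update(f"{mode:o}".encode())
        h.update(rel_path.encode())
        h.update(hashlib.sha1(content).digest())
        h.update(b"\0")
    # Step 2: pre-image extras.
    for extra in extras:
        h.update(extra.encode())
        h.update(b"\0")
    # Step 3: build configuration (profile, arch, coverage, sanitizer).
    for key in sorted(config):
        h.update(f"{key}={config[key]}".encode())
        h.update(b"\0")
    # Step 4: the mirror marker named in the description.
    h.update(b"mirror=ghcr")
    return h.digest()


files = {
    "Dockerfile": (0o100644, b"FROM scratch\n"),
    "entrypoint.sh": (0o100755, b"#!/bin/sh\n"),
}
config = {"profile": "release", "arch": "x86_64", "coverage": False, "sanitizer": "none"}
extras = ["hypothetical-cargo-flags"]

baseline = self_hash_sketch(files, extras, config)
edited = self_hash_sketch({**files, "Dockerfile": (0o100644, b"FROM alpine\n")}, extras, config)
chmodded = self_hash_sketch({**files, "Dockerfile": (0o100755, b"FROM scratch\n")}, extras, config)
```

In this sketch, editing a file's content or flipping its mode yields a different digest, while repeating the call with identical inputs reproduces the baseline.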

Usage

Use ResolvedImage.fingerprint when:

  • Generating the image spec -- The fingerprint is embedded in the Docker image tag via ResolvedImage.spec(), producing tags like mzbuild-ABCDE....
  • Checking remote caches -- The fingerprint-based tag is queried against Docker Hub or GHCR to determine if a pre-built image already exists.
  • Determining rebuild necessity -- If the fingerprint matches an existing image, the build is skipped.
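The three uses above reduce to a tag lookup. The sketch below is hypothetical throughout: image_spec, needs_build, and the published set are illustrative stand-ins, and a real check would query a container registry rather than an in-memory set. Only the mzbuild-{fingerprint} tag format is taken from this page.

```python
def image_spec(repository: str, fingerprint: str) -> str:
    """Build an image spec using the mzbuild-{fingerprint} tag format."""
    return f"{repository}:mzbuild-{fingerprint}"


def needs_build(spec: str, published_specs: set[str]) -> bool:
    """Hypothetical cache check: build only when the spec is not already published."""
    return spec not in published_specs


# Placeholder fingerprint string, not a real hash.
placeholder_fp = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
spec = image_spec("materialize/environmentd", placeholder_fp)

# Stand-in for the set of tags a registry reports as existing.
published = {spec}
other_spec = image_spec("materialize/environmentd", "765432ZYXWVUTSRQPONMLKJIHGFEDCBA")
```

Because the tag is derived entirely from the fingerprint, any change to an input produces a spec that is absent from the registry, which is what triggers a rebuild.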

Code Reference

Source Location

File misc/python/materialize/mzbuild.py
Fingerprint class Lines 135-144
ResolvedImage.fingerprint Lines 1084-1136

Signature

import base64
from functools import cache


class Fingerprint(bytes):
    """A SHA-1 hash of the inputs to an `Image`.

    The string representation uses base32 encoding to distinguish mzbuild
    fingerprints from Git's hex encoded SHA-1 hashes while still being
    URL safe.
    """

    def __str__(self) -> str:
        return base64.b32encode(self).decode()


class ResolvedImage:
    @cache
    def fingerprint(self) -> Fingerprint:
        """Fingerprint the inputs to the image.

        Compute the fingerprint of the image. Changing the contents of any of
        the files or adding or removing files to the image will change the
        fingerprint, as will modifying the inputs to any of its dependencies.

        The image considers all non-gitignored files in its mzbuild context to
        be inputs. If it has a pre-image action, that action may add additional
        inputs via `PreImage.inputs`.
        """
        ...

Import

from materialize.mzbuild import Fingerprint, ResolvedImage

I/O Contract

Inputs

Input Source | Type | Description
self.inputs() | set[str] | All non-gitignored file paths in the image's mzbuild context, expanded via git.expand_globs.
File contents | bytes | Raw byte content of each input file, read via open(abs_path, "rb").
File modes | int | Normalized POSIX file mode (symlink: 0o120000, executable: 0o100755, other: 0o100644).
pre_image.extra() | str | Additional hash material from each pre-image action (e.g., Cargo build flags).
self.image.rd.profile | Profile | The Rust build profile (RELEASE, OPTIMIZED, DEV).
self.image.rd.arch | Arch | Target CPU architecture.
self.image.rd.coverage | bool | Whether coverage instrumentation is enabled.
self.image.rd.sanitizer | Sanitizer | Active sanitizer mode.
dep.fingerprint() | Fingerprint | Recursive fingerprints of all resolved dependencies.
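The mode normalization in the table collapses whatever permission bits exist on disk into three Git-style values. A minimal sketch follows; the helper name normalize_mode is hypothetical, and treating any set execute bit as "executable" is an assumption rather than something this page states.

```python
import stat


def normalize_mode(raw_mode: int) -> int:
    """Map a raw st_mode (e.g. from os.lstat) to one of three normalized modes."""
    if stat.S_ISLNK(raw_mode):
        return 0o120000  # symlink
    if raw_mode & 0o111:  # assumption: any execute bit counts as executable
        return 0o100755
    return 0o100644  # every other file


symlink_mode = normalize_mode(stat.S_IFLNK | 0o777)
script_mode = normalize_mode(stat.S_IFREG | 0o750)
plain_mode = normalize_mode(stat.S_IFREG | 0o664)
```

Normalizing this way keeps the fingerprint stable across machines whose umask or checkout settings produce different raw permission bits for the same tracked file.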

Outputs

Output | Type | Description
Return value | Fingerprint | A 20-byte SHA-1 hash (subclass of bytes) whose __str__ returns base32 encoding.
Caching | @cache | The result is memoized via functools.cache, so subsequent calls return the precomputed value.
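The return value's shape can be checked in isolation. The snippet below redefines a stand-in Fingerprint matching the class shown in the Signature section so that it runs without the materialize package; the input bytes are arbitrary.

```python
import base64
import hashlib


class Fingerprint(bytes):
    """Stand-in mirroring the class shown in the Signature section."""

    def __str__(self) -> str:
        return base64.b32encode(self).decode()


digest = hashlib.sha1(b"example input").digest()
fp = Fingerprint(digest)
text = str(fp)

# A SHA-1 digest is 20 bytes (160 bits). Base32 packs 5 bits per character,
# so the string form is 32 characters with no "=" padding.
fp_length = len(fp)
text_length = len(text)
hex_length = len(digest.hex())  # the Git-style hex form is 40 characters
```

The base32 alphabet (A-Z and 2-7) is uppercase only, so a fingerprint string is easy to tell apart from a lowercase hex Git hash at a glance.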

Usage Examples

Computing and displaying a fingerprint:

from materialize.mzbuild import Repository
from pathlib import Path

repo = Repository(root=Path("/path/to/materialize"))
dep_set = repo.resolve_dependencies([repo.images["environmentd"]])

for resolved_image in dep_set:
    fp = resolved_image.fingerprint()
    print(f"{resolved_image.name}: {fp}")
    # Example output (placeholder fingerprint, not a real hash):
    #   environmentd: ABCDEFGHIJKLMNOPQRSTUVWXYZ234567
    # The printed form is the base32-encoded SHA-1 digest.

How the fingerprint feeds into the image spec:

# ResolvedImage.spec() uses the fingerprint to create the Docker tag
resolved_image.spec()
# Returns: "materialize/environmentd:mzbuild-ABCDEFGHIJKLMNOP..."
# The tag format is: "mzbuild-{base32_fingerprint}"

Understanding the two-phase hash structure:

# Phase 1: self_hash -- hash of local inputs only
#   - All file paths, modes, and contents in the build context
#   - Pre-image extra() strings
#   - Build configuration (profile, arch, coverage, sanitizer)
#   - Mirror marker ("mirror=ghcr")

# Phase 2: full_hash -- self_hash folded with dependency fingerprints
#   - self_hash.digest()
#   - For each dependency (sorted by name):
#       dep.name + dep.fingerprint() + null byte

# This two-phase approach means:
#   - Changing a local file changes self_hash -> changes full_hash
#   - Changing a dependency changes dep.fingerprint() -> changes full_hash
#   - Both propagate to produce a new image tag
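The second phase can be demonstrated with a small standalone function. The fold helper and the image names are hypothetical; the per-dependency layout (name, then fingerprint, then a null byte, in name order) follows the comments above, while everything else is simplified for illustration.

```python
import hashlib


def fold(self_digest: bytes, dep_fingerprints: dict[str, bytes]) -> bytes:
    """Combine a local self-hash digest with dependency fingerprints."""
    full_hash = hashlib.sha1()
    full_hash.update(self_digest)
    for name in sorted(dep_fingerprints):  # sorted by name for determinism
        full_hash.update(name.encode())
        full_hash.update(dep_fingerprints[name])
        full_hash.update(b"\0")
    return full_hash.digest()


def local(content: bytes) -> bytes:
    """Stand-in for phase 1: a digest over an image's local inputs."""
    return hashlib.sha1(content).digest()


# A two-level dependency chain: app depends on base.
base_v1 = fold(local(b"base inputs v1"), {})
base_v2 = fold(local(b"base inputs v2"), {})
app_on_v1 = fold(local(b"app inputs"), {"base": base_v1})
app_on_v2 = fold(local(b"app inputs"), {"base": base_v2})
```

Here the app's local inputs are identical in both cases, yet app_on_v1 and app_on_v2 differ because the dependency's fingerprint changed, which is the Merkle-tree propagation the description refers to.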
