Implementation:MaterializeInc Materialize ResolvedImage Fingerprint
| Knowledge Sources | misc/python/materialize/mzbuild.py (Fingerprint class, ResolvedImage.fingerprint) |
|---|---|
| Domains | Build Systems, Caching, Cryptographic Hashing, Container Infrastructure |
| Last Updated | 2026-02-08 |
Overview
Concrete implementation of content-addressed image fingerprinting, provided by the Fingerprint class and the cached ResolvedImage.fingerprint method in Materialize's mzbuild system.
Description
The fingerprinting implementation consists of two components:
1. Fingerprint class (lines 135-144): A subclass of bytes that represents a SHA-1 hash of build inputs. Its __str__ method returns a base32-encoded representation, which visually distinguishes mzbuild fingerprints from Git's hex-encoded SHA-1 hashes while remaining URL-safe for Docker image tags.
2. ResolvedImage.fingerprint (lines 1084-1136): A method memoized with @cache that computes the content-addressed fingerprint by:
- Hashing all input files -- Iterates over all non-gitignored files in the image's mzbuild context (expanded via git.expand_globs), hashing each file's normalized mode and content.
- Including pre-image extras -- Incorporates additional hash material from pre-image actions (e.g., Cargo build configuration strings).
- Including build configuration -- Hashes the build profile, target architecture, coverage flag, and sanitizer setting.
- Including a mirror marker -- Adds a mirror=ghcr sentinel to invalidate all pre-GHCR-era hashes.
- Folding in dependency fingerprints -- Recursively incorporates the fingerprints of all resolved dependencies, creating a Merkle tree structure.
The result is a Fingerprint object that changes if any input to the image or its transitive dependencies changes.
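The local-input steps above (files, pre-image extras, build configuration, mirror marker) can be sketched as a small self-contained function. This is a minimal illustration rather than the mzbuild code: the name `sketch_self_hash`, the in-memory `files` mapping, the 4-byte mode encoding, the field order, and the null-byte separators are all assumptions made for the example. Dependency fingerprints are deliberately left out.

```python
import hashlib


def sketch_self_hash(
    files: dict[str, tuple[int, bytes]],
    extras: list[str],
    config: dict[str, str],
) -> bytes:
    """Hash local inputs only (hypothetical helper, not the mzbuild API).

    `files` maps a relative path to (normalized mode, content).
    """
    h = hashlib.sha1()
    # Sort paths so the digest does not depend on iteration order.
    for rel_path in sorted(files):
        mode, content = files[rel_path]
        h.update(mode.to_bytes(4, "big"))
        h.update(rel_path.encode())
        h.update(hashlib.sha1(content).digest())
        h.update(b"\0")
    # Extra hash material contributed by pre-image actions.
    for extra in extras:
        h.update(extra.encode())
        h.update(b"\0")
    # Build configuration, followed by the mirror marker.
    for key in sorted(config):
        h.update(f"{key}={config[key]}".encode())
        h.update(b"\0")
    h.update(b"mirror=ghcr")
    return h.digest()


base_files = {
    "Dockerfile": (0o100644, b"FROM scratch\n"),
    "run.sh": (0o100755, b"#!/bin/sh\n"),
}
base_config = {
    "profile": "release",
    "arch": "x86_64",
    "coverage": "False",
    "sanitizer": "none",
}
digest = sketch_self_hash(base_files, ["cargo-flags"], base_config)
```

Changing any file's content or mode, any extra string, or any configuration value yields a different 20-byte digest, while reordering the `files` mapping does not.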
Usage
Use ResolvedImage.fingerprint when:
- Generating the image spec -- The fingerprint is embedded in the Docker image tag via ResolvedImage.spec(), producing tags like mzbuild-ABCDE....
- Checking remote caches -- The fingerprint-based tag is queried against Docker Hub or GHCR to determine if a pre-built image already exists.
- Determining rebuild necessity -- If the fingerprint matches an existing image, the build is skipped.
Code Reference
Source Location
| File | misc/python/materialize/mzbuild.py |
|---|---|
| Fingerprint class | Lines 135-144 |
| ResolvedImage.fingerprint | Lines 1084-1136 |
Signature
import base64
from functools import cache


class Fingerprint(bytes):
    """A SHA-1 hash of the inputs to an `Image`.

    The string representation uses base32 encoding to distinguish mzbuild
    fingerprints from Git's hex encoded SHA-1 hashes while still being
    URL safe.
    """

    def __str__(self) -> str:
        return base64.b32encode(self).decode()


class ResolvedImage:
    @cache
    def fingerprint(self) -> Fingerprint:
        """Fingerprint the inputs to the image.

        Compute the fingerprint of the image. Changing the contents of any of
        the files or adding or removing files to the image will change the
        fingerprint, as will modifying the inputs to any of its dependencies.

        The image considers all non-gitignored files in its mzbuild context to
        be inputs. If it has a pre-image action, that action may add additional
        inputs via `PreImage.inputs`.
        """
        ...
Import
from materialize.mzbuild import Fingerprint, ResolvedImage
I/O Contract
Inputs
| Input Source | Type | Description |
|---|---|---|
| self.inputs() | set[str] | All non-gitignored file paths in the image's mzbuild context, expanded via git.expand_globs. |
| File contents | bytes | Raw byte content of each input file, read via open(abs_path, "rb"). |
| File modes | int | Normalized POSIX file mode (symlink: 0o120000, executable: 0o100755, other: 0o100644). |
| pre_image.extra() | str | Additional hash material from each pre-image action (e.g., Cargo build flags). |
| self.image.rd.profile | Profile | The Rust build profile (RELEASE, OPTIMIZED, DEV). |
| self.image.rd.arch | Arch | Target CPU architecture. |
| self.image.rd.coverage | bool | Whether coverage instrumentation is enabled. |
| self.image.rd.sanitizer | Sanitizer | Active sanitizer mode. |
| dep.fingerprint() | Fingerprint | Recursive fingerprints of all resolved dependencies. |
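
The file-mode normalization listed in the inputs table can be illustrated with the standard `stat` module. This is a sketch under assumptions: the helper name `normalize_mode` is hypothetical, and treating a file as "executable" when the owner execute bit is set is an assumption, not something the table states. The three output values are the ones given in the table and match the modes Git records for symlinks, executable files, and regular files.

```python
import stat


def normalize_mode(raw_mode: int) -> int:
    """Collapse a raw st_mode into one of three Git-style modes."""
    if stat.S_ISLNK(raw_mode):
        return 0o120000
    # Assumption: "executable" means the owner execute bit is set.
    if raw_mode & stat.S_IXUSR:
        return 0o100755
    return 0o100644
```

In practice the raw mode would come from something like `os.lstat(path).st_mode`; `lstat` is used rather than `stat` so that a symlink is reported as a symlink instead of being followed. Normalizing this way keeps differences such as group or world permission bits from changing the fingerprint.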
Outputs
| Output | Type | Description |
|---|---|---|
| Return value | Fingerprint | A 20-byte SHA-1 hash (subclass of bytes) whose __str__ returns base32 encoding. |
| Caching | @cache | The result is memoized via functools.cache, so subsequent calls return the precomputed value. |
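
To make the output contract concrete, the following sketch encodes a SHA-1 digest both ways. The `Fingerprint` class here is a stand-in that copies the `__str__` from the signature shown earlier; the input bytes are arbitrary and chosen only for illustration.

```python
import base64
import hashlib


class Fingerprint(bytes):
    """Stand-in with the same __str__ as the signature shown earlier."""

    def __str__(self) -> str:
        return base64.b32encode(self).decode()


fp = Fingerprint(hashlib.sha1(b"example input").digest())
text = str(fp)

print(len(fp))        # 20 raw bytes
print(len(text))      # 32 base32 characters
print(len(fp.hex()))  # 40 hex characters, as Git would print a SHA-1
```

Because 20 bytes is a multiple of 5, the base32 form needs no `=` padding, and its alphabet (A-Z and 2-7) contains only characters that are valid in a Docker tag.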
Usage Examples
Computing and displaying a fingerprint:
from pathlib import Path

from materialize.mzbuild import Repository

repo = Repository(root=Path("/path/to/materialize"))
dep_set = repo.resolve_dependencies([repo.images["environmentd"]])
for resolved_image in dep_set:
    fp = resolved_image.fingerprint()
    print(f"{resolved_image.name}: {fp}")
# Illustrative output line (one line is printed per resolved image):
# environmentd: ABCDEFGHIJKLMNOPQRSTUVWXYZ234567
# (base32-encoded SHA-1)
How the fingerprint feeds into the image spec:
# ResolvedImage.spec() uses the fingerprint to create the Docker tag
resolved_image.spec()
# Returns: "materialize/environmentd:mzbuild-ABCDEFGHIJKLMNOP..."
# The tag format is: "mzbuild-{base32_fingerprint}"
Understanding the two-phase hash structure:
# Phase 1: self_hash -- hash of local inputs only
# - All file paths, modes, and contents in the build context
# - Pre-image extra() strings
# - Build configuration (profile, arch, coverage, sanitizer)
# - Mirror marker ("mirror=ghcr")
# Phase 2: full_hash -- self_hash folded with dependency fingerprints
# - self_hash.digest()
# - For each dependency (sorted by name):
# dep.name + dep.fingerprint() + null byte
# This two-phase approach means:
# - Changing a local file changes self_hash -> changes full_hash
# - Changing a dependency changes dep.fingerprint() -> changes full_hash
# - Both propagate to produce a new image tag
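
The propagation described in these comments can be demonstrated on a toy dependency graph. This is a sketch, not the mzbuild types: `Node`, `local`, and the image names are hypothetical, the phase 1 hash is replaced by a plain SHA-1 of a byte string, and no memoization is applied so that the effect of an edit is visible immediately.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for a resolved image (hypothetical, not the mzbuild type)."""

    name: str
    self_digest: bytes
    deps: list["Node"] = field(default_factory=list)

    def fingerprint(self) -> bytes:
        full = hashlib.sha1()
        full.update(self.self_digest)
        # Sort by name so dependency order cannot change the result.
        for dep in sorted(self.deps, key=lambda d: d.name):
            full.update(dep.name.encode())
            full.update(dep.fingerprint())
            full.update(b"\0")
        return full.digest()


def local(data: bytes) -> bytes:
    """Phase 1 placeholder: hash of local inputs only."""
    return hashlib.sha1(data).digest()


base = Node("base", local(b"base v1"))
tools = Node("tools", local(b"tools v1"))
app = Node("app", local(b"app v1"), [tools, base])

before_app, before_tools = app.fingerprint(), tools.fingerprint()
base.self_digest = local(b"base v2")  # simulate editing a file in a dependency
after_app, after_tools = app.fingerprint(), tools.fingerprint()
```

After the edit, `app` gets a new fingerprint because it depends on `base`, while `tools`, which does not, keeps its old one. That is the property that lets unaffected images be reused from the cache.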