Jump to content

Connect Leeroopedia MCP: Equip your AI agents to search best practices, build plans, verify code, diagnose failures, and look up hyperparameter defaults.

Heuristic:MaterializeInc Materialize Docker Image Cache Lookup

From Leeroopedia




Knowledge Sources
Domains Infrastructure, Optimization
Last Updated 2026-02-08 21:00 GMT

Overview

Multi-tier Docker image existence check strategy that avoids Docker Hub rate limits by preferring API lookups over CLI commands, with local caching via a `known-docker-images.txt` file.

Description

The `is_docker_image_pushed()` and related functions in `mzbuild.py` implement a three-tier strategy for checking whether a Docker image exists in a remote registry: (1) check a local file cache (`known-docker-images.txt`), (2) use the Docker Hub REST API or GHCR token-based API, (3) fall back to `docker manifest inspect` CLI command. The file cache persists across builds within the same workspace, preventing redundant network calls. The API-first approach avoids the Docker Hub rate limit penalty that `docker manifest inspect` incurs even for non-existent images.

Usage

Apply this heuristic when debugging image lookup failures, understanding build cache behavior, or optimizing Docker image build times. It is central to the Mz_Image_Tag_Exists, ResolvedImage_Build, and ResolvedImage_Fingerprint implementations.

The Insight (Rule of Thumb)

  • Action: Always check local cache first, then use HTTP API, then fall back to Docker CLI.
  • Value: Avoids Docker Hub rate limits (which count `docker manifest inspect` against the pull quota) and reduces network round trips.
  • Trade-off: The file cache can become stale if an image is deleted from the registry. However, false positives (cache says exists, but deleted) are rare in practice because images are immutable once published.
  • Registry-specific: Docker Hub uses the REST API (`hub.docker.com/v2/repositories/`), while GHCR uses the OCI token endpoint (`ghcr.io/token` + HEAD request to `/v2/.../manifests/`).

Reasoning

Docker Hub rate limits are a significant operational concern for CI systems that check many images per pipeline run. The `docker manifest inspect` command counts against pull rate limits even when the image doesn't exist, making it unsuitable for frequent existence checks. The Docker Hub API (`/v2/repositories/.../tags/`) and GHCR API (`/v2/.../manifests/`) provide a lighter-weight alternative. The local file cache eliminates repeated checks for images that are known to exist from previous steps in the same build.

The fallback to CLI (`docker manifest inspect`) handles edge cases like API downtime (HTTP 401, 429, 500, 502, 503, 504) and non-standard registries.

Code Evidence

Multi-tier lookup from `docker.py:67-113`:

def mz_image_tag_exists(image_tag: str, quiet: bool = False) -> bool:
    image_name = f"{image_registry()}/materialized:{image_tag}"
    # Tier 1: In-memory cache
    if image_name in EXISTENCE_OF_IMAGE_NAMES_FROM_EARLIER_CHECK:
        return EXISTENCE_OF_IMAGE_NAMES_FROM_EARLIER_CHECK[image_name]
    # Tier 2: Local Docker images
    output = subprocess.check_output(
        ["docker", "images", "--quiet", image_name], ...
    )
    if output:
        EXISTENCE_OF_IMAGE_NAMES_FROM_EARLIER_CHECK[image_name] = True
        return True
    # Tier 3: Docker Hub API (avoids rate limits)
    if image_registry() != "materialize":
        return mz_image_tag_exists_cmdline(image_name)
    response = requests.get(
        f"https://hub.docker.com/v2/repositories/materialize/materialized/tags/{image_tag}"
    )

File-based persistent cache from `mzbuild.py:246-264`:

KNOWN_DOCKER_IMAGES_FILE = Path(MZ_ROOT / "known-docker-images.txt")
_known_docker_images: set[str] | None = None

def is_docker_image_pushed(name: str) -> bool:
    # ... loads from file, checks in-memory set
    if name in _known_docker_images:
        return True
    # ... falls back to API/CLI, appends to file on success

Rate limit fallback from `mzbuild.py:304-312`:

if response.status_code in (401, 429, 500, 502, 503, 504):
    # Fall back to 5x slower method
    proc = subprocess.run(
        ["docker", "manifest", "inspect", name],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
        env=dict(os.environ, DOCKER_CLI_EXPERIMENTAL="enabled"),
    )
    exists = proc.returncode == 0

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment