Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Principle:Iterative Dvc Artifact Resolution

From Leeroopedia


Knowledge Sources
Domains API, Artifact_Management
Last Updated 2026-02-10 00:00 GMT

Overview

Artifact resolution is the process of translating a logical artifact name into its concrete storage path, Git revision, and version metadata within a version control system, enabling consumers to retrieve specific artifact versions without knowledge of the underlying storage layout.

Description

In data version control workflows, artifacts such as trained models, processed datasets, and evaluation results are produced by pipelines and stored using content-addressable mechanisms. Over time, the same logical artifact (e.g., "sentiment-model") may exist across many Git revisions, branches, and tags. Consumers -- whether human users, deployment pipelines, or downstream services -- need a stable way to refer to an artifact by name and version, and have the system resolve that reference to the exact file path, Git commit, and storage location needed for retrieval.

Artifact resolution operates as a name-resolution layer that sits above the raw storage and version control systems. When a user requests an artifact by name (and optionally by version), the resolver searches the repository's artifact registry -- typically declared in dvc.yaml files -- to locate the matching entry. It then determines the Git revision associated with the requested version (which may be a Git tag following a naming convention like artifact_name@v1.0), extracts the file path of the artifact output, and returns a structured result containing the path, revision hash, and any associated metadata.

This indirection provides several important benefits. It decouples artifact consumers from the physical storage layout, meaning that files can be reorganized without breaking downstream references. It supports version pinning, allowing deployment systems to lock to a specific artifact version while development continues on newer versions. And it enables cross-repository artifact references, where an artifact produced in one repository can be consumed by another through a registry-based lookup rather than hardcoded paths.

Usage

Artifact resolution is invoked whenever:

  • A deployment pipeline requests a specific model version by name for serving.
  • A data scientist queries the artifact registry to find the latest version of a dataset.
  • An API client calls artifacts_show() to resolve an artifact name to its path and revision.
  • A downstream pipeline references an artifact from an upstream repository as an input dependency.
  • Version metadata is needed to construct download URLs or verify artifact integrity.

Theoretical Basis

Name resolution and indirection. Artifact resolution follows the classic systems design principle that any problem in computer science can be solved by adding a level of indirection. The resolver acts as a mapping function from a logical namespace to physical storage coordinates:

function resolve_artifact(name, version=None):
    # Step 1: Locate artifact declaration
    artifact_def = search_dvc_yaml(name)
    # artifact_def = {path: "models/sentiment.pkl", type: "model", ...}

    # Step 2: Resolve version to Git revision
    if version:
        tag = find_git_tag(pattern=f"{name}@v{version}")
        revision = tag.commit_hash
    else:
        revision = latest_tag_for(name).commit_hash

    # Step 3: Return resolved coordinates
    return ArtifactInfo(
        path = artifact_def.path,
        revision = revision,
        version = extract_version(tag),
        metadata = artifact_def.annotations
    )

This resolution pattern is analogous to DNS resolution in networking: a human-friendly name is translated through a lookup mechanism into machine-usable coordinates. Just as DNS allows servers to change IP addresses without breaking client bookmarks, artifact resolution allows files to be reorganized or re-hashed without breaking consumer references.

Version tagging conventions. The system establishes a convention-based mapping between artifact versions and Git tags. By encoding the artifact name and version number into the tag string (e.g., my-model@v3), the resolver can enumerate available versions by listing tags matching a pattern and can resolve a specific version by finding the corresponding tag's commit. This convention-over-configuration approach avoids the need for a separate version database while leveraging Git's existing tag infrastructure for version storage and ordering.

Related Pages

Implemented By

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment