Implementation:Iterative Dvc Repo Artifacts

From Leeroopedia


Knowledge Sources
Domains Artifact_Management, Model_Registry
Last Updated 2026-02-10 10:00 GMT

Overview

A concrete tool for managing artifacts, such as models and datasets, within DVC repositories. It integrates with GTO (Git Tag Ops) for versioning and lifecycle management.

Description

The dvc.repo.artifacts module defines the Artifacts class, which manages artifact operations including reading artifact metadata, adding new artifacts to the project, downloading artifacts from remote storage, and retrieving artifact details. Artifacts in DVC represent versioned assets (typically ML models or curated datasets) that are tracked in dvc.yaml and can be registered, versioned, and promoted through lifecycle stages using GTO. The class integrates with the DVC repository layer to resolve artifact paths, storage locations, and Git-based version metadata.

Usage

Use the Artifacts class through the Repo object to register new artifacts, query artifact metadata, download specific artifact versions, and manage the artifact lifecycle. This is particularly useful in model registry workflows where models need to be versioned, staged (e.g., dev, staging, production), and retrieved by downstream consumers.
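Because GTO records versions and stage assignments as Git tags, a registry workflow often needs to tell the two kinds of tag apart. The sketch below is illustrative only: it assumes version tags look like "<artifact>@v<major>.<minor>.<patch>" and stage tags look like "<artifact>#<stage>#<n>", and the names ParsedTag and parse_tag are hypothetical helpers, not part of DVC or GTO.

```python
import re
from typing import NamedTuple, Optional


class ParsedTag(NamedTuple):
    """Hypothetical container for the parts of a registry tag."""

    artifact: str
    version: Optional[str]  # e.g. "v1.2.0" when the tag registers a version
    stage: Optional[str]  # e.g. "prod" when the tag assigns a stage


# Assumed tag shapes (not confirmed by this page):
#   version registration: "<artifact>@v<major>.<minor>.<patch>"
#   stage assignment:     "<artifact>#<stage>#<n>"
_VERSION_TAG = re.compile(r"^(?P<artifact>[^@#]+)@(?P<version>v\d+\.\d+\.\d+)$")
_STAGE_TAG = re.compile(r"^(?P<artifact>[^@#]+)#(?P<stage>[^#]+)#(?P<n>\d+)$")


def parse_tag(tag: str) -> Optional[ParsedTag]:
    """Return the parsed tag, or None if it matches neither assumed shape."""
    match = _VERSION_TAG.match(tag)
    if match:
        return ParsedTag(match["artifact"], match["version"], None)
    match = _STAGE_TAG.match(tag)
    if match:
        return ParsedTag(match["artifact"], None, match["stage"])
    return None
```

Tags that match neither pattern (ordinary release tags, for instance) yield None, so the helper can be applied to the full output of a tag listing without pre-filtering.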

Code Reference

Source Location

Signature

from typing import Any, Optional


class Artifacts:
    """Manages artifact operations within a DVC repository.

    Provides methods for reading, adding, downloading, and retrieving
    artifact metadata and files.
    """

    def __init__(self, repo: "Repo"):
        """Initializes the Artifacts manager with a reference to the DVC repository."""

    def read(self) -> dict[str, dict]:
        """Reads and returns all artifact definitions from the project's dvc.yaml files.

        Returns:
            dict: A nested dictionary mapping artifact names to their metadata,
                  including path, type, description, labels, and meta fields.
        """

    def add(
        self,
        name: str,
        path: str,
        type: Optional[str] = None,
        desc: Optional[str] = None,
        labels: Optional[list[str]] = None,
        meta: Optional[dict[str, Any]] = None,
    ) -> dict:
        """Adds a new artifact entry to dvc.yaml.

        Args:
            name: Unique name for the artifact.
            path: Path to the artifact file or directory.
            type: Artifact type (e.g., "model", "dataset").
            desc: Human-readable description of the artifact.
            labels: List of labels for categorization.
            meta: Additional metadata key-value pairs.

        Returns:
            dict: The artifact entry as written to dvc.yaml.
        """

    def download(
        self,
        name: str,
        version: Optional[str] = None,
        stage: Optional[str] = None,
        out: Optional[str] = None,
        force: bool = False,
        config: Optional[dict[str, Any]] = None,
        jobs: Optional[int] = None,
    ) -> tuple[int, str]:
        """Downloads an artifact from remote storage.

        Args:
            name: Name of the artifact to download (may include remote repo prefix).
            version: Specific GTO version tag to download.
            stage: Specific GTO lifecycle stage to download (e.g., "prod").
            out: Local output path for the downloaded artifact.
            force: Whether to overwrite existing local files.
            config: DVC configuration overrides.
            jobs: Number of parallel download jobs.

        Returns:
            tuple: (number of files downloaded, output path).
        """

    def get(
        self,
        name: str,
        version: Optional[str] = None,
        stage: Optional[str] = None,
        config: Optional[dict[str, Any]] = None,
    ) -> dict:
        """Retrieves metadata for a specific artifact version or stage.

        Args:
            name: Name of the artifact.
            version: Specific GTO version to query.
            stage: Specific GTO lifecycle stage to query.
            config: DVC configuration overrides.

        Returns:
            dict: Artifact metadata including path, version, and stage information.
        """

Import

from dvc.repo.artifacts import Artifacts

I/O Contract

Inputs (add)

Name | Type | Required | Description
name | str | Yes | Unique name for the artifact
path | str | Yes | Path to the artifact file or directory within the project
type | Optional[str] | No | Artifact type identifier (e.g., "model", "dataset")
desc | Optional[str] | No | Human-readable description
labels | Optional[list[str]] | No | List of labels for categorization and filtering
meta | Optional[dict] | No | Arbitrary metadata key-value pairs
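To make the relationship between these inputs and the stored entry concrete, here is a minimal sketch of how an entry dictionary could be assembled from them. It assumes that optional fields left as None are simply omitted from the entry; build_artifact_entry is a hypothetical helper written for illustration and is not how DVC itself is confirmed to behave.

```python
from typing import Any, Optional


def build_artifact_entry(
    path: str,
    type: Optional[str] = None,
    desc: Optional[str] = None,
    labels: Optional[list[str]] = None,
    meta: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Hypothetical helper: build the metadata dict for one artifact.

    Assumption: optional fields that were not provided are left out
    rather than stored as null values.
    """
    entry: dict[str, Any] = {"path": path}
    optional_fields = {"type": type, "desc": desc, "labels": labels, "meta": meta}
    # Keep only the optional fields the caller actually supplied.
    entry.update({key: value for key, value in optional_fields.items() if value is not None})
    return entry
```

Only path is always present in the result, which mirrors the Required column above.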

Inputs (download)

Name | Type | Required | Description
name | str | Yes | Artifact name, optionally prefixed with a remote repository path
version | Optional[str] | No | GTO version tag to download (e.g., "v1.0.0")
stage | Optional[str] | No | GTO lifecycle stage to download (e.g., "prod", "staging")
out | Optional[str] | No | Local output path; defaults to current directory
force | bool | No | Overwrite existing local files if True
config | Optional[dict] | No | DVC configuration overrides
jobs | Optional[int] | No | Number of parallel download threads

Inputs (get)

Name | Type | Required | Description
name | str | Yes | Artifact name to query
version | Optional[str] | No | Specific GTO version to retrieve metadata for
stage | Optional[str] | No | Specific GTO lifecycle stage to retrieve metadata for
config | Optional[dict] | No | DVC configuration overrides
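Both download and get accept a version and a stage, which are two different ways of selecting one revision of an artifact. The sketch below shows one way a caller might validate the pair before making the call. It rests on two assumptions that this page does not state: that version and stage should not be combined, and that omitting both selects the latest version. The function check_selector is hypothetical.

```python
from typing import Optional


def check_selector(version: Optional[str] = None, stage: Optional[str] = None) -> str:
    """Hypothetical pre-flight check for the version/stage arguments.

    Assumptions: the two selectors are mutually exclusive, and passing
    neither means "use the latest version".
    """
    if version and stage:
        raise ValueError("pass either version or stage, not both")
    if version:
        return f"version {version}"
    if stage:
        return f"stage {stage}"
    return "latest version"
```

Running such a check up front gives a clear error message in the caller's own code instead of relying on whatever the underlying call reports.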

Outputs

Method | Return Type | Description
read | dict[str, dict] | All artifact definitions mapped by name, including path, type, description, labels, and meta
add | dict | The artifact entry as written to dvc.yaml
download | tuple[int, str] | Tuple of (number of files downloaded, output path)
get | dict | Artifact metadata including path, version, stage, and remote information

Usage Examples

Register a New Model Artifact

from dvc.repo import Repo

repo = Repo()
repo.artifacts.add(
    name="text-classifier",
    path="models/classifier.pkl",
    type="model",
    desc="NLP text classification model trained on customer reviews",
    labels=["nlp", "classification", "production"],
    meta={"framework": "scikit-learn", "dataset_version": "v2.3"},
)

List All Artifacts

from dvc.repo import Repo

repo = Repo()
artifacts = repo.artifacts.read()

for name, info in artifacts.items():
    print(f"Artifact: {name}")
    print(f"  Path: {info.get('path')}")
    print(f"  Type: {info.get('type')}")
    print(f"  Description: {info.get('desc')}")
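Once the mapping is loaded, narrowing it down by type or label is plain dictionary work. The following sketch assumes the mapping has the shape described on this page (artifact name mapped to a metadata dict with optional "type" and "labels" keys); filter_artifacts is a hypothetical helper and does not call DVC.

```python
from typing import Optional


def filter_artifacts(
    artifacts: dict[str, dict],
    type: Optional[str] = None,
    label: Optional[str] = None,
) -> dict[str, dict]:
    """Hypothetical helper: keep artifacts matching the given type and/or label."""
    selected: dict[str, dict] = {}
    for name, info in artifacts.items():
        if type is not None and info.get("type") != type:
            continue
        # "labels" may be missing or None, so fall back to an empty list.
        if label is not None and label not in (info.get("labels") or []):
            continue
        selected[name] = info
    return selected
```

For example, filter_artifacts(artifacts, type="model", label="nlp") would keep only the entries that are models and carry the "nlp" label.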

Download a Specific Model Version

from dvc.repo import Repo

repo = Repo()

# Download the production version of a model
count, path = repo.artifacts.download(
    name="text-classifier",
    stage="prod",
    out="/tmp/models/",
    jobs=4,
)
print(f"Downloaded {count} file(s) to {path}")

Get Artifact Metadata

from dvc.repo import Repo

repo = Repo()

# Retrieve metadata for a specific version
info = repo.artifacts.get(name="text-classifier", version="v1.2.0")
print(f"Artifact path: {info['path']}")
print(f"Version: {info.get('version')}")

Download from a Remote Repository

# Download an artifact from a remote DVC repository using CLI
dvc artifacts get https://github.com/example/project text-classifier --rev v1.2.0 -o ./local_model/
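When the same CLI call needs to be issued from a script, building the argument list in Python avoids shell-quoting problems. The sketch below only assembles the command shown above. The --stage flag is assumed to mirror the stage parameter documented on this page, and build_artifacts_get_command is a hypothetical helper name.

```python
from typing import Optional


def build_artifacts_get_command(
    url: str,
    name: str,
    rev: Optional[str] = None,
    stage: Optional[str] = None,
    out: Optional[str] = None,
) -> list[str]:
    """Hypothetical helper: build the argv list for `dvc artifacts get`."""
    cmd = ["dvc", "artifacts", "get", url, name]
    if rev is not None:
        cmd += ["--rev", rev]
    if stage is not None:
        # Assumption: --stage selects a lifecycle stage, like the `stage` parameter.
        cmd += ["--stage", stage]
    if out is not None:
        cmd += ["-o", out]
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True), which raises an error if the command exits with a non-zero status.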
