Overview
API for managing artifacts, such as models and datasets, in DVC repositories, with GTO (Git Tag Ops) integration for versioning and lifecycle management.
Description
The dvc.repo.artifacts module defines the Artifacts class, which handles four operations: reading artifact definitions, adding new artifacts to the project, downloading artifacts from remote storage, and retrieving metadata for a specific artifact version or stage. In DVC, artifacts are versioned assets (typically ML models or curated datasets) declared in dvc.yaml; GTO can then register them, assign versions, and promote them through lifecycle stages. The class works with the DVC repository layer to resolve artifact paths, storage locations, and Git-based version metadata.
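GTO keeps version and stage metadata in Git tags rather than in a separate database. The sketch below assumes GTO-style tag formats as we understand them: name@vX.Y.Z registers a version, and name#stage#N records a stage assignment, where N is an increasing counter. The helper parse_gto_tag and the TagInfo type are hypothetical and only illustrate how such metadata can be recovered from tag names; they are not part of DVC or GTO.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical parser for GTO-style tags (not DVC or GTO code). Assumed formats:
#   <name>@<version>         e.g. text-classifier@v1.2.0  (version registration)
#   <name>#<stage>#<number>  e.g. text-classifier#prod#3  (stage assignment counter)
VERSION_TAG = re.compile(r"^(?P<name>[^@#]+)@(?P<version>v\d+\.\d+\.\d+)$")
STAGE_TAG = re.compile(r"^(?P<name>[^@#]+)#(?P<stage>[a-z0-9-]+)#(?P<number>\d+)$")


class TagInfo(NamedTuple):
    name: str
    version: Optional[str]
    stage: Optional[str]
    number: Optional[int]


def parse_gto_tag(tag: str) -> Optional[TagInfo]:
    """Split a GTO-style tag into its parts, or return None if it does not match."""
    match = VERSION_TAG.match(tag)
    if match:
        return TagInfo(match["name"], match["version"], None, None)
    match = STAGE_TAG.match(tag)
    if match:
        return TagInfo(match["name"], None, match["stage"], int(match["number"]))
    return None


for tag in ["text-classifier@v1.2.0", "text-classifier#prod#3", "release-2024"]:
    print(tag, "->", parse_gto_tag(tag))
```

Tags that match neither pattern, such as ordinary release tags, are simply ignored, so the parser can be run over a repository's full tag list.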
Usage
Access the Artifacts class through the artifacts attribute of a Repo object (repo.artifacts) to register new artifacts, query artifact metadata, and download artifacts by version or lifecycle stage. This is particularly useful in model registry workflows, where models are versioned, assigned to lifecycle stages (e.g., dev, staging, prod), and retrieved by downstream consumers.
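In a registry workflow, asking for a stage such as prod ultimately means finding the version that the stage currently points to. The sketch below models that lookup under simplifying assumptions: tags follow the GTO-style formats name@vX.Y.Z and name#stage#N, the highest N is the current assignment, and a stage tag sits on the same commit as the version it promotes. The function version_in_stage is hypothetical and ignores other tag types that GTO also supports, such as deprecations.

```python
import re
from typing import Optional

# Hypothetical, simplified stage resolution; not GTO's actual algorithm.
STAGE_TAG = re.compile(r"^(?P<name>.+)#(?P<stage>[a-z0-9-]+)#(?P<number>\d+)$")
VERSION_TAG = re.compile(r"^(?P<name>.+)@(?P<version>v\d+\.\d+\.\d+)$")


def version_in_stage(tags: dict[str, str], name: str, stage: str) -> Optional[str]:
    """Return the version on the commit carrying the latest `stage` assignment.

    `tags` maps Git tag names to the commit SHAs they point at.
    """
    latest_number, latest_commit = -1, None
    for tag, commit in tags.items():
        m = STAGE_TAG.match(tag)
        if m and m["name"] == name and m["stage"] == stage:
            number = int(m["number"])
            # Assumption: a higher counter means a more recent assignment.
            if number > latest_number:
                latest_number, latest_commit = number, commit
    if latest_commit is None:
        return None
    # Assumption: the promoted version tag lives on the same commit.
    for tag, commit in tags.items():
        m = VERSION_TAG.match(tag)
        if m and m["name"] == name and commit == latest_commit:
            return m["version"]
    return None


# Illustrative tag-to-commit mapping (made-up SHAs).
tags = {
    "text-classifier@v1.1.0": "a1b2c3",
    "text-classifier@v1.2.0": "d4e5f6",
    "text-classifier#prod#1": "a1b2c3",
    "text-classifier#prod#2": "d4e5f6",
    "text-classifier#staging#1": "d4e5f6",
}
print(version_in_stage(tags, "text-classifier", "prod"))
```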
Code Reference
Source Location
dvc/repo/artifacts.py
Signature
from typing import TYPE_CHECKING, Any, Optional

if TYPE_CHECKING:
    from dvc.repo import Repo


class Artifacts:
    """Manages artifact operations within a DVC repository.

    Provides methods for reading, adding, downloading, and retrieving
    artifact metadata and files.
    """

    def __init__(self, repo: "Repo"):
        """Initializes the Artifacts manager with a reference to the DVC repository."""

    def read(self) -> dict[str, dict]:
        """Reads and returns all artifact definitions from the project's dvc.yaml files.

        Returns:
            dict: A nested dictionary mapping artifact names to their metadata,
                including path, type, description, labels, and meta fields.
        """

    def add(
        self,
        name: str,
        path: str,
        type: Optional[str] = None,
        desc: Optional[str] = None,
        labels: Optional[list[str]] = None,
        meta: Optional[dict[str, Any]] = None,
    ) -> dict:
        """Adds a new artifact entry to dvc.yaml.

        Args:
            name: Unique name for the artifact.
            path: Path to the artifact file or directory.
            type: Artifact type (e.g., "model", "dataset").
            desc: Human-readable description of the artifact.
            labels: List of labels for categorization.
            meta: Additional metadata key-value pairs.

        Returns:
            dict: The artifact entry as written to dvc.yaml.
        """

    def download(
        self,
        name: str,
        version: Optional[str] = None,
        stage: Optional[str] = None,
        out: Optional[str] = None,
        force: bool = False,
        config: Optional[dict[str, Any]] = None,
        jobs: Optional[int] = None,
    ) -> tuple[int, str]:
        """Downloads an artifact from remote storage.

        Args:
            name: Name of the artifact to download (may include a remote repository prefix).
            version: Specific GTO version tag to download.
            stage: Specific GTO lifecycle stage to download (e.g., "prod").
            out: Local output path for the downloaded artifact.
            force: Whether to overwrite existing local files.
            config: DVC configuration overrides.
            jobs: Number of parallel download jobs.

        Returns:
            tuple: (number of files downloaded, output path).
        """

    def get(
        self,
        name: str,
        version: Optional[str] = None,
        stage: Optional[str] = None,
        config: Optional[dict[str, Any]] = None,
    ) -> dict:
        """Retrieves metadata for a specific artifact version or stage.

        Args:
            name: Name of the artifact.
            version: Specific GTO version to query.
            stage: Specific GTO lifecycle stage to query.
            config: DVC configuration overrides.

        Returns:
            dict: Artifact metadata including path, version, and stage information.
        """
Import
from dvc.repo.artifacts import Artifacts
I/O Contract
Inputs (add)
| Name | Type | Required | Description |
|---|---|---|---|
| name | str | Yes | Unique name for the artifact |
| path | str | Yes | Path to the artifact file or directory within the project |
| type | Optional[str] | No | Artifact type identifier (e.g., "model", "dataset") |
| desc | Optional[str] | No | Human-readable description |
| labels | Optional[list[str]] | No | List of labels for categorization and filtering |
| meta | Optional[dict] | No | Arbitrary metadata key-value pairs |
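The table above fixes the expected type of each add() argument. The sketch below is a hypothetical pre-flight check, check_add_args, that reports violations before a call reaches dvc.yaml; rejecting empty strings for the required arguments is an extra assumption, not a rule stated in the table.

```python
from typing import Any


def check_add_args(name: Any, path: Any, labels: Any = None, meta: Any = None) -> list[str]:
    """Hypothetical check based on the add() input table; returns the problems found."""
    problems = []
    # Required arguments; treating "" as missing is an assumption.
    if not isinstance(name, str) or not name:
        problems.append("name must be a non-empty string")
    if not isinstance(path, str) or not path:
        problems.append("path must be a non-empty string")
    # Optional arguments are only checked when given.
    if labels is not None and not (
        isinstance(labels, list) and all(isinstance(label, str) for label in labels)
    ):
        problems.append("labels must be a list of strings")
    if meta is not None and not isinstance(meta, dict):
        problems.append("meta must be a dict")
    return problems


print(check_add_args("text-classifier", "models/classifier.pkl", labels=["nlp"]))
print(check_add_args("", "models/classifier.pkl", labels="nlp"))
```

Returning a list instead of raising on the first error lets a caller report every problem in one pass.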
Inputs (download)
| Name | Type | Required | Description |
|---|---|---|---|
| name | str | Yes | Artifact name, optionally prefixed with a remote repository path |
| version | Optional[str] | No | GTO version tag to download (e.g., "v1.0.0") |
| stage | Optional[str] | No | GTO lifecycle stage to download (e.g., "prod", "staging") |
| out | Optional[str] | No | Local output path; defaults to the current directory |
| force | bool | No | Overwrite existing local files if True |
| config | Optional[dict] | No | DVC configuration overrides |
| jobs | Optional[int] | No | Number of parallel download jobs |
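The table leaves some interactions between download() arguments implicit. The sketch below is a hypothetical helper, plan_download, that normalizes them before a call. Only the current-directory default for out comes from the table; the other rules it enforces (version and stage are mutually exclusive, omitting both means the latest version, jobs must be positive) are assumptions for illustration.

```python
from pathlib import Path
from typing import Optional


def plan_download(
    name: str,
    version: Optional[str] = None,
    stage: Optional[str] = None,
    out: Optional[str] = None,
    jobs: Optional[int] = None,
) -> dict:
    """Hypothetical pre-flight normalization of download() arguments."""
    # Assumption: a version and a stage cannot both be requested.
    if version is not None and stage is not None:
        raise ValueError("pass either version or stage, not both")
    # Assumption: jobs must be a positive integer when given.
    if jobs is not None and jobs < 1:
        raise ValueError("jobs must be a positive integer")
    if version is not None:
        selector = f"version {version}"
    elif stage is not None:
        selector = f"stage {stage}"
    else:
        # Assumption: no selector means the latest registered version.
        selector = "latest version"
    # Per the table, out defaults to the current directory.
    target = Path(out) if out is not None else Path.cwd()
    return {"name": name, "selector": selector, "out": target, "jobs": jobs}


print(plan_download("text-classifier", stage="prod", out="/tmp/models/", jobs=4))
```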
Inputs (get)
| Name | Type | Required | Description |
|---|---|---|---|
| name | str | Yes | Artifact name to query |
| version | Optional[str] | No | Specific GTO version to retrieve metadata for |
| stage | Optional[str] | No | Specific GTO lifecycle stage to retrieve metadata for |
| config | Optional[dict] | No | DVC configuration overrides |
Outputs
| Method | Return Type | Description |
|---|---|---|
| read | dict[str, dict] | All artifact definitions mapped by name, including path, type, description, labels, and meta |
| add | dict | The artifact entry as written to dvc.yaml |
| download | tuple[int, str] | Tuple of (number of files downloaded, output path) |
| get | dict | Artifact metadata including path, version, stage, and remote information |
Usage Examples
Register a New Model Artifact
from dvc.repo import Repo

repo = Repo()
repo.artifacts.add(
    name="text-classifier",
    path="models/classifier.pkl",
    type="model",
    desc="NLP text classification model trained on customer reviews",
    labels=["nlp", "classification", "production"],
    meta={"framework": "scikit-learn", "dataset_version": "v2.3"},
)
List All Artifacts
from dvc.repo import Repo

repo = Repo()
artifacts = repo.artifacts.read()
for name, info in artifacts.items():
    print(f"Artifact: {name}")
    print(f"  Path: {info.get('path')}")
    print(f"  Type: {info.get('type')}")
    print(f"  Description: {info.get('desc')}")
Download the Model Version in a Stage
from dvc.repo import Repo

repo = Repo()
# Download the version currently assigned to the "prod" stage
count, path = repo.artifacts.download(
    name="text-classifier",
    stage="prod",
    out="/tmp/models/",
    jobs=4,
)
print(f"Downloaded {count} file(s) to {path}")
Get Artifact Metadata
from dvc.repo import Repo
repo = Repo()
# Retrieve metadata for a specific version
info = repo.artifacts.get(name="text-classifier", version="v1.2.0")
print(f"Artifact path: {info['path']}")
print(f"Version: {info.get('version')}")
Download from a Remote Repository
# Download an artifact from a remote DVC repository using CLI
dvc artifacts get https://github.com/example/project text-classifier --rev v1.2.0 -o ./local_model/
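When scripting around the CLI from Python, the command above can be assembled as an argument list and passed to subprocess.run without going through a shell. This sketch covers only the arguments that appear in the example (repository URL, artifact name, --rev, and -o); artifacts_get_argv is a hypothetical helper, and running the command assumes dvc is installed and on PATH.

```python
import shlex
from typing import Optional


def artifacts_get_argv(
    url: str, name: str, rev: Optional[str] = None, out: Optional[str] = None
) -> list[str]:
    """Build the `dvc artifacts get` argument list used in the example above."""
    argv = ["dvc", "artifacts", "get", url, name]
    if rev is not None:
        argv += ["--rev", rev]
    if out is not None:
        argv += ["-o", out]
    return argv


argv = artifacts_get_argv(
    "https://github.com/example/project", "text-classifier", rev="v1.2.0", out="./local_model/"
)
print(shlex.join(argv))
# To execute it (requires dvc on PATH): subprocess.run(argv, check=True)
```

Passing a list rather than a single string avoids shell quoting problems when URLs or output paths contain special characters.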
Related Pages