
Implementation:Iterative Dvc Push Fetch

From Leeroopedia


Knowledge Sources
Domains Data_Synchronization, Distributed_Storage
Last Updated 2026-02-10 00:00 GMT

Overview

A concrete tool, provided by the DVC library, for executing parallel data transfers between the local cache and remote storage backends during push and fetch operations.

Description

The push and fetch functions are the top-level entry points for uploading data to and downloading data from remote storage in DVC. Both functions follow the same architectural pattern: collect filtered data indexes across revisions, gather transferable entries using dvc-data's collect mechanism, dispatch parallel transfers via dvc-data's ipush/ifetch, and handle post-transfer bookkeeping.
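The dispatch step of this pattern can be sketched in plain Python. This is a minimal stand-in, not DVC's actual implementation: `transfer` and `upload_one` are hypothetical names, and the real ipush/ifetch machinery in dvc-data handles retries, callbacks, and object databases rather than a bare thread pool.

```python
from concurrent.futures import ThreadPoolExecutor

def transfer(entries, upload_one, jobs=4):
    """Dispatch parallel transfers over collected entries and return
    the number of successful transfers (simplified sketch)."""
    transferred = 0
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        for ok in pool.map(upload_one, entries):
            if ok:
                transferred += 1
    return transferred

# Toy backend: "uploading" just records the entry path.
uploaded = []
def upload_one(entry):
    uploaded.append(entry)
    return True

count = transfer(["data/a", "data/b", "models/m.pkl"], upload_one)
```

The returned count mirrors the int that push and fetch report to the caller.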

The push function (in dvc/repo/push.py) additionally validates that worktree/version-aware remotes do not receive multi-revision pushes, manages the run cache, calls _update_meta to merge remote version IDs back into output metadata after pushing, and supports glob-based target patterns.
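The glob=True behavior described above amounts to expanding target patterns against the set of tracked paths. The helper below is hypothetical (DVC's actual expansion lives inside push.py and works on stage outputs), but it illustrates the semantics with the standard library's fnmatch:

```python
from fnmatch import fnmatch

def expand_targets(targets, tracked, glob=False):
    """Expand glob patterns against tracked paths.

    Hypothetical helper mirroring the glob=True behaviour: without
    glob, targets are taken literally; with glob, each pattern is
    matched against every tracked path.
    """
    if not glob:
        return list(targets)
    return [p for p in tracked for t in targets if fnmatch(p, t)]

tracked = ["data/train.csv", "data/test.csv", "models/model.pkl"]
result = expand_targets(["data/*.csv"], tracked, glob=True)
```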

The fetch function (in dvc/repo/fetch.py) includes special handling for version-aware remotes by filtering out entries that lack cloud version information (the _log_unversioned helper), and also manages run cache fetching.
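The filtering that _log_unversioned performs can be approximated as a partition over entries by the presence of a cloud version id. The snippet below is a simplified stand-in (entry representation and function name are assumptions, not DVC's real types):

```python
import logging

logger = logging.getLogger(__name__)

def split_versioned(entries):
    """Partition (path, version_id) pairs into versioned and
    unversioned paths, warning about the latter -- a simplified
    stand-in for the _log_unversioned filtering described above."""
    versioned, unversioned = [], []
    for path, version_id in entries:
        (versioned if version_id else unversioned).append(path)
    for path in unversioned:
        logger.warning("skipping %s: no cloud version info", path)
    return versioned, unversioned

entries = [("a.csv", "v1"), ("b.csv", None), ("c.csv", "v9")]
versioned, skipped = split_versioned(entries)
```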

Both functions are decorated with @locked, ensuring that only one push or fetch operation runs at a time within a repository. Both also interact with the DataCloud class (in dvc/data_cloud.py) which provides lower-level push/pull/status methods operating on hash-based object databases.
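Conceptually, @locked serializes operations through the repository's lock. The sketch below uses an in-process threading.Lock for illustration; DVC's real decorator acquires an on-disk repository lock so that separate processes are also excluded:

```python
import functools
import threading

def locked(method):
    """Simplified stand-in for DVC's @locked decorator: run the
    wrapped method while holding the repo's lock, so only one
    push/fetch proceeds at a time."""
    @functools.wraps(method)
    def wrapper(repo, *args, **kwargs):
        with repo.lock:
            return method(repo, *args, **kwargs)
    return wrapper

class Repo:
    def __init__(self):
        self.lock = threading.Lock()  # real DVC uses a file-based lock

    @locked
    def push(self):
        return "pushed"

result = Repo().push()
```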

Usage

Import push or fetch when building custom data synchronization workflows, or use the CLI commands dvc push and dvc fetch which call these functions. The functions support targeting specific files, filtering by revision, and controlling parallelism via the jobs parameter.

Code Reference

Source Location

  • Repository: DVC
  • File: dvc/repo/push.py
  • Lines: L63-178 (push)
  • File: dvc/repo/fetch.py
  • Lines: L100-207 (fetch)
  • Supporting file: dvc/data_cloud.py
  • Lines: L168-258 (DataCloud.push, DataCloud.pull)

Signature

# Push
@locked
def push(
    self,
    targets=None,
    jobs=None,
    remote=None,
    all_branches=False,
    with_deps=False,
    all_tags=False,
    recursive=False,
    all_commits=False,
    run_cache=False,
    revs=None,
    workspace=True,
    glob=False,
) -> int:

# Fetch
@locked
def fetch(
    self: "Repo",
    targets=None,
    jobs=None,
    remote=None,
    all_branches=False,
    with_deps=False,
    all_tags=False,
    recursive=False,
    all_commits=False,
    run_cache=False,
    revs=None,
    workspace=True,
    max_size=None,
    types=None,
    config=None,
    onerror=None,
) -> int:

Import

from dvc.repo.push import push
from dvc.repo.fetch import fetch

I/O Contract

Inputs

Name Type Required Description
self Repo Yes The DVC repository instance (the functions are bound as Repo methods).
targets Optional[list] No Specific DVC-tracked files or directories to push/fetch. Supports glob patterns when glob=True (push only).
jobs Optional[int] No Number of parallel transfer threads. Defaults to the config-level setting.
remote Optional[str] No Name of the remote to push to or fetch from. Uses default remote if None.
all_branches bool No If True, include data from all branches.
all_tags bool No If True, include data from all tags.
all_commits bool No If True, include data from all commits.
revs Optional[list] No Specific revisions to include.
with_deps bool No If True, include dependencies of targeted stages.
recursive bool No If True, recursively match targets within directories.
run_cache bool No If True, also push/fetch the run cache (stage cache).
workspace bool No If True (default), include the current workspace.
glob bool No (Push only) If True, treat targets as glob patterns.
max_size Optional[int] No (Fetch only) Maximum file size in bytes to fetch.
types Optional[list[str]] No (Fetch only) Restrict to output types: "metrics", "plots", "params".
config Optional[dict] No (Fetch only) Additional configuration overrides.
onerror Optional[Callable] No (Fetch only) Error callback for collection failures.

Outputs

Name Type Description
return int Count of successfully transferred files. This includes both data files and run cache entries (if run_cache=True).

Exceptions:

  • UploadError -- raised by push when one or more files fail to upload. The exception's amount attribute indicates the failure count.
  • DownloadError -- raised by fetch when one or more files fail to download.
  • InvalidArgumentError -- raised by push when attempting multi-revision push to a version-aware remote.
  • NoRemoteError -- raised when no remote is configured and none is specified.
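A caller might use the amount attribute described above to retry failed pushes. The stub exception class below stands in for dvc.exceptions.UploadError so the snippet is self-contained; the retry helper is an illustrative assumption, not part of DVC's API.

```python
class UploadError(Exception):
    """Stand-in for dvc.exceptions.UploadError, which carries the
    number of files that failed to upload in its `amount` attribute."""
    def __init__(self, amount):
        super().__init__(f"{amount} files failed to upload")
        self.amount = amount

def push_with_retry(push, attempts=2):
    """Retry a push up to `attempts` times; re-raise with the last
    failure count if every attempt fails."""
    for _ in range(attempts):
        try:
            return push()
        except UploadError as exc:
            failed = exc.amount
    raise UploadError(failed)

calls = {"n": 0}
def flaky_push():
    calls["n"] += 1
    if calls["n"] == 1:
        raise UploadError(3)  # first attempt: 3 files fail
    return 10  # second attempt: 10 files transferred

count = push_with_retry(flaky_push)
```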

Usage Examples

Basic Usage

from dvc.repo import Repo

repo = Repo()

# Push all tracked data to the default remote
transferred = repo.push()
print(f"Pushed {transferred} files")

# Fetch specific targets from a named remote with 8 threads
transferred = repo.fetch(
    targets=["data/train.csv", "models/"],
    remote="s3remote",
    jobs=8,
)
print(f"Fetched {transferred} files")

Multi-Revision Push

from dvc.repo import Repo

repo = Repo()

# Push data from all branches and tags
transferred = repo.push(
    all_branches=True,
    all_tags=True,
    run_cache=True,
)
print(f"Pushed {transferred} files across all revisions")

Fetch with Type Filtering

from dvc.repo import Repo

repo = Repo()

# Fetch only metrics and plots, skip large files
transferred = repo.fetch(
    types=["metrics", "plots"],
    max_size=100_000_000,  # 100 MB limit
)
print(f"Fetched {transferred} metric/plot files")
