Implementation: Iterative DVC Push/Fetch
| Knowledge Sources | |
|---|---|
| Domains | Data_Synchronization, Distributed_Storage |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Concrete tool for executing parallel data transfers between the local cache and remote storage backends during push and fetch operations, provided by the DVC library.
Description
The push and fetch functions are the top-level entry points for uploading data to and downloading data from remote storage in DVC. Both functions follow the same architectural pattern: collect filtered data indexes across revisions, gather transferable entries using dvc-data's collect mechanism, dispatch parallel transfers via dvc-data's ipush/ifetch, and handle post-transfer bookkeeping.
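The collect-then-dispatch pattern described above can be illustrated with a minimal sketch. The `Entry` dataclass, `collect`, and `transfer` here are hypothetical stand-ins for dvc-data's actual index entries and ipush/ifetch machinery, not its real API:

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for a dvc-data index entry."""
    key: str
    size: int

def collect(revs):
    """Gather transferable entries across revisions, deduplicating by key
    (mirroring how content-addressed storage transfers each object once)."""
    seen = {}
    for entries in revs.values():
        for entry in entries:
            seen.setdefault(entry.key, entry)
    return list(seen.values())

def transfer(entry):
    """Placeholder for a single upload/download; returns the entry key."""
    return entry.key

def push(revs, jobs=4):
    """Collect entries, then dispatch parallel transfers; return the count."""
    entries = collect(revs)
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        done = list(pool.map(transfer, entries))
    return len(done)
```

Deduplicating before dispatch matters: an object shared by several revisions is transferred once, which is why multi-revision pushes are not simply the sum of per-revision transfers.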
The push function (in dvc/repo/push.py) additionally validates that worktree/version-aware remotes do not receive multi-revision pushes, manages the run cache, calls _update_meta to merge remote version IDs back into output metadata after pushing, and supports glob-based target patterns.
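The glob-based target expansion can be sketched with `fnmatch`; `expand_targets` and its arguments are illustrative names, not push's internal helpers:

```python
from fnmatch import fnmatch

def expand_targets(targets, tracked, glob=False):
    """Expand glob-style targets against tracked paths (illustrative only).

    With glob=False, targets are matched literally; with glob=True, each
    target is treated as a shell-style pattern, as in push(glob=True).
    """
    if not glob:
        return [t for t in targets if t in tracked]
    return [path for path in tracked
            for t in targets if fnmatch(path, t)]
```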
The fetch function (in dvc/repo/fetch.py) includes special handling for version-aware remotes by filtering out entries that lack cloud version information (the _log_unversioned helper), and also manages run cache fetching.
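The unversioned-entry filtering can be sketched as a partition step; `CloudEntry` and `filter_versioned` are hypothetical stand-ins for fetch's internal `_log_unversioned` logic:

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger(__name__)

@dataclass
class CloudEntry:
    """Hypothetical entry carrying optional cloud version metadata."""
    path: str
    version_id: Optional[str] = None

def filter_versioned(entries):
    """Split entries into versioned (fetchable) and unversioned (skipped),
    logging a warning for the skipped group as fetch does for version-aware
    remotes."""
    versioned, unversioned = [], []
    for entry in entries:
        (versioned if entry.version_id else unversioned).append(entry)
    if unversioned:
        logger.warning(
            "skipping %d entries without cloud version info", len(unversioned)
        )
    return versioned, unversioned
```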
Both functions are decorated with @locked, ensuring that only one push or fetch operation runs at a time within a repository. Both also interact with the DataCloud class (in dvc/data_cloud.py) which provides lower-level push/pull/status methods operating on hash-based object databases.
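The serialization guarantee of @locked can be sketched with a per-instance lock; note this is a simplified in-process analogy, whereas DVC's real decorator acquires an on-disk repository lock:

```python
import threading
from functools import wraps

def locked(method):
    """Serialize repo operations via a per-instance lock (sketch only;
    DVC's @locked uses an on-disk repo lock, not threading.Lock)."""
    @wraps(method)
    def wrapper(self, *args, **kwargs):
        with self._lock:
            return method(self, *args, **kwargs)
    return wrapper

class Repo:
    """Toy repo: push/fetch bodies never overlap within one instance."""
    def __init__(self):
        self._lock = threading.RLock()
        self.transfers = 0

    @locked
    def push(self):
        self.transfers += 1
        return self.transfers
```

Using a reentrant lock (`RLock`) mirrors the fact that a locked operation may call other repo methods without deadlocking itself.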
Usage
Import push or fetch when building custom data synchronization workflows, or use the CLI commands dvc push and dvc fetch, which call these functions. The functions support targeting specific files, filtering by revision, and controlling parallelism via the jobs parameter.
Code Reference
Source Location
- Repository: DVC
- File:
dvc/repo/push.py - Lines: L63-178 (push)
- File:
dvc/repo/fetch.py - Lines: L100-207 (fetch)
- Supporting file:
dvc/data_cloud.py - Lines: L168-258 (DataCloud.push, DataCloud.pull)
Signature
# Push
@locked
def push(
self,
targets=None,
jobs=None,
remote=None,
all_branches=False,
with_deps=False,
all_tags=False,
recursive=False,
all_commits=False,
run_cache=False,
revs=None,
workspace=True,
glob=False,
) -> int:
# Fetch
@locked
def fetch(
self: "Repo",
targets=None,
jobs=None,
remote=None,
all_branches=False,
with_deps=False,
all_tags=False,
recursive=False,
all_commits=False,
run_cache=False,
revs=None,
workspace=True,
max_size=None,
types=None,
config=None,
onerror=None,
) -> int:
Import
from dvc.repo.push import push
from dvc.repo.fetch import fetch
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| self | Repo | Yes | The DVC repository instance (bound via @locked decorator). |
| targets | Optional[list] | No | Specific DVC-tracked files or directories to push/fetch. Supports glob patterns when glob=True (push only). |
| jobs | Optional[int] | No | Number of parallel transfer threads. Defaults to the config-level setting. |
| remote | Optional[str] | No | Name of the remote to push to or fetch from. Uses default remote if None. |
| all_branches | bool | No | If True, include data from all branches. |
| all_tags | bool | No | If True, include data from all tags. |
| all_commits | bool | No | If True, include data from all commits. |
| revs | Optional[list] | No | Specific revisions to include. |
| with_deps | bool | No | If True, include dependencies of targeted stages. |
| recursive | bool | No | If True, recursively match targets within directories. |
| run_cache | bool | No | If True, also push/fetch the run cache (stage cache). |
| workspace | bool | No | If True (default), include the current workspace. |
| glob | bool | No | (Push only) If True, treat targets as glob patterns. |
| max_size | Optional[int] | No | (Fetch only) Maximum file size in bytes to fetch. |
| types | Optional[list[str]] | No | (Fetch only) Restrict to output types: "metrics", "plots", "params". |
| config | Optional[dict] | No | (Fetch only) Additional configuration overrides. |
| onerror | Optional[Callable] | No | (Fetch only) Error callback for collection failures. |
Outputs
| Name | Type | Description |
|---|---|---|
| return | int | Count of successfully transferred files, including both data files and run cache entries (when run_cache=True). |
Exceptions:
- UploadError -- raised by push when one or more files fail to upload. The exception's amount attribute indicates the failure count.
- DownloadError -- raised by fetch when one or more files fail to download.
- InvalidArgumentError -- raised by push when attempting multi-revision push to a version-aware remote.
- NoRemoteError -- raised when no remote is configured and none is specified.
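A defensive calling pattern for partial failures might look like the sketch below. The exception class is re-declared locally as a stand-in (mirroring UploadError's amount attribute) so the example is self-contained; in real code you would import it from dvc.exceptions:

```python
class UploadError(Exception):
    """Stand-in for dvc.exceptions.UploadError; carries a failure count."""
    def __init__(self, amount):
        super().__init__(f"{amount} files failed to upload")
        self.amount = amount

def push_with_report(push_fn):
    """Call a push callable; return (transferred, failed) instead of raising."""
    try:
        return push_fn(), 0
    except UploadError as exc:
        return 0, exc.amount

def flaky_push():
    """Hypothetical push that fails for three files."""
    raise UploadError(3)
```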
Usage Examples
Basic Usage
from dvc.repo import Repo
repo = Repo()
# Push all tracked data to the default remote
transferred = repo.push()
print(f"Pushed {transferred} files")
# Fetch specific targets from a named remote with 8 threads
transferred = repo.fetch(
targets=["data/train.csv", "models/"],
remote="s3remote",
jobs=8,
)
print(f"Fetched {transferred} files")
Multi-Revision Push
from dvc.repo import Repo
repo = Repo()
# Push data from all branches and tags
transferred = repo.push(
all_branches=True,
all_tags=True,
run_cache=True,
)
print(f"Pushed {transferred} files across all revisions")
Fetch with Type Filtering
from dvc.repo import Repo
repo = Repo()
# Fetch only metrics and plots, skip large files
transferred = repo.fetch(
types=["metrics", "plots"],
max_size=100_000_000, # 100 MB limit
)
print(f"Fetched {transferred} metric/plot files")