Implementation:Iterative Dvc Repo Get
| Knowledge Sources | |
|---|---|
| Domains | Data_Access, Remote_Storage |
| Last Updated | 2026-02-10 10:00 GMT |
Overview
The Repo_Get implementation downloads data files or directories from a DVC repository. It resides in dvc/repo/get.py (64 lines) and is the core logic behind the dvc get command.
from dvc.repo.get import get
Function Signature
def get(
    url,
    path,
    out=None,
    rev=None,
    jobs=None,
    force=False,
    config=None,
    remote=None,
    remote_config=None,
):
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | str | required | URL or path to the source DVC repository |
| path | str | required | Path within the repository to the file or directory to download |
| out | str or None | None | Local output path; defaults to the basename of path |
| rev | str or None | None | Git revision (branch, tag, or commit hash) to download from |
| jobs | int or None | None | Number of parallel download jobs |
| force | bool | False | Overwrite the output path if it already exists |
| config | dict, str, or None | None | Configuration dictionary or path to a config file |
| remote | str or None | None | Name of the DVC remote to fetch from |
| remote_config | dict or None | None | Additional remote-specific configuration overrides |
Custom Exception
The module defines a custom exception class for invalid DVC file targets:
from dvc.exceptions import DvcException

class GetDVCFileError(DvcException):
    def __init__(self):
        super().__init__(
            "the given path is a DVC file, you must specify "
            "a data file or a directory"
        )
This is raised when the resolved output path matches a DVC file name (e.g., .dvc files), which should not be used as download targets.
Internal Mechanics
Output Resolution
The output path is resolved using resolve_output, which derives a local filename from the source path if out is not provided. It also takes the force flag into account when the resolved output path already exists.
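The resolution step described above can be approximated with a small standalone function. This is a minimal sketch, not DVC's actual resolve_output: the name resolve_output_sketch, the handling of an existing directory passed as out, and the use of FileExistsError are all assumptions made for illustration.

```python
import os
from typing import Optional
from urllib.parse import urlparse


def resolve_output_sketch(src: str, out: Optional[str] = None, force: bool = False) -> str:
    """Hypothetical stand-in for resolve_output; not DVC's actual code."""
    # Default name: the last component of the source path ("data/" -> "data").
    name = os.path.basename(os.path.normpath(urlparse(src).path))

    if not out:
        target = name
    elif os.path.isdir(out):
        # Assumption: an existing directory means "place the item inside it".
        target = os.path.join(out, name)
    else:
        target = out

    # Refuse to overwrite an existing path unless the caller opted in.
    if os.path.exists(target) and not force:
        raise FileExistsError(f"'{target}' already exists; pass force=True to overwrite")
    return target
```

Under these assumptions, resolve_output_sketch("data/train.csv", force=True) returns "train.csv", mirroring the documented default of using the basename of path.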
DVC File Validation
Before proceeding, the function checks whether the resolved output is a valid DVC filename using is_valid_filename. If it is, a GetDVCFileError is raised to prevent users from accidentally downloading metadata files instead of data.
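The validation step can be illustrated without installing DVC. The sketch below is an approximation: looks_like_dvc_file and GetDVCFileErrorSketch are hypothetical names, and the set of recognized metadata file names (.dvc suffix, dvc.yaml, Dvcfile) is an assumption about what is_valid_filename accepts.

```python
import os


class GetDVCFileErrorSketch(Exception):
    """Stand-in for GetDVCFileError so the example runs without DVC."""


def looks_like_dvc_file(path: str) -> bool:
    # Assumption: metadata files are '*.dvc', 'dvc.yaml', or the legacy 'Dvcfile'.
    name = os.path.basename(path)
    return name.endswith(".dvc") or name in {"dvc.yaml", "Dvcfile"}


def check_target(out: str) -> str:
    # Reject metadata files before any repository is opened.
    if looks_like_dvc_file(out):
        raise GetDVCFileErrorSketch(
            "the given path is a DVC file, you must specify "
            "a data file or a directory"
        )
    return out
```

Because the check runs on the resolved output, a call whose path is data/train.csv.dvc and whose out is omitted would be rejected: the default output name is the basename train.csv.dvc.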
Configuration Loading
If config is set and is not already a dictionary, it is treated as a path to a config file and loaded into a dictionary using Config.load_file:
from dvc.config import Config

if config and not isinstance(config, dict):
    config = Config.load_file(config)
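The same normalization pattern can be tried in isolation by injecting the loader. In this sketch, normalize_config and fake_load_file are hypothetical names; fake_load_file merely stands in for Config.load_file and does not read any file.

```python
from typing import Callable, Optional, Union

ConfigInput = Optional[Union[dict, str]]


def normalize_config(config: ConfigInput, load_file: Callable[[str], dict]) -> ConfigInput:
    # Only a truthy non-dict value (assumed to be a file path) is loaded;
    # dicts and falsy values such as None are returned unchanged.
    if config and not isinstance(config, dict):
        return load_file(config)
    return config


def fake_load_file(path: str) -> dict:
    # Hypothetical loader used only to make the example runnable.
    return {"loaded_from": path}


as_dict = normalize_config({"cache": {"dir": "/tmp/cache"}}, fake_load_file)
from_path = normalize_config("my.config", fake_load_file)
```

One consequence of this check is that an empty dictionary or None skips loading entirely, so the loader is never called with a missing path.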
Repository and Filesystem Selection
The function opens the source repository with Repo.open and selects the appropriate filesystem:
- Absolute paths: Uses DataFileSystem built from the local data index.
- Relative paths: Uses repo.dvcfs (the DVC virtual filesystem).
from dvc.fs import download
from dvc.fs.data import DataFileSystem

if os.path.isabs(path):
    fs = DataFileSystem(index=repo.index.data["local"])
    fs_path = fs.from_os_path(path)
else:
    fs = repo.dvcfs
    fs_path = fs.from_os_path(path)
download(fs, fs_path, os.path.abspath(out), jobs=jobs)
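The branching above can be reduced to a pure function for experimentation. This is only a sketch: select_filesystem is a hypothetical helper that returns a label instead of a real filesystem object, and the separator replacement is an assumed simplification of what from_os_path does.

```python
import os
import posixpath
from typing import Tuple


def select_filesystem(path: str) -> Tuple[str, str]:
    # Absolute paths go to the data-index filesystem, relative ones to dvcfs.
    kind = "DataFileSystem" if os.path.isabs(path) else "repo.dvcfs"
    # Assumption: converting to a filesystem path mostly means using "/" separators.
    fs_path = path.replace(os.sep, posixpath.sep)
    return kind, fs_path


relative_choice = select_filesystem("data/train.csv")
absolute_choice = select_filesystem(os.path.abspath("data"))
```

Returning a label keeps the sketch independent of DVC; in the real function both branches yield an object that is then passed to download.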
Download
The actual data transfer is performed by dvc.fs.download, which handles both file and directory downloads with optional parallelism via the jobs parameter.
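To make the role of jobs concrete, the sketch below copies a file or directory tree on the local disk with a thread pool. It is not how dvc.fs.download is implemented; copy_tree_parallel is a hypothetical helper, and local copying stands in for a remote transfer.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional, Tuple


def copy_tree_parallel(src: str, dst: str, jobs: Optional[int] = None) -> List[str]:
    """Copy a file or a directory tree using up to `jobs` worker threads."""
    pairs: List[Tuple[str, str]] = []
    if os.path.isfile(src):
        pairs.append((src, dst))
    else:
        # Collect every file under src with its mirrored destination path.
        for root, _dirs, files in os.walk(src):
            rel = os.path.relpath(root, src)
            for name in files:
                target = os.path.normpath(os.path.join(dst, rel, name))
                pairs.append((os.path.join(root, name), target))

    def _copy(pair: Tuple[str, str]) -> str:
        source, target = pair
        os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
        shutil.copyfile(source, target)
        return target

    # max_workers=None lets the executor choose a default pool size,
    # analogous to leaving jobs unset.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return sorted(pool.map(_copy, pairs))
```

A single file and a directory follow the same code path here, differing only in how many (source, target) pairs are queued.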
Usage Example
from dvc.repo.get import get

# Download a file from a remote DVC repository
get("https://github.com/example/repo", "data/train.csv")

# Download to a specific output path with a specific revision
get(
    "https://github.com/example/repo",
    "models/model.pkl",
    out="local_model.pkl",
    rev="v2.0",
)

# Download with custom remote configuration
get(
    "/path/to/repo",
    "data/",
    remote="s3remote",
    remote_config={"access_key_id": "..."},
    jobs=4,
)
Dependencies
| Module | Purpose |
|---|---|
| dvc.utils.resolve_output | Resolves the local output path from the source path |
| dvc.dvcfile.is_valid_filename | Checks whether a filename is a DVC metadata file |
| dvc.config.Config | Loads configuration from file paths |
| dvc.repo.Repo | Opens the source DVC repository |
| dvc.fs.download | Performs the actual file download |
| dvc.fs.data.DataFileSystem | Filesystem for accessing local data index entries |
See Also
- Repo_Imp_Url -- Imports data from a URL into a DVC pipeline
- Repo_Du -- Also uses the Repo.open pattern for remote access