Implementation:Iterative Dvc Repo Get

From Leeroopedia


Knowledge Sources
Domains: Data_Access, Remote_Storage
Last Updated: 2026-02-10 10:00 GMT

Overview

The Repo_Get implementation downloads data files or directories from a DVC repository. It resides in dvc/repo/get.py (64 lines) and is the core logic behind the dvc get command.

from dvc.repo.get import get

Function Signature

def get(
    url,
    path,
    out=None,
    rev=None,
    jobs=None,
    force=False,
    config=None,
    remote=None,
    remote_config=None,
):

Parameters

Parameter      Type                Default   Description
url            str                 required  URL or path to the source DVC repository
path           str                 required  Path within the repository to the file or directory to download
out            str or None         None      Local output path; defaults to the basename of path
rev            str or None         None      Git revision (branch, tag, or commit hash) to download from
jobs           int or None         None      Number of parallel download jobs
force          bool                False     Overwrite the output path if it already exists
config         dict, str, or None  None      Configuration dictionary or path to a config file
remote         str or None         None      Name of the DVC remote to fetch from
remote_config  dict or None        None      Additional remote-specific configuration overrides

Custom Exception

The module defines a custom exception class for invalid DVC file targets:

class GetDVCFileError(DvcException):
    def __init__(self):
        super().__init__(
            "the given path is a DVC file, you must specify "
            "a data file or a directory"
        )

get raises this exception when the resolved output path has a DVC metadata file name (for example, a .dvc file); such files are not valid download targets.

Internal Mechanics

Output Resolution

The output path is resolved with resolve_output, which derives a local filename from path when out is not provided. The force flag is passed to resolve_output as well, where it governs whether an existing output path may be overwritten.
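The two rules described here can be sketched in plain Python. This is a simplified stand-in written for illustration, not DVC's own code: the name resolve_output_sketch is hypothetical, the built-in FileExistsError stands in for whatever exception DVC raises, and stripping a trailing slash with normpath is an assumption.

```python
import os
import tempfile


def resolve_output_sketch(path, out=None, force=False):
    """Illustrative stand-in for the output resolution step (not DVC's code)."""
    # Fall back to the last component of the source path when no output is given.
    # normpath drops a trailing slash, so "data/" resolves to "data" (assumption).
    resolved = out or os.path.basename(os.path.normpath(path))
    # Refuse to reuse an existing path unless the caller opted in with force.
    if os.path.exists(resolved) and not force:
        raise FileExistsError(
            f"'{resolved}' already exists; pass force=True to overwrite"
        )
    return resolved


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "train.csv")
        # The target does not exist yet, so it is returned unchanged.
        print(resolve_output_sketch("data/train.csv", out=target))
        open(target, "w").close()
        # The target now exists; force=True allows it to be reused.
        print(resolve_output_sketch("data/train.csv", out=target, force=True))
```

Checking for an existing path before any repository is opened means a conflicting output is reported early, before any data is transferred.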

DVC File Validation

Before proceeding, the function checks whether the resolved output is a valid DVC filename using is_valid_filename. If it is, a GetDVCFileError is raised to prevent users from accidentally downloading metadata files instead of data.
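A minimal sketch of this guard follows, runnable without DVC installed. The names GetDVCFileErrorSketch, looks_like_dvc_file, and validate_output are hypothetical, and the matching rule (a .dvc suffix or a file named dvc.yaml) is an assumption about what is_valid_filename accepts rather than a copy of it.

```python
import os


class GetDVCFileErrorSketch(Exception):
    """Stand-in for GetDVCFileError so the example runs without DVC."""

    def __init__(self):
        super().__init__(
            "the given path is a DVC file, you must specify "
            "a data file or a directory"
        )


def looks_like_dvc_file(path):
    # Assumption: metadata files are ".dvc" files or a file named "dvc.yaml".
    return path.endswith(".dvc") or os.path.basename(path) == "dvc.yaml"


def validate_output(out):
    # Mirror the guard described above: reject metadata file names up front.
    if looks_like_dvc_file(out):
        raise GetDVCFileErrorSketch
    return out


if __name__ == "__main__":
    print(validate_output("train.csv"))
    try:
        validate_output("train.csv.dvc")
    except GetDVCFileErrorSketch as exc:
        print(f"rejected: {exc}")
```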

Configuration Loading

If config is provided and is not already a dictionary (typically a string file path), it is loaded into a dictionary with Config.load_file:

from dvc.config import Config

if config and not isinstance(config, dict):
    config = Config.load_file(config)
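The edge cases of that check are easy to miss, so here is a small sketch that isolates it. The function normalize_config and the loader fake_load_file are hypothetical; fake_load_file only stands in for Config.load_file and reads nothing from disk.

```python
def normalize_config(config, load_file):
    """Return config unchanged when it is empty or a dict; otherwise load it."""
    # load_file is a hypothetical stand-in for Config.load_file.
    if config and not isinstance(config, dict):
        config = load_file(config)
    return config


def fake_load_file(path):
    # Hypothetical loader for demonstration only; it does not touch the disk.
    return {"loaded_from": path}


if __name__ == "__main__":
    print(normalize_config(None, fake_load_file))
    print(normalize_config({"core": {"remote": "s3remote"}}, fake_load_file))
    print(normalize_config("path/to/config", fake_load_file))
```

Because the condition tests truthiness first, None, an empty dict, and an empty string all pass through without the loader being called.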

Repository and Filesystem Selection

The function opens the source repository with Repo.open and selects the appropriate filesystem:

  • Absolute paths: Uses DataFileSystem built from the local data index.
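  • Relative paths: Uses repo.dvcfs (the DVC virtual filesystem).

import os

from dvc.fs import download
from dvc.fs.data import DataFileSystem

# `repo` is the repository object opened with Repo.open
if os.path.isabs(path):
    # Absolute path: read through the local data index
    fs = DataFileSystem(index=repo.index.data["local"])
    fs_path = fs.from_os_path(path)
else:
    # Relative path: read through the DVC virtual filesystem
    fs = repo.dvcfs
    fs_path = fs.from_os_path(path)
# Write to the absolute form of the resolved output path
download(fs, fs_path, os.path.abspath(out), jobs=jobs)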
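The branch above depends only on os.path.isabs, which can be checked in isolation. In the sketch below, choose_filesystem is a hypothetical helper that returns a label instead of a filesystem object, so it runs without a repository.

```python
import os


def choose_filesystem(path):
    """Return a label naming the filesystem branch taken for path."""
    # The labels are illustrative; the real code builds filesystem objects.
    if os.path.isabs(path):
        return "DataFileSystem"
    return "repo.dvcfs"


if __name__ == "__main__":
    print(choose_filesystem("data/train.csv"))
    print(choose_filesystem(os.path.abspath("data/train.csv")))
```

Note that os.path.isabs follows the rules of the platform it runs on, so the same string can be classified differently on POSIX and Windows.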

Download

The actual data transfer is performed by dvc.fs.download, which handles both file and directory downloads with optional parallelism via the jobs parameter.
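As a rough illustration of what a jobs-style setting typically controls, the sketch below copies a file or directory tree with a bounded thread pool. It is not how dvc.fs.download is implemented; copy_tree_parallel and its arguments are hypothetical, and local copies stand in for remote transfers.

```python
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor


def copy_tree_parallel(src, dst, jobs=None):
    """Copy a file or directory tree using up to `jobs` worker threads."""
    if os.path.isfile(src):
        pairs = [(src, dst)]
    else:
        pairs = []
        for root, _dirs, files in os.walk(src):
            rel = os.path.relpath(root, src)
            for name in files:
                target = os.path.normpath(os.path.join(dst, rel, name))
                pairs.append((os.path.join(root, name), target))
    # Create destination directories up front so workers only copy files.
    for _source, target in pairs:
        os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
    # jobs=None lets the executor pick its default worker count.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # list() waits for completion and re-raises any worker exception.
        list(pool.map(lambda pair: shutil.copyfile(*pair), pairs))
    return len(pairs)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        os.makedirs(os.path.join(src, "sub"))
        for rel in ("a.txt", os.path.join("sub", "b.txt")):
            with open(os.path.join(src, rel), "w") as handle:
                handle.write(rel)
        copied = copy_tree_parallel(src, os.path.join(tmp, "dst"), jobs=2)
        print(f"copied {copied} files")
```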

Usage Example

from dvc.repo.get import get

# Download a file from a remote DVC repository
get("https://github.com/example/repo", "data/train.csv")

# Download to a specific output path with a specific revision
get(
    "https://github.com/example/repo",
    "models/model.pkl",
    out="local_model.pkl",
    rev="v2.0",
)

# Download with custom remote configuration
get(
    "/path/to/repo",
    "data/",
    remote="s3remote",
    remote_config={"access_key_id": "..."},
    jobs=4,
)

Dependencies

Module                         Purpose
dvc.utils.resolve_output       Resolves the local output path from the source path
dvc.dvcfile.is_valid_filename  Checks whether a filename is a DVC metadata file
dvc.config.Config              Loads configuration from file paths
dvc.repo.Repo                  Opens the source DVC repository
dvc.fs.download                Performs the actual file download
dvc.fs.data.DataFileSystem     Filesystem for accessing local data index entries

See Also

  • Repo_Imp_Url -- Imports data from a URL into a DVC pipeline
  • Repo_Du -- Also uses the Repo.open pattern for remote access
