
Environment:Iterative Dvc Remote Storage Backends

From Leeroopedia


Knowledge Sources
Domains Infrastructure, Cloud_Storage
Last Updated 2026-02-10 10:00 GMT

Overview

Optional remote storage backend packages for pushing/pulling DVC-tracked data to cloud providers (S3, GCS, Azure, HDFS, SSH, etc.).

Description

DVC uses a plugin architecture for remote storage backends. Each cloud provider is implemented as a separate Python package (e.g., `dvc-s3`, `dvc-gs`, `dvc-azure`) that can be installed independently. The remote configuration supports hierarchical settings (system, global, repo, local) with per-backend authentication options including API keys, service accounts, and credential files.
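The four-level precedence can be sketched as a simple dictionary merge, with more specific levels overriding broader ones (an illustration of the idea, not DVC's actual implementation; all setting values below are hypothetical):

```python
def resolve_config(system, global_, repo, local):
    """Merge config levels; later (more specific) levels win."""
    merged = {}
    for level in (system, global_, repo, local):  # lowest to highest precedence
        merged.update(level)
    return merged

# Hypothetical settings for one remote
system_cfg = {"url": "s3://corp-default"}
global_cfg = {}
repo_cfg = {"url": "s3://project-bucket", "region": "us-east-1"}
local_cfg = {"access_key_id": "AKIA..."}  # credentials stay out of version control

print(resolve_config(system_cfg, global_cfg, repo_cfg, local_cfg))
```

The repo-level `url` overrides the system default, while the credential lives only in the local (untracked) level.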

Usage

Use this environment when you need to push, pull, or fetch DVC-tracked data to/from remote storage. Each storage backend requires its own package installation and credential configuration. The `dvc/data_cloud.py` module resolves the configured remote and delegates operations to the appropriate backend.
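A typical workflow looks like the following (the remote name and bucket URL are illustrative):

```shell
# Add a default remote (S3 here)
dvc remote add -d myremote s3://my-bucket/dvc-store

# Push DVC-tracked data to the remote
dvc push

# Fetch objects into the local cache without checking them out
dvc fetch

# Pull (fetch + checkout) on another machine
dvc pull
```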

System Requirements

  • Network: Internet access (required for cloud storage operations)
  • OS: Linux, macOS, or Windows (all backends supported on all platforms; HDFS is Linux-preferred)
  • Python: >= 3.9 (same as the core DVC requirement)

Dependencies

Storage Backend Packages

Each backend is an optional dependency:

  • `dvc-s3` >= 3.2.1, < 4 — Amazon S3 and S3-compatible storage
  • `dvc-gs` >= 3.0.2, < 4 — Google Cloud Storage
  • `dvc-azure` >= 3.1.0, < 4 — Azure Blob Storage
  • `dvc-hdfs` >= 3, < 4 — Hadoop Distributed File System
  • `dvc-ssh` >= 4, < 5 — SSH/SFTP remote
  • `dvc-ssh[gssapi]` >= 4, < 5 — SSH with Kerberos/GSSAPI support
  • `dvc-oss` >= 3, < 4 — Alibaba Cloud Object Storage Service
  • `dvc-gdrive` >= 3, < 4 — Google Drive
  • `dvc-webdav` >= 3.0.1, < 4 — WebDAV protocol
  • `dvc-webhdfs` >= 3.1, < 4 — WebHDFS protocol
  • `dvc-webhdfs[kerberos]` >= 3.1, < 4 — WebHDFS with Kerberos
  • `dvc-http` >= 2.29.0 — HTTP/HTTPS (included in core)

Install All Backends

pip install "dvc[all]"

Credentials

Amazon S3

  • `AWS_ACCESS_KEY_ID`: AWS access key
  • `AWS_SECRET_ACCESS_KEY`: AWS secret key
  • `AWS_SESSION_TOKEN`: Temporary session token (optional)
  • Or configure via `~/.aws/credentials`, `credentialpath`, or IAM roles
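Credentials can also be stored per-remote in the local, untracked config; the remote name and key values below are placeholders:

```shell
# Keys go in .dvc/config.local, which is not committed
dvc remote modify --local myremote access_key_id 'AKIA...'
dvc remote modify --local myremote secret_access_key '...'

# Or point at a shared credentials file and profile instead
dvc remote modify myremote credentialpath ~/.aws/credentials
dvc remote modify myremote profile dev
```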

Google Cloud Storage

  • `GOOGLE_APPLICATION_CREDENTIALS`: Path to service account JSON file
  • Or configure via `credentialpath` in DVC config
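Either mechanism can be set per-remote (the paths here are illustrative):

```shell
# Service account key via DVC config (stored locally, not committed)
dvc remote modify --local myremote credentialpath ~/keys/project-sa.json

# Or rely on the standard environment variable
export GOOGLE_APPLICATION_CREDENTIALS=~/keys/project-sa.json
```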

Azure Blob Storage

  • `AZURE_STORAGE_CONNECTION_STRING`: Full connection string
  • Or `AZURE_STORAGE_ACCOUNT` + `AZURE_STORAGE_KEY`
  • Or service principal: `tenant_id`, `client_id`, `client_secret`
  • Or `sas_token` for shared access signatures
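A sketch of configuring these options through `dvc remote modify` (all values are placeholders):

```shell
# Full connection string, kept in the local untracked config
dvc remote modify --local myremote connection_string 'DefaultEndpointsProtocol=...'

# Or account name plus key
dvc remote modify --local myremote account_name mystorageacct
dvc remote modify --local myremote account_key '...'
```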

SSH

  • SSH key file (default: `~/.ssh/id_rsa`)
  • Or password-based authentication
  • Optional GSSAPI/Kerberos credentials
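For example, an SSH remote with an explicit user and key file (host, paths, and user are illustrative):

```shell
dvc remote add -d myremote ssh://example.com/home/user/dvc-store
dvc remote modify myremote user deployer
dvc remote modify --local myremote keyfile ~/.ssh/id_ed25519

# Password-based auth instead (stored only in the local config)
dvc remote modify --local myremote password 'secret'
```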

DVC Studio

  • `DVC_STUDIO_TOKEN`: Authentication token for DVC Studio integration
  • `STUDIO_TOKEN`: Alternative token variable (fallback)
  • `DVC_STUDIO_URL`: Studio instance URL
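These are ordinary environment variables (token value illustrative; the URL is only needed for self-hosted instances):

```shell
export DVC_STUDIO_TOKEN='...'
export DVC_STUDIO_URL='https://studio.example.com'
```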

Quick Install

# Install specific backends
pip install "dvc[s3]"       # Amazon S3
pip install "dvc[gs]"       # Google Cloud Storage
pip install "dvc[azure]"    # Azure Blob Storage
pip install "dvc[ssh]"      # SSH/SFTP
pip install "dvc[hdfs]"     # Hadoop HDFS
pip install "dvc[gdrive]"   # Google Drive
pip install "dvc[webdav]"   # WebDAV
pip install "dvc[oss]"      # Alibaba OSS

# Install all remote backends
pip install "dvc[all]"

Code Evidence

Remote resolution from `dvc/data_cloud.py:81-124`:

def get_remote(
    self, name: Optional[str] = None, command: str = "<command>"
) -> "Remote":
    ...
    if name is None and not self._cloud_config:
        raise NoRemoteError(command)
    name = name or self._cloud_config.get(Config.SECTION_REMOTE_URL)
    ...

Optional dependency extras from `pyproject.toml:81-127`:

[project.optional-dependencies]
all = ["dvc[azure,gdrive,gs,hdfs,oss,s3,ssh,webdav,webhdfs]"]
azure = ["dvc-azure>=3.1.0,<4"]
gdrive = ["dvc-gdrive>=3,<4"]
gs = ["dvc-gs>=3.0.2,<4"]
s3 = ["dvc-s3>=3.2.1,<4"]
ssh = ["dvc-ssh>=4,<5"]

Studio token retrieval from `dvc/utils/studio.py:114-115`:

if DVC_STUDIO_TOKEN in os.environ:
    config["token"] = os.environ[DVC_STUDIO_TOKEN]

Common Errors

  • `NoRemoteError`: no remote is configured. Run `dvc remote add -d myremote <url>`.
  • `RemoteMissingDepsError`: the backend package is not installed. Run `pip install "dvc[s3]"` (or the appropriate backend extra).
  • `AuthenticationError`: credentials are invalid or missing. Check the credential environment variables or config.
  • `URLMissingError`: the remote URL is not set. Set it in `.dvc/config` via `dvc remote modify`.
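A short triage sequence for these errors (a sketch; the backend package checked is just one example):

```shell
# Confirm a remote exists and which one is the default
dvc remote list

# Re-run the failing operation with verbose logging
dvc push -v

# Check whether the backend extra is actually installed
pip show dvc-s3
```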

Compatibility Notes

  • S3-compatible storage: Works with MinIO, DigitalOcean Spaces, and other S3-compatible APIs via the `endpointurl` config option.
  • Version-aware remotes: S3 and GCS support versioned object storage with the `version_aware` config flag for worktree remotes.
  • HTTP remotes: The `dvc-http` package is included in core DVC and supports basic, digest, and custom authentication headers.
  • HDFS: Best supported on Linux. Requires Java runtime for native HDFS access.
  • WebDAV/WebHDFS: Support Kerberos authentication as optional extras.
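Configuring an S3-compatible endpoint and a version-aware remote might look like this (host, port, and bucket are illustrative):

```shell
# MinIO or another S3-compatible service
dvc remote add -d myremote s3://my-bucket/dvc-store
dvc remote modify myremote endpointurl http://minio.internal:9000

# Versioned object storage for worktree remotes
dvc remote modify myremote version_aware true
```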
