Environment:Iterative Dvc Remote Storage Backends
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Cloud_Storage |
| Last Updated | 2026-02-10 10:00 GMT |
Overview
Optional remote storage backend packages for pushing and pulling DVC-tracked data to and from remote storage (S3, GCS, Azure, HDFS, SSH, etc.).
Description
DVC uses a plugin architecture for remote storage backends. Each cloud provider is implemented as a separate Python package (e.g., `dvc-s3`, `dvc-gs`, `dvc-azure`) that can be installed independently. The remote configuration supports hierarchical settings (system, global, repo, local) with per-backend authentication options including API keys, service accounts, and credential files.
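The precedence among the four config levels can be illustrated with a minimal sketch. It assumes that later levels override earlier ones (system, then global, then repo, then local) and that nested sections merge key by key. The `merge_levels` helper and the sample values are hypothetical and do not reproduce DVC's own config loader.

```python
from copy import deepcopy

# Levels ordered from lowest to highest precedence, as listed in the text.
LEVELS = ("system", "global", "repo", "local")


def _merge_into(dst: dict, src: dict) -> None:
    """Recursively copy src into dst; values in src win on conflicts."""
    for key, value in src.items():
        if isinstance(value, dict) and isinstance(dst.get(key), dict):
            _merge_into(dst[key], value)
        else:
            dst[key] = deepcopy(value)


def merge_levels(configs: dict) -> dict:
    """Merge per-level config dicts; missing levels are treated as empty."""
    merged: dict = {}
    for level in LEVELS:
        _merge_into(merged, configs.get(level, {}))
    return merged


# Hypothetical settings spread across three levels.
example = {
    "global": {"remote": {"storage": {"url": "s3://team-bucket/dvc"}}},
    "repo": {"core": {"remote": "storage"}},
    "local": {"remote": {"storage": {"profile": "dev"}}},
}
merged = merge_levels(example)
```

In this sketch the local level only adds a `profile` key, so the remote keeps the URL defined at the global level while picking up the machine-specific setting.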
Usage
Use this environment when you need to push, pull, or fetch DVC-tracked data to/from remote storage. Each storage backend requires its own package installation and credential configuration. The `dvc/data_cloud.py` module resolves the configured remote and delegates operations to the appropriate backend.
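As a rough illustration of why each remote needs its own package, the sketch below maps a remote URL's scheme to the `dvc[...]` extra that would have to be installed. The scheme table and the `extra_for_url` helper are assumptions made for this example, derived from the extras named on this page; they are not taken from `dvc/data_cloud.py`.

```python
from typing import Optional
from urllib.parse import urlparse

# Assumed mapping from URL scheme to the pip extra that provides the backend.
SCHEME_TO_EXTRA = {
    "s3": "s3",
    "gs": "gs",
    "azure": "azure",
    "hdfs": "hdfs",
    "ssh": "ssh",
    "oss": "oss",
    "gdrive": "gdrive",
    "webdav": "webdav",
    "webdavs": "webdav",
    "webhdfs": "webhdfs",
}

# HTTP/HTTPS support ships with core DVC, so no extra is needed.
CORE_SCHEMES = {"http", "https"}


def extra_for_url(url: str) -> Optional[str]:
    """Return the extra name for a remote URL, or None if core DVC suffices."""
    scheme = urlparse(url).scheme.lower()
    if scheme == "" or scheme in CORE_SCHEMES:
        # An empty scheme is treated here as a local path.
        return None
    if scheme not in SCHEME_TO_EXTRA:
        raise ValueError(f"unrecognized remote scheme: {scheme!r}")
    return SCHEME_TO_EXTRA[scheme]
```

For example, `extra_for_url("s3://bucket/data")` yields `"s3"`, which corresponds to `pip install "dvc[s3]"`.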
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Network | Internet access | Required for cloud storage operations |
| OS | Linux, macOS, or Windows | All backends run on every platform; HDFS is best supported on Linux |
| Python | >= 3.9 | Same as core DVC requirement |
Dependencies
Storage Backend Packages
Each backend below is an optional dependency, except `dvc-http`, which is included in core DVC:
- `dvc-s3` >= 3.2.1, < 4 — Amazon S3 and S3-compatible storage
- `dvc-gs` >= 3.0.2, < 4 — Google Cloud Storage
- `dvc-azure` >= 3.1.0, < 4 — Azure Blob Storage
- `dvc-hdfs` >= 3, < 4 — Hadoop Distributed File System
- `dvc-ssh` >= 4, < 5 — SSH/SFTP remote
- `dvc-ssh[gssapi]` >= 4, < 5 — SSH with Kerberos/GSSAPI support
- `dvc-oss` >= 3, < 4 — Alibaba Cloud Object Storage Service
- `dvc-gdrive` >= 3, < 4 — Google Drive
- `dvc-webdav` >= 3.0.1, < 4 — WebDAV protocol
- `dvc-webhdfs` >= 3.1, < 4 — WebHDFS protocol
- `dvc-webhdfs[kerberos]` >= 3.1, < 4 — WebHDFS with Kerberos
- `dvc-http` >= 2.29.0 — HTTP/HTTPS (included in core)
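To see which of the packages listed above are present in the current environment, a small check with the standard library's `importlib.metadata` may help. This is a sketch only: the `installed_backends` helper is hypothetical, and it reports installed versions without validating them against the version ranges above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Distribution names taken from the dependency list above.
BACKEND_PACKAGES = (
    "dvc-s3",
    "dvc-gs",
    "dvc-azure",
    "dvc-hdfs",
    "dvc-ssh",
    "dvc-oss",
    "dvc-gdrive",
    "dvc-webdav",
    "dvc-webhdfs",
    "dvc-http",
)


def installed_backends(
    packages: Iterable[str] = BACKEND_PACKAGES,
) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_backends().items():
        print(f"{name}: {ver or 'not installed'}")
```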
Install All Backends
pip install "dvc[all]"
Credentials
Amazon S3
- `AWS_ACCESS_KEY_ID`: AWS access key
- `AWS_SECRET_ACCESS_KEY`: AWS secret key
- `AWS_SESSION_TOKEN`: Temporary session token (optional)
- Or configure via `~/.aws/credentials`, `credentialpath`, or IAM roles
Google Cloud Storage
- `GOOGLE_APPLICATION_CREDENTIALS`: Path to service account JSON file
- Or configure via `credentialpath` in DVC config
Azure Blob Storage
- `AZURE_STORAGE_CONNECTION_STRING`: Full connection string
- Or `AZURE_STORAGE_ACCOUNT` + `AZURE_STORAGE_KEY`
- Or service principal: `tenant_id`, `client_id`, `client_secret`
- Or `sas_token` for shared access signatures
SSH
- SSH key file (default: `~/.ssh/id_rsa`)
- Or password-based authentication
- Optional GSSAPI/Kerberos credentials
DVC Studio
- `DVC_STUDIO_TOKEN`: Authentication token for DVC Studio integration
- `STUDIO_TOKEN`: Alternative token variable (fallback)
- `DVC_STUDIO_URL`: Studio instance URL
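The environment variables listed above for S3, GCS, and Azure can be checked before running a push or pull. The sketch below is illustrative: the `ENV_CREDENTIALS` table and `env_credentials_present` helper are hypothetical names, and a negative result does not mean authentication will fail, since credentials may also come from config options, credential files, or IAM roles.

```python
import os
from typing import Dict, List, Mapping, Optional

# For each backend, a list of alternative variable groups.
# One fully populated group is enough to count as "present".
ENV_CREDENTIALS: Dict[str, List[List[str]]] = {
    "s3": [["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"]],
    "gs": [["GOOGLE_APPLICATION_CREDENTIALS"]],
    "azure": [
        ["AZURE_STORAGE_CONNECTION_STRING"],
        ["AZURE_STORAGE_ACCOUNT", "AZURE_STORAGE_KEY"],
    ],
}


def env_credentials_present(
    backend: str, env: Optional[Mapping[str, str]] = None
) -> bool:
    """Return True if any complete credential group is set and non-empty."""
    env = os.environ if env is None else env
    groups = ENV_CREDENTIALS[backend]
    return any(all(env.get(var) for var in group) for group in groups)
```

Passing an explicit mapping as `env` keeps the check easy to exercise in tests without touching the real process environment.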
Quick Install
# Install specific backends
pip install "dvc[s3]" # Amazon S3
pip install "dvc[gs]" # Google Cloud Storage
pip install "dvc[azure]" # Azure Blob Storage
pip install "dvc[ssh]" # SSH/SFTP
pip install "dvc[hdfs]" # Hadoop HDFS
pip install "dvc[gdrive]" # Google Drive
pip install "dvc[webdav]" # WebDAV
pip install "dvc[oss]" # Alibaba OSS
pip install "dvc[webhdfs]" # WebHDFS
# Install all remote backends
pip install "dvc[all]"
Code Evidence
Remote resolution from `dvc/data_cloud.py:81-124`:
def get_remote(
    self, name: Optional[str] = None, command: str = "<command>"
) -> "Remote":
    ...
    if name is None and not self._cloud_config:
        raise NoRemoteError(command)
    name = name or self._cloud_config.get(Config.SECTION_REMOTE_URL)
    ...
Optional dependency extras from `pyproject.toml:81-127`:
[project.optional-dependencies]
all = ["dvc[azure,gdrive,gs,hdfs,oss,s3,ssh,webdav,webhdfs]"]
azure = ["dvc-azure>=3.1.0,<4"]
gdrive = ["dvc-gdrive>=3,<4"]
gs = ["dvc-gs>=3.0.2,<4"]
s3 = ["dvc-s3>=3.2.1,<4"]
ssh = ["dvc-ssh>=4,<5"]
Studio token retrieval from `dvc/utils/studio.py:114-115`:
if DVC_STUDIO_TOKEN in os.environ:
    config["token"] = os.environ[DVC_STUDIO_TOKEN]
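The token lookup can be sketched as a standalone function. The excerpt above only shows `DVC_STUDIO_TOKEN` overriding the configured token; checking `STUDIO_TOKEN` second is an assumption based on its description as a fallback on this page, and `resolve_studio_token` is a hypothetical helper, not a function from `dvc/utils/studio.py`.

```python
import os
from typing import Mapping, Optional

DVC_STUDIO_TOKEN = "DVC_STUDIO_TOKEN"
STUDIO_TOKEN = "STUDIO_TOKEN"


def resolve_studio_token(
    config: Mapping[str, str], env: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Pick a Studio token: environment first, then the config value."""
    env = os.environ if env is None else env
    # Assumed order: the primary variable, then the fallback variable.
    for var in (DVC_STUDIO_TOKEN, STUDIO_TOKEN):
        if var in env:
            return env[var]
    return config.get("token")
```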
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `NoRemoteError` | No remote configured | Run `dvc remote add -d myremote <url>` |
| `RemoteMissingDepsError` | Backend package not installed | `pip install "dvc[s3]"` (or appropriate backend) |
| `AuthenticationError` | Invalid or missing credentials | Check credential environment variables or config |
| `URLMissingError` | Remote URL not set | Set URL in `.dvc/config` via `dvc remote modify` |
Compatibility Notes
- S3-compatible storage: Works with MinIO, DigitalOcean Spaces, and other S3-compatible APIs via the `endpointurl` config option.
- Version-aware remotes: S3 and GCS remotes can store data in versioned buckets when the `version_aware` config flag is enabled; worktree remotes rely on the same bucket versioning.
- HTTP remotes: The `dvc-http` package is included in core DVC and supports basic, digest, and custom authentication headers.
- HDFS: Best supported on Linux. Requires Java runtime for native HDFS access.
- Kerberos: WebHDFS (`dvc-webhdfs[kerberos]`) and SSH (`dvc-ssh[gssapi]`) offer Kerberos/GSSAPI authentication through optional extras.
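Options such as `endpointurl` are set per remote with `dvc remote modify`. The sketch below only builds the command lines for adding a remote and applying options to it; the `remote_setup_commands` helper, the remote name, the bucket, and the endpoint address are all hypothetical values chosen for illustration.

```python
from typing import Dict, List


def remote_setup_commands(
    name: str, url: str, options: Dict[str, str], default: bool = True
) -> List[List[str]]:
    """Build argv lists for `dvc remote add` followed by `dvc remote modify`."""
    add_cmd = ["dvc", "remote", "add"]
    if default:
        add_cmd.append("-d")  # mark the new remote as the default
    add_cmd += [name, url]

    commands = [add_cmd]
    for option, value in options.items():
        commands.append(["dvc", "remote", "modify", name, option, value])
    return commands


# Hypothetical MinIO setup: an S3 URL plus a custom endpoint.
minio_cmds = remote_setup_commands(
    "minio",
    "s3://my-bucket/dvc-store",
    {"endpointurl": "http://localhost:9000"},
)
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` inside a DVC repository; returning argv lists instead of running them keeps the helper free of side effects.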