Implementation:Iterative Dvc Api Data
| Knowledge Sources | |
|---|---|
| Domains | Data_Access, API |
| Last Updated | 2026-02-10 10:00 GMT |
Overview
A concrete tool, provided by the DVC Python API, for accessing DVC-tracked data files programmatically.
Description
The dvc.api.data module provides three public functions (get_url, open, and read) for accessing DVC-tracked data files from Python code. These functions work with both local and remote DVC repositories, resolving file storage locations and streaming data without requiring a full DVC checkout. The module also wraps lower-level exceptions so that a missing file or output produces a user-friendly error message.
Usage
Import these functions when building Python applications that need to read DVC-tracked data files without running DVC CLI commands. Use get_url to obtain the storage URL of a file, open to stream file contents via a context manager, and read to load entire file contents into memory.
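The difference between streaming with open and loading with read can be sketched as follows. This is a minimal illustration, not part of the DVC API: the helper `count_lines_streaming`, its `opener` parameter, and the `fake_open` stand-in are hypothetical names introduced here so the sketch can run without a DVC repository. When `opener` is omitted, the helper falls back to dvc.api.open.

```python
import io
from typing import Callable, ContextManager, Optional, TextIO


def count_lines_streaming(
    path: str,
    repo: Optional[str] = None,
    rev: Optional[str] = None,
    opener: Optional[Callable[..., ContextManager[TextIO]]] = None,
) -> int:
    """Count lines in a tracked text file without loading it all into memory."""
    if opener is None:
        # Imported lazily so the helper can be exercised without DVC installed.
        import dvc.api

        opener = dvc.api.open
    count = 0
    # The context manager yields a file-like object; iterate it line by line.
    with opener(path, repo=repo, rev=rev, mode="r") as f:
        for _ in f:
            count += 1
    return count


def fake_open(path, repo=None, rev=None, mode="r"):
    """Hypothetical stand-in for dvc.api.open serving a fixed in-memory file."""
    return io.StringIO("a,b\n1,2\n3,4\n")


line_count = count_lines_streaming("data/raw/dataset.csv", opener=fake_open)
```

For small files, read is simpler; streaming is mainly useful when the file may not fit comfortably in memory.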
Code Reference
Source Location
- Repository: Iterative_Dvc
- File: dvc/api/data.py
- Lines: 1-330
Signature
def get_url(
    path: str,
    repo: Optional[str] = None,
    rev: Optional[str] = None,
    remote: Optional[str] = None,
    config: Optional[dict[str, Any]] = None,
    remote_config: Optional[dict[str, Any]] = None,
) -> str:
    """Returns the URL to the storage location of a data file or directory."""

def open(
    path: str,
    repo: Optional[str] = None,
    rev: Optional[str] = None,
    remote: Optional[str] = None,
    mode: str = "r",
    encoding: Optional[str] = None,
    config: Optional[dict[str, Any]] = None,
    remote_config: Optional[dict[str, Any]] = None,
) -> _OpenContextManager:
    """Opens a DVC-tracked file for reading."""

def read(
    path,
    repo=None,
    rev=None,
    remote=None,
    mode="r",
    encoding=None,
    config=None,
    remote_config=None,
):
    """Reads and returns the complete contents of a DVC-tracked file."""
Import
# Note: importing open by name shadows Python's built-in open() in this module.
from dvc.api import get_url, open, read
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| path | str | Yes | Path to the data file within the DVC project |
| repo | Optional[str] | No | URL or local path to the DVC repository |
| rev | Optional[str] | No | Git revision (branch, tag, commit SHA) |
| remote | Optional[str] | No | Name of the DVC remote to use |
| mode | str | No | File open mode ("r" for text, "rb" for binary) |
| encoding | Optional[str] | No | Text encoding (only for text mode) |
| config | Optional[dict] | No | DVC configuration overrides |
| remote_config | Optional[dict] | No | Remote-specific configuration overrides |
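Because get_url, open, and read all accept the same repo and rev arguments, a caller may want to fix them once and reuse the result. The sketch below does this with functools.partial from the standard library. The names `pin_revision` and `fake_read` are hypothetical and exist only for illustration; `fake_read` stands in for dvc.api.read so the example runs without a DVC repository.

```python
from functools import partial
from typing import Any, Callable


def pin_revision(func: Callable[..., Any], repo: str, rev: str) -> Callable[..., Any]:
    """Return `func` with `repo` and `rev` pre-filled as keyword arguments."""
    return partial(func, repo=repo, rev=rev)


def fake_read(path, repo=None, rev=None, remote=None, mode="r", encoding=None):
    """Hypothetical stand-in for dvc.api.read that echoes its arguments."""
    return f"{repo}@{rev}:{path}"


# Pin the repository and revision once, then pass only the path.
read_v1 = pin_revision(fake_read, repo="https://github.com/example/project", rev="v1.0")
result = read_v1("data/raw/dataset.csv")
```

With DVC installed, passing dvc.api.read (or dvc.api.open, dvc.api.get_url) in place of `fake_read` should behave the same way, since all three take repo and rev as keyword arguments.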
Outputs
| Name | Type | Description |
|---|---|---|
| get_url returns | str | URL or path to the file in remote storage |
| open returns | _OpenContextManager | Context manager yielding a file-like object |
| read returns | Union[str, bytes] | File contents as string (text mode) or bytes (binary mode) |
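The return type of read depends on mode, which matters when the result feeds into code that expects bytes. As a rough sketch, the helper below requests binary mode and hashes the result. `sha256_of_tracked_file`, its `reader` parameter, and `fake_read` are hypothetical names, not part of DVC; `reader` defaults to dvc.api.read when omitted.

```python
import hashlib
from typing import Callable, Optional, Union


def sha256_of_tracked_file(
    path: str,
    repo: Optional[str] = None,
    rev: Optional[str] = None,
    reader: Optional[Callable[..., Union[str, bytes]]] = None,
) -> str:
    """Return the SHA-256 hex digest of a tracked file's raw bytes."""
    if reader is None:
        # Imported lazily so the helper can be exercised without DVC installed.
        import dvc.api

        reader = dvc.api.read
    # mode="rb" asks for bytes; text mode ("r") would return str instead.
    data = reader(path, repo=repo, rev=rev, mode="rb")
    if not isinstance(data, bytes):
        raise TypeError(f"expected bytes from mode='rb', got {type(data).__name__}")
    return hashlib.sha256(data).hexdigest()


def fake_read(path, repo=None, rev=None, mode="r"):
    """Hypothetical stand-in for dvc.api.read with fixed contents."""
    return b"hello" if "b" in mode else "hello"


digest = sha256_of_tracked_file("model/weights.bin", reader=fake_read)
```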
Usage Examples
Get Storage URL
import dvc.api
# Get the remote storage URL for a tracked file
url = dvc.api.get_url("data/prepared/train.csv", repo="https://github.com/example/project")
print(url) # e.g., "s3://mybucket/ab/cd1234..."
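Since get_url returns a plain string, code that needs to know which storage backend holds the file can inspect it with the standard library. The following is a minimal sketch; `storage_scheme` is a hypothetical helper, and treating a missing scheme as "local" is an assumption made here for illustration.

```python
from urllib.parse import urlsplit


def storage_scheme(url: str) -> str:
    """Return the URL scheme (e.g. "s3"), or "local" when there is none."""
    scheme = urlsplit(url).scheme
    return scheme if scheme else "local"


remote_kind = storage_scheme("s3://mybucket/ab/cd1234")
local_kind = storage_scheme("/tmp/dvc-cache/ab/cd1234")
```

This check is deliberately naive: a Windows drive-letter path may be misread as a scheme, so real code would need extra handling for that case.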
Stream File Contents
import json
import dvc.api
# Open a DVC-tracked file from a specific revision
with dvc.api.open("model/metrics.json", rev="v1.0", mode="r") as f:
    metrics = json.load(f)
print(metrics["accuracy"])
Read Entire File
import dvc.api
# Read file contents from a remote repository
content = dvc.api.read("data/raw/dataset.csv", repo="https://github.com/example/project")
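In text mode, read returns the file as a single string, so the result can be handed to any parser that accepts text. As an illustration, the sketch below parses CSV text with the standard csv module. `parse_csv_text` is a hypothetical helper, and the sample string stands in for the value returned by dvc.api.read.

```python
import csv
import io


def parse_csv_text(content: str) -> list[dict[str, str]]:
    """Parse CSV text into a list of row dictionaries keyed by the header row."""
    return list(csv.DictReader(io.StringIO(content)))


# Sample text standing in for the string returned by dvc.api.read(...).
sample = "id,label\n1,cat\n2,dog\n"
rows = parse_csv_text(sample)
```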