Implementation:Iterative Dvc Api Data

From Leeroopedia


Knowledge Sources
Domains Data_Access, API
Last Updated 2026-02-10 10:00 GMT

Overview

A concrete tool, provided by the DVC Python API, for programmatically accessing DVC-tracked data files.

Description

The dvc.api.data module provides three public functions (get_url, open, and read) for accessing data files tracked by DVC from Python code. These functions work with both local and remote DVC repositories: they resolve where a file is stored and stream its data without requiring a full DVC checkout. The module also wraps lower-level exceptions so that a missing file or output produces a user-friendly error message.
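
Because errors are wrapped, a caller can handle failures in one place. The following is a minimal sketch, not the module's own code. It assumes the wrapped errors derive from dvc.exceptions.DvcException; safe_read and its reader parameter are hypothetical names introduced here so the helper can be exercised without a real repository.

```python
from typing import Any, Callable, Optional

try:
    # Base class of DVC's own errors (assumed to cover the wrapped exceptions).
    from dvc.exceptions import DvcException
except ImportError:
    # Fallback so the sketch can still be imported where DVC is not installed.
    DvcException = Exception


def safe_read(
    path: str,
    reader: Optional[Callable[..., Any]] = None,
    **kwargs: Any,
) -> Optional[Any]:
    """Return the file contents, or None if DVC reports an error.

    `safe_read` and `reader` are hypothetical; `reader` defaults to dvc.api.read.
    """
    if reader is None:
        import dvc.api  # imported lazily; only needed for the default reader

        reader = dvc.api.read
    try:
        return reader(path, **kwargs)
    except DvcException as exc:
        # The wrapped exception carries the user-friendly message.
        print(f"Could not read {path!r}: {exc}")
        return None
```

Calling safe_read("data/raw/dataset.csv", rev="v1.0") would then return either the contents or None, leaving the decision about how to recover to the caller.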

Usage

Import these functions when building Python applications that need to read DVC-tracked data files without running DVC CLI commands. Use get_url to obtain the storage URL of a file, open to stream file contents via a context manager, and read to load entire file contents into memory.

Code Reference

Source Location

Signature

def get_url(
    path: str,
    repo: Optional[str] = None,
    rev: Optional[str] = None,
    remote: Optional[str] = None,
    config: Optional[dict[str, Any]] = None,
    remote_config: Optional[dict[str, Any]] = None,
) -> str:
    """Returns the URL to the storage location of a data file or directory."""

def open(
    path: str,
    repo: Optional[str] = None,
    rev: Optional[str] = None,
    remote: Optional[str] = None,
    mode: str = "r",
    encoding: Optional[str] = None,
    config: Optional[dict[str, Any]] = None,
    remote_config: Optional[dict[str, Any]] = None,
) -> _OpenContextManager:
    """Opens a DVC-tracked file for reading."""

def read(
    path,
    repo=None,
    rev=None,
    remote=None,
    mode="r",
    encoding=None,
    config=None,
    remote_config=None,
):
    """Reads and returns the complete contents of a DVC-tracked file."""

Import

from dvc.api import get_url, open, read

I/O Contract

Inputs

Name | Type | Required | Description
path | str | Yes | Path to the data file within the DVC project
repo | Optional[str] | No | URL or local path to the DVC repository
rev | Optional[str] | No | Git revision (branch, tag, commit SHA)
remote | Optional[str] | No | Name of the DVC remote to use
mode | str | No | File open mode ("r" for text, "rb" for binary)
encoding | Optional[str] | No | Text encoding (only for text mode)
config | Optional[dict] | No | DVC configuration overrides
remote_config | Optional[dict] | No | Remote-specific configuration overrides
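
The interaction between mode and encoding can be checked before a call is made. This is a sketch only: build_open_kwargs is a hypothetical helper, and restricting mode to exactly "r" and "rb" is an assumption taken from the table above rather than a statement of what the library itself accepts.

```python
from typing import Any, Optional


def build_open_kwargs(mode: str = "r", encoding: Optional[str] = None) -> dict[str, Any]:
    """Validate mode/encoding and return keyword arguments for open or read.

    Hypothetical helper; the accepted modes mirror the inputs table only.
    """
    if mode not in ("r", "rb"):
        raise ValueError(f"unsupported mode for this sketch: {mode!r}")
    if mode == "rb" and encoding is not None:
        # Per the inputs table, encoding is meaningful only in text mode.
        raise ValueError("encoding only applies to text mode")
    kwargs: dict[str, Any] = {"mode": mode}
    if encoding is not None:
        kwargs["encoding"] = encoding
    return kwargs
```

The result can be passed through unchanged, for example dvc.api.read("data/raw/dataset.csv", **build_open_kwargs("r", "utf-8")).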

Outputs

Name | Type | Description
get_url returns | str | URL or path to the file in remote storage
open returns | _OpenContextManager | Context manager yielding a file-like object
read returns | Union[str, bytes] | File contents as string (text mode) or bytes (binary mode)
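
Since read may return either str or bytes depending on mode, downstream code sometimes needs to normalize the value. A minimal sketch follows; as_text is a hypothetical helper, and defaulting to UTF-8 is an assumption, not something the API guarantees.

```python
from typing import Union


def as_text(content: Union[str, bytes], encoding: str = "utf-8") -> str:
    """Return text whether the content was read in text or binary mode."""
    if isinstance(content, bytes):
        # Binary-mode result: decode with the caller-supplied encoding.
        return content.decode(encoding)
    # Text-mode result: already a str, so pass it through unchanged.
    return content
```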

Usage Examples

Get Storage URL

import dvc.api

# Get the remote storage URL for a tracked file
url = dvc.api.get_url("data/prepared/train.csv", repo="https://github.com/example/project")
print(url)  # e.g., "s3://mybucket/ab/cd1234..."
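
The returned string can be inspected with the standard library to see which storage backend it points at. This is a sketch under assumptions: describe_storage_url is a hypothetical helper, the example URL is the illustrative one from the comment above, and Windows drive-letter paths are not handled.

```python
from urllib.parse import urlsplit


def describe_storage_url(url: str) -> dict[str, str]:
    """Split a storage URL into scheme, location (bucket or host), and key."""
    parts = urlsplit(url)
    if not parts.scheme:
        # No scheme: treat the value as a plain local filesystem path.
        return {"scheme": "file", "location": "", "key": url}
    return {
        "scheme": parts.scheme,
        "location": parts.netloc,
        "key": parts.path.lstrip("/"),
    }
```

For the illustrative value "s3://mybucket/ab/cd1234", this yields the scheme "s3", the location "mybucket", and the key "ab/cd1234".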

Stream File Contents

import json

import dvc.api

# Open a DVC-tracked file from a specific revision
with dvc.api.open("model/metrics.json", rev="v1.0", mode="r") as f:
    metrics = json.load(f)

print(metrics["accuracy"])
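
Streaming matters most for files too large to hold in memory. The sketch below processes a CSV one row at a time from any text file-like object, such as the one yielded by dvc.api.open. count_rows is a hypothetical helper, and the assumption that the file has a single header row is ours.

```python
import csv
from typing import IO


def count_rows(f: IO[str]) -> int:
    """Count data rows in a CSV stream without loading the whole file."""
    reader = csv.reader(f)
    next(reader, None)  # skip the header row (assumed present)
    return sum(1 for _ in reader)


# Intended use with DVC (not executed here; requires a DVC project):
#
#     import dvc.api
#     with dvc.api.open("data/prepared/train.csv") as f:
#         print(count_rows(f))
```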

Read Entire File

import dvc.api

# Read file contents from a remote repository
content = dvc.api.read("data/raw/dataset.csv", repo="https://github.com/example/project")
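
In text mode, content is a single string, so it can be handed to any parser that accepts text. A minimal sketch, assuming the file is a CSV with a header row; parse_csv_text is a hypothetical helper and the column names in the usage note are invented for illustration.

```python
import csv
import io


def parse_csv_text(content: str) -> list[dict[str, str]]:
    """Parse CSV text (as returned by a text-mode read) into a list of rows."""
    # StringIO gives the csv module the file-like interface it expects.
    return list(csv.DictReader(io.StringIO(content)))
```

For example, parse_csv_text("id,label\n1,cat\n") produces one row mapping "id" to "1" and "label" to "cat".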

Related Pages
