
Implementation:Iterative Dvc Testing Workspace Tests

From Leeroopedia


Knowledge Sources
Domains: Testing, Workspace_Operations
Last Updated: 2026-02-10 10:00 GMT

Overview

Reusable test suite classes for validating DVC workspace operations including imports, URL listing, and remote-targeted transfers. These classes are designed to be inherited by backend-specific test modules (e.g., S3, GCS, SSH) so that each storage backend can run the same set of workspace operation tests against its own fixtures. The module contains five primary test classes covering file/directory import, version-aware import, URL listing, URL-based get, and remote-targeted add/import workflows.

Source: dvc/testing/workspace_tests.py (401 lines)

Signature

class TestImport:
    def test_import(self, tmp_dir, dvc, workspace): ...
    def test_import_dir(self, tmp_dir, dvc, workspace, stage_md5, dir_md5): ...
    def test_import_empty_dir(self, tmp_dir, dvc, workspace, is_object_storage): ...

class TestImportURLVersionAware:
    def test_import_file(self, tmp_dir, dvc, remote_version_aware): ...
    def test_import_dir(self, tmp_dir, dvc, remote_version_aware): ...
    def test_import_no_download(self, tmp_dir, dvc, remote_version_aware, scm): ...

def match_files(fs, entries, expected): ...

class TestLsUrl:
    def test_file(self, cloud, fname): ...
    def test_dir(self, cloud): ...
    def test_recursive(self, cloud): ...
    def test_nonexistent(self, cloud): ...

class TestGetUrl:
    def test_get_file(self, cloud, tmp_dir): ...
    def test_get_dir(self, cloud, tmp_dir): ...
    def test_get_url_to_dir(self, cloud, tmp_dir, dname): ...
    def test_get_url_nonexistent(self, cloud): ...

class TestToRemote:
    def test_add_to_remote(self, tmp_dir, dvc, remote, workspace): ...
    def test_import_url_to_remote_file(self, tmp_dir, dvc, workspace, remote): ...
    def test_import_url_to_remote_dir(self, tmp_dir, dvc, workspace, remote): ...

Import

from dvc.testing.workspace_tests import (
    TestGetUrl,
    TestImport,
    TestImportURLVersionAware,
    TestLsUrl,
    TestToRemote,
)

Key Classes

TestImport

Tests basic import of files and directories from a workspace remote using dvc.imp_url("remote://workspace/..."). Includes three test methods:

Method | Description
test_import | Imports a single file from the workspace remote and verifies content and clean status
test_import_dir | Imports a nested directory structure and verifies file contents, directory layout, and optional .dvc file content via stage_md5 / dir_md5 fixtures
test_import_empty_dir | Imports an empty directory, handling object storage backends (which use a trailing-slash empty file) vs. filesystem backends

The class provides overridable fixtures (stage_md5, dir_md5, is_object_storage) that default to pytest.skip(), allowing backend-specific test modules to supply concrete values.
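The skip-by-default pattern described above can be sketched without DVC or pytest. The snippet below is only an analogy: it uses the standard library's unittest, replaces fixtures with plain methods, and every class name in it (ImportSuiteSketch, FakeS3Import, UnconfiguredImport) is hypothetical rather than part of dvc.testing.

```python
import unittest


class ImportSuiteSketch:
    """Hypothetical stand-in for a shared suite such as TestImport."""

    def is_object_storage(self):
        # Default hook: no backend value supplied, so the test is skipped.
        # (The real module uses a pytest fixture that calls pytest.skip().)
        raise unittest.SkipTest("backend did not define is_object_storage")

    def test_import_empty_dir(self):
        if self.is_object_storage():
            marker = "empty_dir/"  # object stores: trailing-slash empty file
        else:
            marker = "empty_dir"   # filesystems: a real directory
        self.assertTrue(marker.startswith("empty_dir"))


class FakeS3Import(ImportSuiteSketch, unittest.TestCase):
    """Backend-specific subclass that supplies a concrete value."""

    def is_object_storage(self):
        return True


class UnconfiguredImport(ImportSuiteSketch, unittest.TestCase):
    """Backend that overrides nothing; the inherited test is skipped."""


def run(case):
    # Run one TestCase class and return the collected result.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Running `run(FakeS3Import)` executes the inherited test, while `run(UnconfiguredImport)` reports it as skipped instead of failed, which is the effect the overridable fixtures are meant to have for backends that cannot supply a value.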

TestImportURLVersionAware

Tests version-aware imports that track version IDs (e.g., S3 object versioning). Covers file import, directory import, and no-download import modes. Key behaviors tested:

  • Verifying can_push is False on version-aware outputs
  • Detecting update availability via dvc.status() when the remote file changes
  • Running dvc.update() and confirming the new version is fetched
  • Checking that version_id changes across updates while def_path stays the same
  • Testing no_download=True mode with subsequent dvc.pull() and Git tag-based checkout
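The update-related checks in the list above can be illustrated with a toy model. This is a sketch under stated assumptions, not DVC code: ToyVersionedBucket and ToyImport are hypothetical names, and the model only mimics the idea that an import records a path plus a version ID, becomes outdated when the remote object changes, and keeps its path across an update.

```python
from itertools import count


class ToyVersionedBucket:
    """Hypothetical in-memory stand-in for a versioned object store."""

    def __init__(self):
        self._versions = {}  # path -> list of (version_id, data)
        self._ids = count(1)

    def put(self, path, data):
        # Every write creates a new version instead of overwriting.
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(path, []).append((version_id, data))
        return version_id

    def latest_version(self, path):
        return self._versions[path][-1][0]

    def read(self, path, version_id):
        return dict(self._versions[path])[version_id]


class ToyImport:
    """Hypothetical record of an import: a fixed path plus a pinned version."""

    def __init__(self, def_path, version_id):
        self.def_path = def_path
        self.version_id = version_id

    def is_outdated(self, bucket):
        # Analogous to a status check reporting that an update is available.
        return bucket.latest_version(self.def_path) != self.version_id

    def update(self, bucket):
        # Re-pin to the newest version; the path itself never changes.
        self.version_id = bucket.latest_version(self.def_path)


bucket = ToyVersionedBucket()
first = bucket.put("data/file", "one")
imp = ToyImport("data/file", first)
bucket.put("data/file", "two")  # the remote file changes
was_outdated = imp.is_outdated(bucket)
imp.update(bucket)
```

After the update, `imp.def_path` is still "data/file" while `imp.version_id` differs from `first`, and the old version remains readable from the bucket.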

TestLsUrl

Tests ls_url() for listing files and directories at external URLs. Uses the cloud fixture and the helper function match_files(fs, entries, expected) to compare the listing against the expected entries. Tests include:

  • Parameterized file listing: Tests listing of files at paths "foo", "foo.dvc", and "dir/foo"
  • Directory listing: Lists immediate children of a directory
  • Recursive listing: Tests recursive listing with various maxdepth values (0, 1, 2, and unlimited)
  • Nonexistent path: Verifies URLMissingError is raised

TestGetUrl

Tests Repo.get_url() for downloading files and directories from external URLs:

Method | Description
test_get_file | Downloads a single file and verifies content
test_get_dir | Downloads a directory and verifies structure and content
test_get_url_to_dir | Parameterized test downloading into existing directories (".", "dir", "dir/subdir")
test_get_url_nonexistent | Verifies URLMissingError for nonexistent URLs

TestToRemote

Tests to_remote=True workflows where data is transferred directly to a DVC remote without downloading locally:

Method | Description
test_add_to_remote | Uses dvc.add(url, to_remote=True) to add a file directly to the remote cache; verifies the .dvc file is created but local file does not exist, and the cached content matches
test_import_url_to_remote_file | Uses dvc.imp_url(url, to_remote=True) for a single file; verifies dependency tracking, hash info, and cached content
test_import_url_to_remote_dir | Uses dvc.imp_url(url, to_remote=True) for a directory; verifies the .dir manifest in the cache contains correct relpaths and each file part is stored correctly
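The directory check in test_import_url_to_remote_dir can be sketched with an in-memory cache. The snippet assumes the .dir manifest is a JSON list of entries with "md5" and "relpath" keys; the dict-based cache and the helper names store_dir and check_manifest are hypothetical and do not use the real remote or object database API.

```python
import hashlib
import json


def md5_hex(text):
    return hashlib.md5(text.encode()).hexdigest()


def store_dir(cache, files):
    """Store each file under its MD5 and return the key of a JSON manifest."""
    entries = []
    for relpath, content in sorted(files.items()):
        oid = md5_hex(content)
        cache[oid] = content
        entries.append({"md5": oid, "relpath": relpath})
    manifest = json.dumps(entries)
    manifest_oid = md5_hex(manifest) + ".dir"
    cache[manifest_oid] = manifest
    return manifest_oid


def check_manifest(cache, manifest_oid, expected_relpaths):
    """Load the manifest, compare relpaths, and verify every stored part."""
    file_parts = json.loads(cache[manifest_oid])
    assert {part["relpath"] for part in file_parts} == set(expected_relpaths)
    for part in file_parts:
        # In this sketch each file's content equals its relpath,
        # so the stored object can be checked against the entry itself.
        assert cache[part["md5"]] == part["relpath"]
    return len(file_parts)


cache = {}
files = {"foo": "foo", "bar": "bar", "sub_dir/baz": "sub_dir/baz"}
manifest_oid = store_dir(cache, files)
part_count = check_manifest(cache, manifest_oid, files)
```

Writing each file's relpath as its content is a convenience of the sketch: it lets one loop confirm both that the manifest lists the right paths and that each part landed under the hash the manifest records.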

Helper Function

def match_files(fs, entries, expected):
    """Assert that entries match expected by comparing normalized (path, isdir) tuples."""
    entries_content = {(fs.normpath(d["path"]), d["isdir"]) for d in entries}
    expected_content = {(fs.normpath(d["path"]), d["isdir"]) for d in expected}
    assert entries_content == expected_content
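A short usage sketch shows why the helper normalizes paths and compares sets. The filesystem object here is a hypothetical stand-in (PosixLikeFS) that exposes only normpath() via the standard library's posixpath; the real tests pass the filesystem returned by parse_external_url().

```python
import posixpath


class PosixLikeFS:
    """Hypothetical stand-in exposing only the normpath() method."""

    def normpath(self, path):
        return posixpath.normpath(path)


def match_files(fs, entries, expected):
    """Same logic as the helper above, repeated so the sketch is self-contained."""
    entries_content = {(fs.normpath(d["path"]), d["isdir"]) for d in entries}
    expected_content = {(fs.normpath(d["path"]), d["isdir"]) for d in expected}
    assert entries_content == expected_content


fs = PosixLikeFS()
listed = [
    {"path": "bucket/dir/subdir/", "isdir": True},  # trailing slash
    {"path": "bucket/dir//foo", "isdir": False},    # doubled separator
]
expected = [
    {"path": "bucket/dir/foo", "isdir": False},
    {"path": "bucket/dir/subdir", "isdir": True},
]

# Passes: order and path spelling differ, but the normalized sets are equal.
match_files(fs, listed, expected)
```

Because the comparison is set-based, listing order is irrelevant, and a mismatch in either the path or the isdir flag raises AssertionError.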

Dependencies

Dependency | Usage
pytest | Test framework, fixtures, parametrize, skip
funcy.first | Retrieve first element from iterables (used in version-aware tests)
dvc.exceptions.URLMissingError | Expected exception for nonexistent URLs
dvc.repo.Repo | Repo.get_url() static method
dvc.repo.ls_url | ls_url() and parse_external_url() functions
dvc.utils.fs.remove | File removal utility
