Implementation:Iterative Dvc Testing Remote Tests

From Leeroopedia
Revision as of 15:20, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Iterative_Dvc_Testing_Remote_Tests.md)


Domains

Testing, Remote_Storage

Overview

The module dvc/testing/remote_tests.py provides reusable test suite classes for validating DVC remote storage push, pull, and status operations. It defines three classes (TestRemote, TestRemoteVersionAware, and TestRemoteWorktree) that storage backend test modules can inherit to verify remote storage behavior.

Description

These test classes use pytest fixtures (tmp_dir, dvc, remote, scm) and DVC's test helpers (tmp_dir.dvc_gen, tmp_dir.scm_add) to exercise the complete push/pull/status lifecycle for different remote storage modes.

TestRemote:

Validates the basic push/pull/status cycle for content-addressed storage. The main test() method proceeds through these phases:

  1. Generate tracked files (single file foo and directory data_dir).
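  2. Check initial status: all hashes should appear as new (present in the local cache but not in the remote).
  3. Move cache to backup and verify hashes appear as missing (present in neither the local cache nor the remote).
  4. Restore cache, push to remote, verify hashes appear as ok (present in both).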
  5. Clear local cache, verify hashes appear as deleted (present in remote but not local).
  6. Pull from remote, verify file contents and ok status.
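
The phases above can be modeled with plain sets. The following is a minimal sketch, not DVC's implementation: classify and the placeholder hash names are hypothetical, and the only assumption is that a status is decided by whether a hash is present in the local cache, the remote, both, or neither.

```python
def classify(tracked, local, remote):
    """Sort each tracked hash into one of four status buckets."""
    status = {"ok": set(), "new": set(), "deleted": set(), "missing": set()}
    for h in tracked:
        if h in local and h in remote:
            status["ok"].add(h)        # present on both sides
        elif h in local:
            status["new"].add(h)       # local only, not pushed yet
        elif h in remote:
            status["deleted"].add(h)   # remote only, gone from local cache
        else:
            status["missing"].add(h)   # present on neither side
    return status


def only_bucket(status):
    """Return the name of the single non-empty bucket."""
    (name,) = [key for key, hashes in status.items() if hashes]
    return name


tracked = {"hash_foo", "hash_data_dir"}  # placeholder names, not real hashes
local, remote = set(tracked), set()
observed = []

observed.append(only_bucket(classify(tracked, local, remote)))  # files generated
backup, local = local, set()                                    # move cache to backup
observed.append(only_bucket(classify(tracked, local, remote)))
local = backup                                                  # restore cache
remote = set(local)                                             # push
observed.append(only_bucket(classify(tracked, local, remote)))
local = set()                                                   # clear local cache
observed.append(only_bucket(classify(tracked, local, remote)))
local = set(remote)                                             # pull
observed.append(only_bucket(classify(tracked, local, remote)))
```

In this toy model the five checkpoints yield new, missing, ok, deleted, and ok, in that order, which mirrors the sequence of assertions described in the list.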

Additional test methods:

  • test_stage_cache_push_pull() -- verifies stage cache round-trip (skipped for HTTP remotes).
  • test_pull_00_prefix() -- tests pulling files whose MD5 starts with 00 (edge case for prefix-based traversal).
  • test_pull_no_00_prefix() -- tests pulling files without 00 prefix for comparison.
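
To illustrate why an MD5 beginning with 00 is worth a dedicated test, here is a rough sketch of prefix-based object naming. It assumes objects are grouped into directories named after the first two hex characters of the digest; that layout, and the helper names cache_relpath and find_payload_with_prefix, are assumptions made for this example rather than details taken from the module.

```python
import hashlib
from itertools import count


def cache_relpath(md5_hex):
    """Split a hex digest into a two-character prefix directory and the rest."""
    return f"{md5_hex[:2]}/{md5_hex[2:]}"


def find_payload_with_prefix(prefix="00"):
    """Brute-force a small payload whose MD5 hex digest starts with `prefix`."""
    for i in count():
        payload = str(i).encode()
        digest = hashlib.md5(payload).hexdigest()
        if digest.startswith(prefix):
            return payload, digest


payload, digest = find_payload_with_prefix("00")
relpath = cache_relpath(digest)  # lands in the "00" prefix directory
```

A remote that estimates its size or enumerates objects by sampling one prefix directory behaves differently depending on whether that directory is populated, which is one plausible reason to test both the 00 and non-00 cases.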

TestRemoteVersionAware:

Tests version-aware remote storage where each push records a version_id in the DVC metadata:

  • test_file() -- pushes a file, verifies version_id in .dvc metadata, confirms pull restores the file, and ensures idempotent push/reproduce does not alter metadata.
  • test_dir() -- pushes a directory with nested structure, verifies files and version_id fields in metadata, confirms pull/push round-trips, and tests recovery after remote directory deletion.
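
The metadata check can be pictured as a search for version_id fields in the parsed .dvc content. The sketch below is illustrative only: collect_version_ids is a hypothetical helper, and the dictionary shapes are placeholders that may not match the real .dvc schema.

```python
def collect_version_ids(node, path=""):
    """Recursively gather every `version_id` value found in nested dicts and lists."""
    found = {}
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if key == "version_id":
                found[child] = value
            else:
                found.update(collect_version_ids(value, child))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found.update(collect_version_ids(value, f"{path}[{index}]"))
    return found


# Hypothetical metadata, for illustration only; real .dvc files may differ.
before_push = {"outs": [{"path": "foo", "md5": "<hash>"}]}
after_push = {"outs": [{"path": "foo", "md5": "<hash>", "version_id": "<id-1>"}]}

ids_before = collect_version_ids(before_push)
ids_after = collect_version_ids(after_push)
# A repeated push is expected to leave the metadata, and so the ids, unchanged.
ids_after_repeat = collect_version_ids(after_push)
```

Searching recursively keeps the check independent of where exactly the field is nested, which is convenient when the same assertion has to cover both single files and directory listings.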

TestRemoteWorktree:

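Tests worktree remotes, where the remote mirrors the layout of the DVC workspace on version-aware storage instead of acting as a content-addressed cache (this is unrelated to Git worktrees), with full lifecycle coverage: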

  • test_file() -- push/pull round-trip for a single file with version_id verification.
  • test_dir() -- push/pull round-trip for a nested directory with files and version_id verification.
  • test_deletion() -- verifies that deleting a file from a tracked directory and re-pushing correctly removes it from the remote, while pulling an older revision still restores the deleted file.
  • test_update() -- modifies files on the remote side, runs dvc.update(), and verifies that version IDs change for modified files, new files appear, and unchanged files retain their original version IDs.
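
The expectations in test_update() amount to comparing two path-to-version-ID mappings taken before and after the update. A minimal sketch of that comparison follows; diff_version_ids and the sample paths and IDs are made up for illustration.

```python
def diff_version_ids(before, after):
    """Compare two {path: version_id} mappings and group paths by outcome."""
    shared = before.keys() & after.keys()
    return {
        "changed": {p for p in shared if before[p] != after[p]},
        "unchanged": {p for p in shared if before[p] == after[p]},
        "added": set(after.keys() - before.keys()),
        "removed": set(before.keys() - after.keys()),
    }


# Hypothetical snapshots of version IDs around an update.
before = {"data_dir/a": "v1", "data_dir/b": "v1"}
after = {"data_dir/a": "v2", "data_dir/b": "v1", "data_dir/c": "v1"}

result = diff_version_ids(before, after)
```

Here data_dir/a was modified on the remote, data_dir/b was left alone, and data_dir/c is new, matching the three outcomes the test verifies.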

Helper function:

def _check_status(status, **kwargs):
    """Assert that a cloud status object matches expected sets for ok/missing/new/deleted."""
    for key in ("ok", "missing", "new", "deleted"):
        expected = kwargs.get(key, set())
        assert expected == set(getattr(status, key))

Signature

class TestRemote:
    def test(self, tmp_dir, dvc, remote): ...
    def test_stage_cache_push_pull(self, tmp_dir, dvc, remote): ...
    def test_pull_00_prefix(self, tmp_dir, dvc, remote, monkeypatch): ...
    def test_pull_no_00_prefix(self, tmp_dir, dvc, remote, monkeypatch): ...


class TestRemoteVersionAware:
    def test_file(self, tmp_dir, dvc, run_copy, remote_version_aware): ...
    def test_dir(self, tmp_dir, dvc, run_copy, remote_version_aware): ...


class TestRemoteWorktree:
    def test_file(self, tmp_dir, dvc, remote_worktree): ...
    def test_dir(self, tmp_dir, dvc, remote_worktree): ...
    def test_deletion(self, tmp_dir, dvc, scm, remote_worktree): ...
    def test_update(self, tmp_dir, dvc, remote_worktree): ...

Import

from dvc.testing.remote_tests import TestRemote, TestRemoteVersionAware, TestRemoteWorktree

Usage Pattern

These classes are designed to be inherited in backend-specific test modules. For example, to test an S3 remote:

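import pytest
from dvc.testing.remote_tests import TestRemote, TestRemoteVersionAware

# Assumption: the make_remote and make_remote_version_aware fixtures from
# DVC's testing helpers take name and typ keyword arguments, and the backend's
# test setup registers an S3 cloud fixture. Adjust to your environment.
@pytest.fixture
def remote(make_remote):
    return make_remote(name="upstream", typ="s3")

# TestRemoteVersionAware requests this fixture instead of remote.
@pytest.fixture
def remote_version_aware(make_remote_version_aware):
    return make_remote_version_aware(name="upstream", typ="s3")

class TestS3Remote(TestRemote):
    """Inherits all test methods from TestRemote for S3 backend."""

class TestS3RemoteVersionAware(TestRemoteVersionAware):
    """Inherits all test methods from TestRemoteVersionAware for S3 backend."""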
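
The reuse mechanism is ordinary Python inheritance: a subclass with an empty body still exposes every test method of its base class, and pytest collects them under the subclass. The sketch below shows this with plain classes and no pytest dependency; BaseSuite, BackendSuite, and list_test_methods are hypothetical names chosen for the example.

```python
import inspect


class BaseSuite:
    """Stand-in for a reusable suite; each method takes a backend fixture."""

    def test_push(self, remote):
        return f"push to {remote}"

    def test_pull(self, remote):
        return f"pull from {remote}"


class BackendSuite(BaseSuite):
    """No body needed; the test methods are inherited."""


def list_test_methods(cls):
    """Return the names of methods a test collector would pick up by prefix."""
    return sorted(
        name
        for name, member in inspect.getmembers(cls, inspect.isfunction)
        if name.startswith("test")
    )


inherited = list_test_methods(BackendSuite)
message = BackendSuite().test_push("s3")
```

Because the backend only supplies fixtures, the same assertions run unchanged against every storage type, and a new test added to the base suite is picked up by all backends automatically.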

Across the three classes, the test methods use the following pytest fixtures:

Fixture               Description
tmp_dir               Temporary directory with DVC helper methods (dvc_gen, scm_add)
dvc                   Initialized DVC Repo instance
remote                Configured remote storage (used by TestRemote)
remote_version_aware  Version-aware remote (used by TestRemoteVersionAware)
remote_worktree       Worktree-based remote (used by TestRemoteWorktree)
scm                   SCM (Git) instance (used by TestRemoteWorktree.test_deletion)
run_copy              Helper fixture to create a copy pipeline stage (used by TestRemoteVersionAware)
monkeypatch           pytest monkeypatch fixture (used by TestRemote 00-prefix tests)
