Implementation:Iterative Dvc Checkout

From Leeroopedia


Knowledge Sources
Domains: Data_Versioning, Workspace_Management
Last Updated: 2026-02-10 00:00 GMT

Overview

Concrete tool for restoring workspace files from the content-addressed cache by computing diffs against recorded metadata and applying file-level changes, provided by the DVC library.

Description

The checkout function is the main entry point for synchronizing the working directory with the data state declared by DVC metafiles (.dvc files and dvc.lock). It builds two data indexes -- one representing the current workspace state on disk, the other representing the expected state from the DVC data index -- computes a diff between them, and applies the resulting additions, modifications, and deletions.
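
The diff step described above can be illustrated with a toy model. This is a minimal sketch, not DVC's implementation: each index is assumed to be a plain dict mapping a path to a content hash, and the names `Change` and `diff_indexes` are hypothetical.

```python
from typing import NamedTuple


class Change(NamedTuple):
    kind: str  # "added", "modified", or "deleted"
    path: str


def diff_indexes(workspace: dict, expected: dict) -> list:
    """Compare a workspace index with the expected index (path -> hash)."""
    changes = []
    for path in sorted(expected.keys() | workspace.keys()):
        if path not in workspace:
            # Declared by the metafiles but absent on disk.
            changes.append(Change("added", path))
        elif path not in expected:
            # Present on disk but no longer declared.
            changes.append(Change("deleted", path))
        elif workspace[path] != expected[path]:
            # Present in both, but the content hash differs.
            changes.append(Change("modified", path))
    return changes


# Hypothetical index contents, for illustration only.
workspace = {"data/a.csv": "h1", "data/old.csv": "h2", "data/b.csv": "h3"}
expected = {"data/a.csv": "h1", "data/b.csv": "h9", "data/new.csv": "h4"}

changes = diff_indexes(workspace, expected)
for change in changes:
    print(change.kind, change.path)
```

Unchanged entries (here `data/a.csv`) produce no change record, which is why a checkout over an already synchronized workspace has nothing to apply.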

The function lives in dvc/repo/checkout.py and is decorated with @locked to prevent concurrent workspace modifications. It delegates index construction to build_data_index from dvc/repo/index.py, which scans the workspace filesystem and optionally computes hashes for each file. The diff and apply operations use the compare and apply functions from dvc_data.index.checkout.

Checkout also handles cleanup of unused links -- files from previous checkouts that are no longer referenced by any DVC output. When no specific targets are provided (full-workspace checkout), it queries the state database for all known links and removes those that are now orphaned. For targeted checkouts, this cleanup is skipped to avoid accidentally removing files managed by other outputs.
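
The cleanup rule can be sketched in a few lines. The helper below is a hypothetical illustration under simplified assumptions (links and outputs are modelled as sets of path strings); it does not reproduce how DVC queries its state database.

```python
def find_orphaned_links(known_links, referenced_paths, targets=None):
    """Return links from earlier checkouts that no output references now.

    Cleanup only applies to a full-workspace checkout; when targets are
    given, nothing is reported so files owned by other outputs stay put.
    """
    if targets:
        return set()
    return set(known_links) - set(referenced_paths)


# Hypothetical data: links recorded previously vs. paths still referenced.
known = {"data/a.csv", "data/stale.csv", "models/old.pkl"}
referenced = {"data/a.csv"}

orphaned_full = find_orphaned_links(known, referenced)
orphaned_targeted = find_orphaned_links(known, referenced, targets=["data/a.csv"])

print(sorted(orphaned_full))
print(sorted(orphaned_targeted))
```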

The function tracks failures per output path and collects detailed statistics (counts and path lists for added, modified, and deleted files). If any failures occur and allow_missing is False, a CheckoutError is raised containing both the failure list and the partial results.
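
The interaction between failures and allow_missing can be modelled as follows. This is a sketch only: `ToyCheckoutError` and `finish_checkout` are hypothetical stand-ins, and the result layout follows the Outputs description on this page rather than the DVC source.

```python
class ToyCheckoutError(Exception):
    """Hypothetical stand-in for DVC's CheckoutError."""

    def __init__(self, failed, partial):
        super().__init__(f"{len(failed)} path(s) failed to checkout")
        self.failed = failed    # paths that could not be restored
        self.partial = partial  # results for the paths that succeeded


def finish_checkout(added, modified, deleted, failed, allow_missing=False):
    """Assemble the result dict, raising on failures unless allowed."""
    result = {
        "added": added,
        "modified": modified,
        "deleted": deleted,
        "stats": {
            "added": len(added),
            "modified": len(modified),
            "deleted": len(deleted),
        },
    }
    if failed and not allow_missing:
        raise ToyCheckoutError(failed, result)
    if failed:
        result["failed"] = failed
    return result


# Lenient mode: failures are reported in the result.
lenient = finish_checkout(["a.csv"], [], [], failed=["b.csv"], allow_missing=True)

# Strict mode (the default): failures raise, carrying the partial result.
try:
    finish_checkout(["a.csv"], [], [], failed=["b.csv"])
except ToyCheckoutError as exc:
    strict_error = exc
    print(exc, exc.partial["stats"])
```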

Usage

Import checkout when you need to programmatically restore workspace files after fetching data, switching branches, or modifying DVC metafiles. It is called internally by dvc checkout, by dvc pull (after the fetch step), and by Git integrations such as the post-checkout hook that runs after git checkout or git switch. Call it directly when building custom workflows that need fine-grained control over workspace restoration.

Code Reference

Source Location

  • Repository: DVC
  • File: dvc/repo/checkout.py
  • Lines: L109-214 (checkout)
  • Supporting file: dvc/repo/index.py
  • Lines: L837-925 (build_data_index)

Signature

@locked
def checkout(
    self,
    targets=None,
    with_deps=False,
    force=False,
    relink=False,
    recursive=False,
    allow_missing=False,
    **kwargs,
) -> dict:

Import

from dvc.repo.checkout import checkout

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| self | Repo | Yes | The DVC repository instance; supplied automatically when the function is called as repo.checkout(). |
| targets | Optional[list or str] | No | Specific DVC files or output paths to check out. If None, all tracked outputs are restored and unused links are cleaned up. |
| with_deps | bool | No | If True, include stages that are dependencies of the targeted stages. |
| force | bool | No | If True, allow deletion of workspace files that are not present in the cache (overrides the safety check). |
| relink | bool | No | If True, re-create all file links regardless of whether content has changed. Used when changing the link type. |
| recursive | bool | No | If True, recursively match targets within directories. |
| allow_missing | bool | No | If True, do not raise CheckoutError when some files fail to check out (e.g., they are not in the cache). |
| **kwargs | dict | No | Additional keyword arguments passed through to the dvc_data apply function. |

Outputs

| Name | Type | Description |
|---|---|---|
| return | dict | A dictionary with the keys listed below. Paths are relative to the repository root. |

  • modified -- list of modified file paths
  • added -- list of added file paths
  • deleted -- list of deleted file paths
  • stats -- dict with integer counts for each change type (modified, added, deleted)
  • failed (optional) -- list of paths; included only if allow_missing=True and failures occurred

Exceptions:

  • CheckoutError -- raised when files fail to checkout and allow_missing is False. Contains the failure list and partial results.
  • CheckoutErrorSuggestGit -- raised when a target looks like a Git file rather than a DVC file.
  • DvcException -- raised when attempting to delete a file that is not cached and force is False.

Usage Examples

Basic Usage

from dvc.repo import Repo

repo = Repo()

# Checkout all tracked files in the workspace
result = repo.checkout()
print(f"Added: {result['stats']['added']}")
print(f"Modified: {result['stats']['modified']}")
print(f"Deleted: {result['stats']['deleted']}")

for path in result["added"]:
    print(f"  + {path}")
for path in result["modified"]:
    print(f"  ~ {path}")
for path in result["deleted"]:
    print(f"  - {path}")

Targeted Checkout with Force

from dvc.repo import Repo

repo = Repo()

# Force checkout of specific targets, allowing missing files
result = repo.checkout(
    targets=["data/train.csv", "models/"],
    force=True,
    recursive=True,
    allow_missing=True,
)

if "failed" in result:
    print(f"Warning: {len(result['failed'])} files could not be checked out")

Relink After Config Change

from dvc.repo import Repo

repo = Repo()

# After changing cache.type from "hardlink" to "symlink",
# relink all workspace files
result = repo.checkout(relink=True)
print(f"Relinked {result['stats']['modified']} files")
