Implementation:Iterative Dvc Checkout
| Knowledge Sources | |
|---|---|
| Domains | Data_Versioning, Workspace_Management |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Concrete tool for restoring workspace files from the content-addressed cache by computing diffs against recorded metadata and applying file-level changes, provided by the DVC library.
Description
The checkout function is the main entry point for synchronizing the working directory with the data state declared by DVC metafiles (.dvc files and dvc.lock). It builds two data indexes -- one representing the current workspace state on disk, the other representing the expected state from the DVC data index -- computes a diff between them, and applies the resulting additions, modifications, and deletions.
The function lives in dvc/repo/checkout.py and is decorated with @locked to prevent concurrent workspace modifications. It delegates index construction to build_data_index from dvc/repo/index.py, which scans the workspace filesystem and optionally computes hashes for each file. The diff and apply operations use the compare and apply functions from dvc_data.index.checkout.
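The build, diff, and apply flow described above can be illustrated with a minimal, pure-Python sketch. It does not use the real `build_data_index`, `compare`, or `apply` functions: an index is modeled here as a plain dict from path to content hash, and `diff_indexes` and `apply_changes` are hypothetical names introduced only for illustration.

```python
from typing import Dict, List

# Assumption: an index is modeled as {workspace path: content hash}.
# The real DVC data index is a richer structure.
Index = Dict[str, str]


def diff_indexes(current: Index, expected: Index) -> Dict[str, List[str]]:
    """Classify each path as added, modified, or deleted relative to `current`."""
    changes: Dict[str, List[str]] = {"added": [], "modified": [], "deleted": []}
    for path in sorted(current.keys() | expected.keys()):
        if path not in current:
            changes["added"].append(path)
        elif path not in expected:
            changes["deleted"].append(path)
        elif current[path] != expected[path]:
            changes["modified"].append(path)
    return changes


def apply_changes(
    current: Index, expected: Index, changes: Dict[str, List[str]]
) -> Index:
    """Return a new index with the changes applied (no real file I/O happens)."""
    result = dict(current)
    for path in changes["deleted"]:
        del result[path]
    for path in changes["added"] + changes["modified"]:
        result[path] = expected[path]
    return result


# Workspace state on disk versus the state declared by the metafiles.
workspace = {"data/a.csv": "h1", "data/b.csv": "h2", "data/old.csv": "h9"}
declared = {"data/a.csv": "h1", "data/b.csv": "h3", "data/new.csv": "h4"}

changes = diff_indexes(workspace, declared)
synced = apply_changes(workspace, declared, changes)
```

After the changes are applied, the simulated workspace index matches the declared one, which is the invariant a checkout aims for.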
Checkout also handles cleanup of unused links -- files from previous checkouts that are no longer referenced by any DVC output. When no specific targets are provided (full-workspace checkout), it queries the state database for all known links and removes those that are now orphaned. For targeted checkouts, this cleanup is skipped to avoid accidentally removing files managed by other outputs.
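The cleanup rule above amounts to a set difference that is skipped for targeted checkouts. The following sketch assumes the known links and the currently referenced paths are available as simple collections of strings; `find_orphaned_links` is a hypothetical helper, not part of DVC.

```python
from typing import Iterable, List, Optional, Set


def find_orphaned_links(
    known_links: Iterable[str],
    referenced_paths: Iterable[str],
    targets: Optional[List[str]] = None,
) -> Set[str]:
    """Return links that no output references any more.

    When targets are given, nothing is reported, mirroring the rule that
    targeted checkouts skip cleanup.
    """
    if targets:
        # Targeted checkout: leave links owned by other outputs alone.
        return set()
    return set(known_links) - set(referenced_paths)


known = ["data/a.csv", "data/stale.csv", "models/m.pkl"]
used = ["data/a.csv", "models/m.pkl"]

full_cleanup = find_orphaned_links(known, used)
targeted_cleanup = find_orphaned_links(known, used, targets=["data/a.csv"])
```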
The function tracks failures per output path and collects detailed statistics (counts and path lists for added, modified, and deleted files). If any failures occur and allow_missing is False, a CheckoutError is raised containing both the failure list and the partial results.
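The failure-handling policy can be sketched as follows. `SketchCheckoutError` and `checkout_paths` are hypothetical stand-ins written for this example; they only mimic the behavior described above (collect failures, then either raise with partial results or report them) and are not the real DVC classes or functions.

```python
from typing import Callable, Dict, Iterable, List


class SketchCheckoutError(Exception):
    """Hypothetical stand-in for CheckoutError; not the real DVC class."""

    def __init__(self, failed: List[str], partial: Dict[str, List[str]]):
        super().__init__(f"{len(failed)} path(s) failed to check out")
        self.failed = failed
        self.partial = partial


def checkout_paths(
    paths: Iterable[str],
    restore: Callable[[str], None],
    allow_missing: bool = False,
) -> Dict[str, List[str]]:
    """Restore each path, tracking failures instead of stopping at the first."""
    results: Dict[str, List[str]] = {"added": []}
    failed: List[str] = []
    for path in paths:
        try:
            restore(path)
        except FileNotFoundError:
            failed.append(path)
        else:
            results["added"].append(path)
    if failed and not allow_missing:
        raise SketchCheckoutError(failed, results)
    if failed:
        results["failed"] = failed
    return results


cache = {"data/a.csv"}  # simulated cache contents


def restore_from_cache(path: str) -> None:
    if path not in cache:
        raise FileNotFoundError(path)


wanted = ["data/a.csv", "data/b.csv"]
lenient = checkout_paths(wanted, restore_from_cache, allow_missing=True)

strict_error = None
try:
    checkout_paths(wanted, restore_from_cache)
except SketchCheckoutError as exc:
    strict_error = exc
```

In both modes the successfully restored paths remain available to the caller, either in the returned dict or on the raised exception.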
Usage
Import checkout when you need to programmatically restore workspace files after fetching data, switching branches, or modifying DVC metafiles. It is called internally by dvc checkout, by dvc pull (after the fetch step), and by the Git post-checkout hook that dvc install sets up. Use it directly when building custom workflows that need fine-grained control over workspace restoration.
Code Reference
Source Location
- Repository: DVC
- File: dvc/repo/checkout.py, lines L109-214 (checkout)
- Supporting file: dvc/repo/index.py, lines L837-925 (build_data_index)
Signature
@locked
def checkout(
    self,
    targets=None,
    with_deps=False,
    force=False,
    relink=False,
    recursive=False,
    allow_missing=False,
    **kwargs,
) -> dict:
Import
from dvc.repo.checkout import checkout
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| self | Repo | Yes | The DVC repository instance. The @locked decorator holds the repository lock on this instance for the duration of the call. |
| targets | Optional[list or str] | No | Specific DVC files or output paths to checkout. If None, all tracked outputs are restored and unused links are cleaned up. |
| with_deps | bool | No | If True, include stages that are dependencies of targeted stages. |
| force | bool | No | If True, allow deletion of workspace files that are not present in cache (overrides safety check). |
| relink | bool | No | If True, re-create all file links regardless of whether content has changed. Used when changing the link type. |
| recursive | bool | No | If True, recursively match targets within directories. |
| allow_missing | bool | No | If True, do not raise CheckoutError when some files fail to checkout (e.g., not in cache). |
| **kwargs | dict | No | Additional keyword arguments passed through to the dvc_data apply function. |
Outputs
| Name | Type | Description |
|---|---|---|
| return | dict | A dictionary with keys: modified (list of modified file paths), added (list of added file paths), deleted (list of deleted file paths), stats (dict with integer counts for each change type: modified, added, deleted). Paths are relative to the repository root. Optionally includes a failed key (list of paths) if allow_missing=True and failures occurred. |
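
As an illustration of consuming this return value, the sketch below formats a one-line summary from a dict shaped as described in the table. The sample dict is made up for the example, and `summarize_checkout` is a hypothetical helper rather than a DVC function.

```python
def summarize_checkout(result: dict) -> str:
    """Build a short summary from a checkout-style result dict."""
    stats = result.get("stats", {})
    parts = [
        f"{stats.get(kind, 0)} {kind}"
        for kind in ("added", "modified", "deleted")
    ]
    # The "failed" key is optional, so read it defensively.
    failed = result.get("failed", [])
    if failed:
        parts.append(f"{len(failed)} failed")
    return ", ".join(parts)


# Made-up result with the keys described in the table above.
example = {
    "added": ["data/new.csv"],
    "modified": [],
    "deleted": ["data/old.csv"],
    "stats": {"added": 1, "modified": 0, "deleted": 1},
    "failed": ["data/missing.csv"],
}

summary = summarize_checkout(example)
```

Using `.get` with defaults keeps the helper tolerant of results that omit the optional failed key.
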
Exceptions:
- CheckoutError -- raised when files fail to checkout and allow_missing is False. Contains the failure list and partial results.
- CheckoutErrorSuggestGit -- raised when a target looks like a Git file rather than a DVC file.
- DvcException -- raised when attempting to delete a file that is not cached and force is False.
Usage Examples
Basic Usage
from dvc.repo import Repo
repo = Repo()
# Checkout all tracked files in the workspace
result = repo.checkout()
print(f"Added: {result['stats']['added']}")
print(f"Modified: {result['stats']['modified']}")
print(f"Deleted: {result['stats']['deleted']}")
for path in result["added"]:
    print(f" + {path}")
for path in result["modified"]:
    print(f" ~ {path}")
for path in result["deleted"]:
    print(f" - {path}")
Targeted Checkout with Force
from dvc.repo import Repo
repo = Repo()
# Force checkout of specific targets, allowing missing files
result = repo.checkout(
    targets=["data/train.csv", "models/"],
    force=True,
    recursive=True,
    allow_missing=True,
)
if "failed" in result:
    print(f"Warning: {len(result['failed'])} files could not be checked out")
Relink After Config Change
from dvc.repo import Repo
repo = Repo()
# After changing cache.type from "hardlink" to "symlink",
# relink all workspace files
result = repo.checkout(relink=True)
print(f"Relinked {result['stats']['modified']} files")