Implementation:Iterative Dvc Repo Diff
| Knowledge Sources | |
|---|---|
| Domains | Data_Management, Version_Control |
| Last Updated | 2026-02-10 10:00 GMT |
Overview
The Repo_Diff implementation compares the state of DVC-tracked files between two revisions or between the workspace and a commit. It resides in dvc/repo/diff.py (161 lines) and is the core logic behind the dvc diff command.
from dvc.repo.diff import diff
Function Signature
@locked
def diff(
    self,
    a_rev: str = "HEAD",
    b_rev: Optional[str] = None,
    targets: Optional[list[str]] = None,
    recursive: bool = False,
):
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| self | Repo | N/A | The DVC repository instance |
| a_rev | str | "HEAD" | The base revision for comparison |
| b_rev | Optional[str] | None | The target revision; if None, the current workspace is used |
| targets | Optional[list[str]] | None | Specific paths to restrict the diff to |
| recursive | bool | False | Whether to recursively search targets for DVC-tracked files |
Return Value
The function returns a dictionary with the following keys, each mapping to a list of change entries:
| Key | Description |
|---|---|
| added | Files that exist in b_rev but not in a_rev |
| deleted | Files that exist in a_rev but not in b_rev |
| modified | Files present in both revisions with different hash values |
| renamed | Files that were renamed between revisions (same hash, different path) |
| not in cache | Files whose data is missing from the local cache (only when comparing against workspace) |
If there are no differences, an empty dictionary is returned.
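Because the result is either an empty dictionary or a dictionary of lists, callers need to handle both shapes. The following is a minimal sketch of how a caller might summarize a result. `summarize_diff` and `has_changes` are hypothetical helpers, not part of DVC, and they deliberately assume nothing about the structure of individual change entries.

```python
from typing import Any

# Keys documented in the return-value table.
DIFF_KEYS = ("added", "deleted", "modified", "renamed", "not in cache")


def summarize_diff(result: dict[str, list[Any]]) -> dict[str, int]:
    """Count change entries per category (hypothetical helper, not part of DVC)."""
    # An empty dict means "no differences", so every count falls back to zero.
    return {key: len(result.get(key, [])) for key in DIFF_KEYS}


def has_changes(result: dict[str, list[Any]]) -> bool:
    """Return True if any documented category holds at least one entry."""
    return any(result.get(key) for key in DIFF_KEYS)
```

Using `result.get(key, [])` rather than `result[key]` keeps the helpers safe for the empty-dictionary case.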
Internal Mechanics
Helper Functions
The module defines two helper functions used to extract information from index entries:
- _path(entry) -- Returns the file path from an index entry, appending a trailing separator for directories.
- _hash(entry) -- Returns the hash value from an index entry, or None if unavailable.
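The behavior of the two helpers can be sketched with stand-in types. Everything below is an assumption made for illustration: the `FakeEntry`, `FakeMeta`, and `FakeHashInfo` classes and their attribute names (`key`, `meta.isdir`, `hash_info.value`) are hypothetical and may not match the real index entry objects, and `entry_path` / `entry_hash` are not the module's actual code.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeMeta:
    isdir: bool = False  # assumed flag marking directory entries


@dataclass
class FakeHashInfo:
    value: Optional[str] = None


@dataclass
class FakeEntry:
    key: tuple[str, ...]  # assumed: path components of the entry
    meta: Optional[FakeMeta] = None
    hash_info: Optional[FakeHashInfo] = None


def entry_path(entry: FakeEntry) -> str:
    """Join the key into a path; directories get a trailing separator."""
    if entry.meta is not None and entry.meta.isdir:
        # Joining with an empty final component appends os.sep.
        return os.path.join(*entry.key, "")
    return os.path.join(*entry.key)


def entry_hash(entry: FakeEntry) -> Optional[str]:
    """Return the hash value, or None when no hash is available."""
    if entry.hash_info is not None and entry.hash_info.value:
        return entry.hash_info.value
    return None
```

The trailing separator lets a consumer of the diff output tell directories from files by path alone.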
Core Diff Logic (_diff)
The _diff function delegates to dvc_data.index.diff.diff (aliased as idiff) and categorizes each change by its type:
from dvc_data.index.diff import ADD, DELETE, MODIFY, RENAME
from dvc_data.index.diff import diff as idiff
Key behaviors:
- Rename detection is enabled via with_renames=True.
- Unknown entries are included to avoid false positives from missing directory entries.
- Unchanged entries are included when with_missing=True to check if data exists in cache.
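To make the four change types concrete, here is a toy diff over two `{path: hash}` snapshots. It is not how `dvc_data.index.diff.diff` works internally. The `categorize` function and its input shape are hypothetical, chosen only to show how additions, deletions, modifications, and same-hash renames can be told apart.

```python
def categorize(old: dict[str, str], new: dict[str, str]) -> dict[str, list]:
    """Toy diff over {path: hash} snapshots; illustrative, not DVC's code."""
    result = {"added": [], "deleted": [], "modified": [], "renamed": []}

    removed = {p: h for p, h in old.items() if p not in new}
    created = {p: h for p, h in new.items() if p not in old}

    # Index removed paths by hash so a created path with the same hash
    # can be paired with them and reported as a rename.
    removed_by_hash: dict[str, list[str]] = {}
    for path in sorted(removed):
        removed_by_hash.setdefault(removed[path], []).append(path)

    for new_path in sorted(created):
        candidates = removed_by_hash.get(created[new_path])
        if candidates:
            old_path = candidates.pop(0)
            result["renamed"].append({"old": old_path, "new": new_path})
            del removed[old_path]  # paired paths are no longer deletions
        else:
            result["added"].append(new_path)

    result["deleted"] = sorted(removed)
    # Same path on both sides with a different hash counts as modified.
    result["modified"] = sorted(
        p for p in old.keys() & new.keys() if old[p] != new[p]
    )
    return result
```

Without the rename pairing step, a moved file would show up as one deletion plus one addition, which is the noise that rename detection is meant to remove.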
Revision Branching
The diff function uses self.brancher(revs=[a_rev, b_rev]) to iterate over the specified revisions. For the workspace revision, it calls build_data_index with compute_hash=True to build a fresh data index. For committed revisions, it reads directly from view.data["repo"].
Missing Target Handling
If specific targets are provided and a target is missing from both revisions, a FileNotFoundError is raised. Targets missing from only one revision are handled gracefully as additions or deletions.
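The rule above can be sketched as a small check over the sets of paths known in each revision. `classify_targets` and its arguments are hypothetical and simplified; the real function works on index data rather than plain sets of strings.

```python
import errno
import os


def classify_targets(targets, a_paths, b_paths):
    """Classify each target by where it exists (illustrative only).

    Raises FileNotFoundError when a target is absent from both revisions.
    """
    status = {}
    for target in targets:
        in_a, in_b = target in a_paths, target in b_paths
        if not in_a and not in_b:
            # Missing everywhere: there is nothing to diff for this target.
            raise FileNotFoundError(
                errno.ENOENT, os.strerror(errno.ENOENT), target
            )
        if in_a and in_b:
            status[target] = "present in both"
        elif in_b:
            status[target] = "added"    # only in the target revision
        else:
            status[target] = "deleted"  # only in the base revision
    return status
```

Passing the target as the third argument of `FileNotFoundError` makes it available to callers as `exc.filename`.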
Usage Example
from dvc.repo import Repo

with Repo() as repo:
    # Compare workspace against HEAD
    result = repo.diff()

    # Compare two specific tags
    result = repo.diff(a_rev="v1.0", b_rev="v2.0")

    # Diff specific targets recursively
    result = repo.diff(targets=["data/"], recursive=True)
Dependencies
| Module | Purpose |
|---|---|
| dvc_data.index.diff | Provides the low-level index diffing with ADD, DELETE, MODIFY, RENAME change types |
| dvc.repo.locked | Decorator ensuring the repository lock is held during execution |
| dvc.ui | Provides status display during workspace index building and diff calculation |
| dvc.repo.index.build_data_index | Builds a data index for the workspace with computed hashes |