Implementation:Iterative Dvc Repo Status
| Knowledge Sources | |
|---|---|
| Domains | Pipeline_Management, Status_Reporting |
| Last Updated | 2026-02-10 10:00 GMT |
Overview
Repo_Status checks the status of DVC-tracked files and pipeline stages, either locally against the workspace or against remote storage. It is implemented in dvc/repo/status.py (152 lines) and exposes a single public function, status().
from dvc.repo.status import status
This module is the backend for the dvc status CLI command.
Public Function
status()
Reports the status of DVC-tracked files and stages. Routes to either local or cloud status checking based on the provided arguments.
Signature:
@locked
def status(
    self,
    targets=None,
    jobs=None,
    cloud=False,
    remote=None,
    all_branches=False,
    with_deps=False,
    all_tags=False,
    all_commits=False,
    recursive=False,
    check_updates=True,
) -> dict[str, Any]:
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| self | Repo | required | The DVC repository instance |
| targets | list[str] or str or None | None | Specific targets to check (stage files, paths) |
| jobs | int or None | None | Number of parallel jobs for cloud status |
| cloud | bool | False | Check status against remote storage |
| remote | str or None | None | Specific remote to check against (implies cloud mode) |
| all_branches | bool | False | Check across all Git branches (cloud only) |
| with_deps | bool | False | Include stage dependencies |
| all_tags | bool | False | Check across all Git tags (cloud only) |
| all_commits | bool | False | Check across all Git commits (cloud only) |
| recursive | bool | False | Recursively collect targets |
| check_updates | bool | True | Check for dependency updates (local only) |
Return value: A dictionary mapping stage/file identifiers to their status information.
Exceptions:
- dvc.exceptions.InvalidArgumentError -- raised when cloud-only options (--all-branches, --all-tags, --all-commits, --jobs) are used with local status.
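The check behind this exception can be sketched as follows. This is a minimal, self-contained illustration, assuming the flag names above are paired with their argument values and filtered with itertools.compress (listed under Dependencies). InvalidArgumentError here is a local stand-in for dvc.exceptions.InvalidArgumentError, reject_cloud_only_options is a hypothetical helper name, and the message wording is illustrative rather than DVC's own.

```python
from itertools import compress


class InvalidArgumentError(ValueError):
    """Local stand-in for dvc.exceptions.InvalidArgumentError (assumption)."""


def reject_cloud_only_options(all_branches=False, all_tags=False,
                              all_commits=False, jobs=None):
    """Hypothetical helper: raise if any cloud-only option is set."""
    flags = ["--all-branches", "--all-tags", "--all-commits", "--jobs"]
    values = [all_branches, all_tags, all_commits, jobs]
    # compress() keeps only the flag names whose paired value is truthy.
    offending = list(compress(flags, values))
    if offending:
        # Illustrative message; the real wording is not shown in this page.
        raise InvalidArgumentError(
            "cloud-only options used with local status: " + ", ".join(offending)
        )
```

Calling reject_cloud_only_options(all_tags=True, jobs=4) raises the stand-in error and names only --all-tags and --jobs, while a call with no arguments returns without raising.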
Internal Functions
_local_status()
Checks the status of local stages and their outputs against the workspace.
def _local_status(self, targets=None, with_deps=False, recursive=False, check_updates=True):
- Defaults targets to [None] (all stages) if not specified.
- Collects stages granularly via self.stage.collect_granular() for each target.
- Chains all collected (stage, filter_info) pairs and passes them to _joint_status().
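The three steps above can be sketched without DVC installed. In this illustration collect_granular is a hypothetical stand-in for self.stage.collect_granular(), the stage names are invented, and using the target itself as filter_info is a simplification; only the defaulting and chaining logic is the point.

```python
from itertools import chain


def collect_granular(target):
    """Hypothetical stand-in for self.stage.collect_granular()."""
    # Invented data: None means "every stage in the repository".
    stages = {"train.dvc": ["train"], None: ["prepare", "train"]}
    # Simplification: reuse the target as filter_info.
    return [(name, target) for name in stages[target]]


def local_pairs(targets=None):
    """Default to [None], collect per target, and chain the results."""
    targets = targets or [None]
    return list(chain.from_iterable(collect_granular(t) for t in targets))
```

With no targets, local_pairs() falls back to [None] and yields a pair for every stage; with explicit targets, the per-target results are concatenated in order.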
_cloud_status()
Checks the status of tracked objects against a remote storage backend.
def _cloud_status(self, targets=None, jobs=None, remote=None, all_branches=False,
                  with_deps=False, all_tags=False, recursive=False, all_commits=False):
- Uses self.used_objs() to collect all referenced object IDs across the specified scope (branches, tags, commits).
- Calls self.cloud.status() to compare the local cache with remote storage.
- Categorizes each object as "new" (not on remote), "deleted" (not in local cache), or "missing" (neither in cache nor remote).
- Skips imported objects (where odb is not None).
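The categorization step can be illustrated with a small sketch. It assumes the collected objects arrive as a mapping from an object database (odb) to object IDs, with None as the key for the repository's own objects, and that the comparison result exposes new, deleted, and missing collections. CompareResult, categorize, and the compare callable are hypothetical names; the actual types returned by self.used_objs() and self.cloud.status() are not shown in this page.

```python
from collections import namedtuple

# Hypothetical result type standing in for what self.cloud.status() returns.
CompareResult = namedtuple("CompareResult", ["new", "deleted", "missing"])


def categorize(used, compare):
    """Map each object ID to "new", "deleted", or "missing".

    used:    mapping of odb -> iterable of object IDs (assumed shape)
    compare: callable taking object IDs and returning a CompareResult
    """
    result = {}
    for odb, obj_ids in used.items():
        if odb is not None:
            # Imported objects are managed by their source repository.
            continue
        status_info = compare(obj_ids)
        for label in ("new", "deleted", "missing"):
            for obj_id in getattr(status_info, label):
                result[obj_id] = label
    return result
```

Objects that fall into none of the three categories are simply absent from the result, so an empty dictionary means the local cache and the remote agree.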
_joint_status()
Aggregates status from multiple stages into a single dictionary.
def _joint_status(pairs, check_updates=True) -> dict:
- Iterates over (stage, filter_info) pairs.
- Logs a warning for frozen stages (except repo imports and versioned imports), since their dependencies will not be checked.
- Calls stage.status() on each stage and merges the results.
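The aggregation can be sketched with a stand-in stage class. FakeStage and its attributes are hypothetical: the single is_import flag collapses the two exemptions (repo imports and versioned imports) into one, and the arguments passed to stage.status() are an assumption, since this page does not show them.

```python
import logging

logger = logging.getLogger("status_sketch")


class FakeStage:
    """Hypothetical stand-in exposing only what the sketch needs."""

    def __init__(self, name, changes, frozen=False, is_import=False):
        self.name = name
        self.changes = changes
        self.frozen = frozen
        self.is_import = is_import  # covers repo and versioned imports

    def status(self, check_updates=True, filter_info=None):
        # A stage with no changes contributes nothing to the result.
        return {self.name: self.changes} if self.changes else {}


def joint_status(pairs, check_updates=True):
    """Merge per-stage status dictionaries into one."""
    merged = {}
    for stage, filter_info in pairs:
        if stage.frozen and not stage.is_import:
            # A warning, not an error: the stage is still checked.
            logger.warning(
                "%s is frozen; its dependencies will not be checked.",
                stage.name,
            )
        merged.update(stage.status(check_updates, filter_info))
    return merged
```

Because the results are merged with dict.update(), a later stage reporting under the same key would overwrite an earlier one in this sketch.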
Execution Flow
Local Status
- The status() function acquires the repo lock via @locked.
- If targets is a string, it is wrapped into a list.
- Since neither cloud nor remote is set, local status is selected.
- Cloud-only options are validated -- an InvalidArgumentError is raised if any are present.
- _local_status() collects and checks each stage, returning the aggregated result.
Cloud Status
- The status() function acquires the repo lock via @locked.
- Since cloud=True or remote is set, cloud status is selected.
- _cloud_status() collects used objects and compares them against the remote.
- Results are categorized as "new", "deleted", or "missing".
Key Design Decisions
- Routing pattern: The public status() function acts as a router, delegating to _local_status() or _cloud_status() based on the cloud and remote parameters. This keeps the internal implementations focused.
- Argument validation: Cloud-only arguments (--all-branches, --all-tags, --all-commits, --jobs) are explicitly rejected for local status with a helpful error message listing the invalid options.
- Frozen stage warnings: Frozen stages are not errors, but a warning is logged since their dependency changes will not be reported. Repo imports and versioned imports are exempted from this warning.
- Import object filtering: In cloud status, objects from imported sources (where odb is not None) are skipped since they are managed by their source repository.
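The routing pattern can be sketched in isolation. In this illustration route_status is a hypothetical name and the two backends are injected callables standing in for _local_status() and _cloud_status(); locking and the cloud-only argument validation are left out to keep the dispatch logic visible.

```python
def route_status(local_backend, cloud_backend,
                 targets=None, cloud=False, remote=None):
    """Hypothetical router: pick a backend from cloud/remote."""
    # A single string target is normalized to a one-element list.
    if isinstance(targets, str):
        targets = [targets]
    # Naming a remote implies cloud mode even when cloud is False.
    if cloud or remote:
        return cloud_backend(targets, remote)
    return local_backend(targets)
```

Injecting the backends keeps the sketch testable with plain functions; in the module itself the router calls its two internal helpers directly.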
Dependencies
- dvc.repo.locked -- repository lock decorator
- dvc.exceptions.InvalidArgumentError -- argument validation errors
- dvc.log.logger -- logging for frozen stage warnings
- itertools.chain, itertools.compress -- used for target collection and option filtering