
Implementation:Iterative Dvc Repo Status

From Leeroopedia


Knowledge Sources
Domains Pipeline_Management, Status_Reporting
Last Updated 2026-02-10 10:00 GMT

Overview

Repo_Status checks the status of DVC-tracked files and pipeline stages, either locally against the workspace or against remote storage. It is implemented in dvc/repo/status.py (152 lines) and exposes a single public function, status().

from dvc.repo.status import status

This module is the backend for the dvc status CLI command.

Public Function

status()

Reports the status of DVC-tracked files and stages. Routes to either local or cloud status checking based on the provided arguments.

Signature:

@locked
def status(
    self,
    targets=None,
    jobs=None,
    cloud=False,
    remote=None,
    all_branches=False,
    with_deps=False,
    all_tags=False,
    all_commits=False,
    recursive=False,
    check_updates=True,
) -> dict[str, Any]:

Parameters:

Parameter | Type | Default | Description
self | Repo | required | The DVC repository instance
targets | list[str] or str or None | None | Specific targets to check (stage files, paths)
jobs | int or None | None | Number of parallel jobs for cloud status
cloud | bool | False | Check status against remote storage
remote | str or None | None | Specific remote to check against (implies cloud mode)
all_branches | bool | False | Check across all Git branches (cloud only)
with_deps | bool | False | Include stage dependencies
all_tags | bool | False | Check across all Git tags (cloud only)
all_commits | bool | False | Check across all Git commits (cloud only)
recursive | bool | False | Recursively collect targets
check_updates | bool | True | Check for dependency updates (local only)

Return value: A dictionary mapping stage/file identifiers to their status information.

Exceptions:

  • dvc.exceptions.InvalidArgumentError -- raised when local status is selected and any cloud-only option is set: all_branches, all_tags, all_commits, or jobs (reported in the error by their CLI names --all-branches, --all-tags, --all-commits, --jobs)
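
The check behind this exception can be sketched with itertools.compress, which the Dependencies section lists for option filtering. This is a minimal, self-contained illustration rather than the module's code: the InvalidArgumentError class below is a local stand-in, and the function name and message wording are assumptions.

```python
from itertools import compress


class InvalidArgumentError(ValueError):
    """Local stand-in for dvc.exceptions.InvalidArgumentError (assumption)."""


def reject_cloud_only_options(all_branches=False, all_tags=False,
                              all_commits=False, jobs=None):
    """Raise if any cloud-only option is set while local status is selected."""
    names = ["--all-branches", "--all-tags", "--all-commits", "--jobs"]
    selectors = [all_branches, all_tags, all_commits, jobs]
    # compress() keeps each name whose matching selector is truthy.
    ignored = list(compress(names, selectors))
    if ignored:
        # Message wording is illustrative, not copied from DVC.
        raise InvalidArgumentError(
            "options not valid for local status: " + ", ".join(ignored)
        )
```

Calling reject_cloud_only_options(all_tags=True, jobs=4) raises an error that names both --all-tags and --jobs, while a call with all defaults returns without raising.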

Internal Functions

_local_status()

Checks the status of local stages and their outputs against the workspace.

def _local_status(self, targets=None, with_deps=False, recursive=False, check_updates=True):
  • Defaults targets to [None] (all stages) if not specified.
  • Collects stages granularly via self.stage.collect_granular() for each target.
  • Chains all collected (stage, filter_info) pairs and passes them to _joint_status().
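
The collection step can be sketched as follows. The collect_granular function here is a hypothetical stand-in for self.stage.collect_granular(), returning made-up (stage, filter_info) pairs so the example runs without a DVC repository.

```python
from itertools import chain


def collect_granular(target, with_deps=False, recursive=False):
    """Hypothetical stand-in for self.stage.collect_granular()."""
    if target is None:
        # A target of None means "every stage in the repository".
        return [("stage_a", None), ("stage_b", None)]
    return [(f"stage_for_{target}", target)]


def local_pairs(targets=None, with_deps=False, recursive=False):
    """Yield (stage, filter_info) pairs for all requested targets."""
    targets = targets or [None]
    # Flatten the per-target lists into one stream of pairs.
    return chain.from_iterable(
        collect_granular(t, with_deps=with_deps, recursive=recursive)
        for t in targets
    )
```

The result is a lazy iterator, so the pairs are only materialized when the aggregation step consumes them.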

_cloud_status()

Checks the status of tracked objects against a remote storage backend.

def _cloud_status(self, targets=None, jobs=None, remote=None, all_branches=False,
                  with_deps=False, all_tags=False, recursive=False, all_commits=False):
  • Uses self.used_objs() to collect all referenced object IDs across the specified scope (branches, tags, commits).
  • Calls self.cloud.status() to compare local cache with remote storage.
  • Categorizes each object as "new" (in the local cache but not on the remote), "deleted" (on the remote but not in the local cache), or "missing" (in neither the local cache nor the remote).
  • Skips imported objects (where odb is not None).
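
A minimal sketch of the categorization loop, assuming the comparison result exposes new, deleted, and missing collections. CompareResult and the compare callable are hypothetical stand-ins for whatever self.cloud.status() returns, and objects are represented by plain names.

```python
from dataclasses import dataclass, field


@dataclass
class CompareResult:
    """Hypothetical stand-in for the result of self.cloud.status()."""
    new: list = field(default_factory=list)
    deleted: list = field(default_factory=list)
    missing: list = field(default_factory=list)


def categorize(used, compare):
    """Map each object name to "new", "deleted", or "missing"."""
    ret = {}
    for odb, names in used.items():
        if odb is not None:
            continue  # imported objects are skipped
        result = compare(names)
        for label in ("deleted", "new", "missing"):
            for name in getattr(result, label):
                ret[name] = label
    return ret


# Usage: objects keyed by None belong to this repo; other keys are imports.
used = {None: ["a", "b", "c"], "import_odb": ["x"]}
report = categorize(
    used, lambda names: CompareResult(new=["a"], deleted=["b"], missing=["c"])
)
```

In this usage the imported object "x" never appears in report, because its group is keyed by a non-None odb.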

_joint_status()

Aggregates status from multiple stages into a single dictionary.

def _joint_status(pairs, check_updates=True) -> dict:
  • Iterates over (stage, filter_info) pairs.
  • Logs a warning for frozen stages (except repo imports and versioned imports), since their dependencies will not be checked.
  • Calls stage.status() on each stage and merges the results.
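
The aggregation can be sketched like this. FakeStage is a hypothetical stand-in for a DVC stage, and its exempt_from_warning flag collapses the repo-import and versioned-import exemptions into one illustrative attribute; the warning text is also an assumption.

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger(__name__)


@dataclass
class FakeStage:
    """Hypothetical stand-in for a DVC stage."""
    name: str
    frozen: bool = False
    exempt_from_warning: bool = False  # repo import or versioned import
    changes: dict = field(default_factory=dict)

    def status(self, check_updates=True, filter_info=None):
        return dict(self.changes)


def joint_status(pairs, check_updates=True):
    """Merge the per-stage status dictionaries into one."""
    merged = {}
    for stage, filter_info in pairs:
        if stage.frozen and not stage.exempt_from_warning:
            # Frozen stages are still reported, but their deps are not checked.
            logger.warning("%s is frozen; its dependencies are not checked",
                           stage.name)
        merged.update(
            stage.status(check_updates=check_updates, filter_info=filter_info)
        )
    return merged
```

A frozen stage therefore produces a log message but does not stop the loop, which matches the description of frozen stages as warnings rather than errors.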

Execution Flow

Local Status

  1. The status() function acquires the repo lock via @locked.
  2. If targets is a string, it is wrapped into a list.
  3. Since neither cloud nor remote is set, local status is selected.
  4. Cloud-only options are validated -- an InvalidArgumentError is raised if any are present.
  5. _local_status() collects and checks each stage, returning the aggregated result.

Cloud Status

  1. The status() function acquires the repo lock via @locked.
  2. Since cloud=True or remote is set, cloud status is selected.
  3. _cloud_status() collects used objects and compares them against the remote.
  4. Results are categorized as "new", "deleted", or "missing".
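
The two flows above can be condensed into a small routing sketch. It leaves out the @locked decorator and returns a label instead of calling the real helpers; the name route_status and the use of ValueError are assumptions made for illustration.

```python
def route_status(targets=None, cloud=False, remote=None, **cloud_only):
    """Pick the status mode the way the two flows describe it.

    cloud_only holds all_branches, all_tags, all_commits, and jobs.
    """
    if isinstance(targets, str):
        targets = [targets]  # a single target is wrapped into a list
    if cloud or remote:
        return ("cloud", targets)  # the real code would call _cloud_status()
    # Local mode: any cloud-only option that was set is an error.
    ignored = [name for name, value in cloud_only.items() if value]
    if ignored:
        raise ValueError("cloud-only options set: " + ", ".join(ignored))
    return ("local", targets)  # the real code would call _local_status()
```

For example, route_status("train.dvc") selects the local path with a one-element target list, while route_status(remote="s3") selects the cloud path even though cloud is left at False.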

Key Design Decisions

  • Routing pattern: The public status() function acts as a router, delegating to _local_status() or _cloud_status() based on the cloud and remote parameters. This keeps the internal implementations focused.
  • Argument validation: Cloud-only arguments (--all-branches, --all-tags, --all-commits, --jobs) are explicitly rejected for local status with a helpful error message listing the invalid options.
  • Frozen stage warnings: Frozen stages are not errors, but a warning is logged since their dependency changes will not be reported. Repo imports and versioned imports are exempted from this warning.
  • Import object filtering: In cloud status, objects from imported sources (where odb is not None) are skipped since they are managed by their source repository.

Dependencies

  • dvc.repo.locked -- repository lock decorator
  • dvc.exceptions.InvalidArgumentError -- argument validation errors
  • dvc.log.logger -- logging for frozen stage warnings
  • itertools.chain, itertools.compress -- used for target collection and option filtering
