
Implementation:Iterative Dvc Update Meta

From Leeroopedia


Knowledge Sources
Domains Data_Synchronization, Index_Management
Last Updated 2026-02-10 00:00 GMT

Overview

Concrete tool, provided by the DVC library, for merging remote storage metadata (version IDs, checksums) back into local DVC output entries after push operations.

Description

The _update_meta function orchestrates the post-push metadata reconciliation process. After a push to a version-aware remote, it iterates over outputs grouped by remote, rebuilds the remote data index to capture the cloud-assigned metadata (version IDs, ETags), and delegates per-output merging to _merge_push_meta. Finally, it dumps the updated metadata back to the DVC stage files (.dvc or dvc.lock).
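The flow described above can be illustrated with a minimal, self-contained sketch. These are plain Python stand-ins, not DVC's real objects or APIs: outputs are grouped by remote, the remote index is rebuilt, each output is merged, and every touched stage is dumped exactly once.

```python
# Illustrative stand-ins for DVC's Stage and Output objects.
class Stage:
    def __init__(self, name):
        self.name = name
        self.dumped = False

    def dump(self):
        # Stand-in for stage.dump(): mark the metafile as rewritten.
        self.dumped = True

class Out:
    def __init__(self, stage, path):
        self.stage = stage
        self.path = path
        self.meta = {}

def rebuild(view):
    # Stand-in for _rebuild: pretend the remote assigned version IDs.
    return {out.path: {"version_id": f"v-{out.path}"} for out in view}

def merge_push_meta(out, remote_index, remote):
    # Stand-in for _merge_push_meta: fold remote metadata into the out.
    out.meta = {**remote_index[out.path], "remote": remote}

def update_meta(views_by_remote):
    """Sketch of _update_meta's orchestration: rebuild, merge, dump."""
    for remote, view in views_by_remote.items():
        remote_index = rebuild(view)
        stages = set()
        for out in view:
            merge_push_meta(out, remote_index, remote)
            stages.add(out.stage)
        # Dump each affected stage file once, after all merges.
        for stage in stages:
            stage.dump()

stage = Stage("data.dvc")
outs = [Out(stage, "data.csv")]
update_meta({"s3remote": outs})
print(outs[0].meta)  # {'version_id': 'v-data.csv', 'remote': 's3remote'}
```

The key design point mirrored here is that stage dumping is deferred until after every output belonging to a stage has been merged, so each metafile is written once per push rather than once per output.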

_update_meta lives in dvc/repo/push.py and is called in the finally block of the push function, ensuring metadata is updated even if the push partially fails. The companion _merge_push_meta function in dvc/repo/worktree.py handles the detailed per-output merge logic, including special handling for directory outputs where individual file version IDs must be reconciled with the existing tree object.

The _rebuild helper (also in push.py, L13-34) is responsible for querying the remote filesystem's current metadata (via fs.info) for every entry in the data index, producing an updated DataIndex with fresh Meta objects that carry the cloud-assigned version IDs.

Together, these functions ensure that after a push, the local DVC metafiles contain the exact version information needed for any future pull from the same remote.
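For concreteness, the persisted result might look like the following `.dvc` metafile fragment. This is an illustrative example only: the field names (`version_id`, `remote`) follow the metadata described above, but the exact layout varies by remote type and DVC version, and the hash and version values here are made up.

```yaml
# Illustrative .dvc file after a push to a version-aware remote
# (hash and version_id values are fabricated for the example):
outs:
- path: data.csv
  md5: 3bcd0b6fde0a56d32e9d207a3801ed21
  size: 1024
  version_id: v2_abc123
  remote: s3remote
```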

Usage

The _update_meta function is an internal implementation detail called automatically by the push command. It should not typically be imported directly. However, understanding its behavior is essential when debugging version-aware remote workflows, diagnosing version ID mismatches, or extending DVC's push pipeline. _merge_push_meta can be used directly in custom worktree synchronization code.

Code Reference

Source Location

  • Repository: DVC
  • File: dvc/repo/push.py
  • Lines: L37-60 (_update_meta)
  • Supporting file: dvc/repo/worktree.py
  • Lines: L100-158 (_merge_push_meta)

Signature

# _update_meta
def _update_meta(index, **kwargs) -> None:

# _merge_push_meta
def _merge_push_meta(
    out: "Output",
    index: Union["DataIndex", "DataIndexView"],
    remote: Optional[str] = None,
) -> None:

Import

from dvc.repo.push import _update_meta
from dvc.repo.worktree import _merge_push_meta

I/O Contract

Inputs

Name Type Required Description
index Index Yes The full repository data index (for _update_meta). Used to generate worktree views grouped by remote.
**kwargs dict No Additional keyword arguments passed to worktree_view_by_remotes, including targets, with_deps, and recursive for filtering which outputs to update.
out Output Yes (For _merge_push_meta) The DVC output whose metadata should be updated with remote version information.
index Union[DataIndex, DataIndexView] Yes (For _merge_push_meta) The rebuilt remote data index containing fresh metadata with cloud-assigned version IDs.
remote Optional[str] No (For _merge_push_meta) The remote name to tag on the output's metadata, enabling multi-remote tracking.

Outputs

Name Type Description
return (_update_meta) None The function modifies outputs and their stage files in place. Affected stages are dumped with updated metadata (with_files=True, update_pipeline=False).
return (_merge_push_meta) None The function modifies the Output object in place, updating its hash_info, meta (including version_id and remote), and obj (tree object for directories).

Side effects:

  • Output objects are mutated: hash_info, meta, and obj fields are updated.
  • Stage files (.dvc or dvc.lock) are rewritten to disk via stage.dump().
  • For directory outputs, a new Tree object is built and assigned to out.obj.

Usage Examples

Basic Usage (Internal Call from Push)

# This is how _update_meta is called internally by the push function.
# Users do not typically call this directly.

from dvc.repo.push import _update_meta
from dvc.repo.index import IndexView

# After a successful push operation:
ws_idx = indexes.get("workspace")
if ws_idx is not None:
    _index = ws_idx.index if isinstance(ws_idx, IndexView) else ws_idx
    _update_meta(
        _index,
        targets=targets,
        with_deps=with_deps,
        recursive=recursive,
    )

Using _merge_push_meta Directly

from dvc.repo import Repo
from dvc.repo.worktree import _merge_push_meta
from dvc_data.index import DataIndex, DataIndexEntry, Meta

repo = Repo()

# Suppose we have an output and a rebuilt remote index
out = list(repo.index.outs)[0]

# Build a mock remote index entry with version metadata
remote_index = DataIndex()
_, key = out.index_key
remote_index[key] = DataIndexEntry(
    key=key,
    meta=Meta(version_id="v2_abc123", size=1024),
    hash_info=out.hash_info,
)

# Merge the remote metadata into the output
_merge_push_meta(out, remote_index, remote="s3remote")

# The output now carries the remote version ID
print(f"Version ID: {out.meta.version_id}")
print(f"Remote: {out.meta.remote}")

# Persist to disk
out.stage.dump(with_files=True, update_pipeline=False)

Understanding the Rebuild Step

# The _rebuild helper queries the remote for current metadata.
# This is an internal function in dvc/repo/push.py (L13-34).

from dvc_data.index import DataIndex, DataIndexEntry, Meta

def _rebuild(idx, path, fs, cb):
    """Rebuild a data index with fresh metadata from the remote filesystem."""
    new = DataIndex()
    items = list(idx.items())
    cb.set_size(len(items))

    for key, entry in items:
        if entry.meta and entry.meta.isdir:
            meta = Meta(isdir=True)
        else:
            try:
                meta = Meta.from_info(fs.info(fs.join(path, *key)), fs.protocol)
            except FileNotFoundError:
                meta = None

        if meta:
            new.add(DataIndexEntry(key=key, meta=meta))
        cb.relative_update(1)

    return new

Related Pages

Implements Principle

Requires Environment
