Implementation: Iterative DVC Update Meta
| Knowledge Sources | |
|---|---|
| Domains | Data_Synchronization, Index_Management |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Concrete tool for merging remote storage metadata (version IDs, checksums) back into local DVC output entries after push operations, provided by the DVC library.
Description
The _update_meta function orchestrates the post-push metadata reconciliation process. After a push to a version-aware remote, it iterates over outputs grouped by remote, rebuilds the remote data index to capture the cloud-assigned metadata (version IDs, ETags), and delegates per-output merging to _merge_push_meta. Finally, it dumps the updated metadata back to the DVC stage files (.dvc or dvc.lock).
_update_meta lives in dvc/repo/push.py and is called in the finally block of the push function, ensuring metadata is updated even if the push partially fails. The companion _merge_push_meta function in dvc/repo/worktree.py handles the detailed per-output merge logic, including special handling for directory outputs where individual file version IDs must be reconciled with the existing tree object.
The _rebuild helper (also in push.py, L13-34) is responsible for querying the remote filesystem's current metadata (via fs.info) for every entry in the data index, producing an updated DataIndex with fresh Meta objects that carry the cloud-assigned version IDs.
Together, these functions ensure that after a push, the local DVC metafiles contain the exact version information needed for any future pull from the same remote.
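The reconciliation flow described above can be sketched with plain Python stand-ins. `PlainMeta`, `PlainOutput`, and `update_meta_sketch` below are hypothetical simplifications for illustration only; the real code operates on DVC `Output` objects and `dvc_data` `Meta`/`DataIndex` instances.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical stand-ins for DVC's Output and dvc_data's Meta,
# used only to illustrate the post-push reconciliation flow.
@dataclass
class PlainMeta:
    version_id: Optional[str] = None
    remote: Optional[str] = None

@dataclass
class PlainOutput:
    key: tuple
    meta: PlainMeta = field(default_factory=PlainMeta)

def update_meta_sketch(outputs, remote_index, remote_name):
    """Merge cloud-assigned version IDs from a rebuilt remote index
    into local output entries (mirrors _update_meta's core loop)."""
    for out in outputs:
        fresh = remote_index.get(out.key)
        if fresh is None:
            continue  # output was not pushed; nothing to merge
        out.meta.version_id = fresh.version_id
        out.meta.remote = remote_name
    # the real implementation would now dump affected stage files

outs = [PlainOutput(key=("data", "model.pkl"))]
remote_index = {("data", "model.pkl"): PlainMeta(version_id="v2_abc123")}
update_meta_sketch(outs, remote_index, "s3remote")
print(outs[0].meta.version_id)  # v2_abc123
```

The key design point this models is that the remote index, not the local cache, is the source of truth for version IDs after a push.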
Usage
The _update_meta function is an internal implementation detail called automatically by the push command. It should not typically be imported directly. However, understanding its behavior is essential when debugging version-aware remote workflows, diagnosing version ID mismatches, or extending DVC's push pipeline. _merge_push_meta can be used directly in custom worktree synchronization code.
Code Reference
Source Location
- Repository: DVC
- File:
dvc/repo/push.py - Lines: L37-60 (_update_meta)
- Supporting file:
dvc/repo/worktree.py - Lines: L100-158 (_merge_push_meta)
Signature
# _update_meta
def _update_meta(index, **kwargs) -> None:
# _merge_push_meta
def _merge_push_meta(
    out: "Output",
    index: Union["DataIndex", "DataIndexView"],
    remote: Optional[str] = None,
) -> None:
Import
from dvc.repo.push import _update_meta
from dvc.repo.worktree import _merge_push_meta
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| index | Index | Yes | The full repository data index (for _update_meta). Used to generate worktree views grouped by remote. |
| **kwargs | dict | No | Additional keyword arguments passed to worktree_view_by_remotes, including targets, with_deps, and recursive for filtering which outputs to update. |
| out | Output | Yes | (For _merge_push_meta) The DVC output whose metadata should be updated with remote version information. |
| index | Union[DataIndex, DataIndexView] | Yes | (For _merge_push_meta) The rebuilt remote data index containing fresh metadata with cloud-assigned version IDs. |
| remote | Optional[str] | No | (For _merge_push_meta) The remote name to tag on the output's metadata, enabling multi-remote tracking. |
Outputs
| Name | Type | Description |
|---|---|---|
| return (_update_meta) | None | The function modifies outputs and their stage files in place. Affected stages are dumped with updated metadata (with_files=True, update_pipeline=False). |
| return (_merge_push_meta) | None | The function modifies the Output object in place, updating its hash_info, meta (including version_id and remote), and obj (tree object for directories). |
Side effects:
- Output objects are mutated: hash_info, meta, and obj fields are updated.
- Stage files (.dvc or dvc.lock) are rewritten to disk via stage.dump().
- For directory outputs, a new Tree object is built and assigned to out.obj.
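The directory case can be modeled in isolation. The dict-based tree and `merge_tree_versions` helper below are a hypothetical simplification of what `_merge_push_meta` does when it reconciles per-file version IDs with an existing tree object.

```python
def merge_tree_versions(local_tree, remote_entries):
    """Attach cloud-assigned version IDs to each file of a directory
    output, keeping the local hashes intact (simplified model)."""
    merged = {}
    for relpath, hash_value in local_tree.items():
        info = remote_entries.get(relpath, {})
        merged[relpath] = {
            "hash": hash_value,
            # None if this file was not found in the rebuilt remote index
            "version_id": info.get("version_id"),
        }
    return merged

local_tree = {"a.csv": "md5-1", "b.csv": "md5-2"}
remote_entries = {"a.csv": {"version_id": "v10"}, "b.csv": {"version_id": "v11"}}
merged = merge_tree_versions(local_tree, remote_entries)
print(merged["a.csv"])  # {'hash': 'md5-1', 'version_id': 'v10'}
```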
Usage Examples
Basic Usage (Internal Call from Push)
# This is how _update_meta is called internally by the push function.
# Users do not typically call this directly.
from dvc.repo.push import _update_meta
from dvc.repo.index import IndexView
# After a successful push operation:
ws_idx = indexes.get("workspace")
if ws_idx is not None:
    _index = ws_idx.index if isinstance(ws_idx, IndexView) else ws_idx
    _update_meta(
        _index,
        targets=targets,
        with_deps=with_deps,
        recursive=recursive,
    )
Using _merge_push_meta Directly
from dvc.repo import Repo
from dvc.repo.worktree import _merge_push_meta
from dvc_data.index import DataIndex, DataIndexEntry, Meta
repo = Repo()
# Suppose we have an output and a rebuilt remote index
out = list(repo.index.outs)[0]
# Build a mock remote index entry with version metadata
remote_index = DataIndex()
_, key = out.index_key
remote_index[key] = DataIndexEntry(
    key=key,
    meta=Meta(version_id="v2_abc123", size=1024),
    hash_info=out.hash_info,
)
# Merge the remote metadata into the output
_merge_push_meta(out, remote_index, remote="s3remote")
# The output now carries the remote version ID
print(f"Version ID: {out.meta.version_id}")
print(f"Remote: {out.meta.remote}")
# Persist to disk
out.stage.dump(with_files=True, update_pipeline=False)
Understanding the Rebuild Step
# The _rebuild helper queries the remote for current metadata.
# This is an internal function in dvc/repo/push.py (L13-34).
from dvc_data.index import DataIndex, DataIndexEntry, Meta
def _rebuild(idx, path, fs, cb):
    """Rebuild a data index with fresh metadata from the remote filesystem."""
    new = DataIndex()
    items = list(idx.items())
    cb.set_size(len(items))
    for key, entry in items:
        if entry.meta and entry.meta.isdir:
            meta = Meta(isdir=True)
        else:
            try:
                meta = Meta.from_info(fs.info(fs.join(path, *key)), fs.protocol)
            except FileNotFoundError:
                meta = None
        if meta:
            new.add(DataIndexEntry(key=key, meta=meta))
        cb.relative_update(1)
    return new
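The behavior of the rebuild step can be exercised without a real remote by substituting a dict-backed filesystem. `MockFS` and `rebuild_sketch` below are hypothetical simplifications (the real code uses an fsspec-style filesystem and returns a `DataIndex`), showing how entries missing on the remote are dropped from the rebuilt index.

```python
class MockFS:
    """Minimal stand-in for an fsspec-style filesystem (hypothetical)."""
    protocol = "mock"

    def __init__(self, files):
        self.files = files  # path -> size

    def join(self, *parts):
        return "/".join(parts)

    def info(self, path):
        if path not in self.files:
            raise FileNotFoundError(path)
        return {"size": self.files[path], "version_id": f"v-{path}"}

def rebuild_sketch(keys, path, fs):
    """Simplified _rebuild: query the remote for each key, skipping
    entries that no longer exist, and collect fresh metadata."""
    new = {}
    for key in keys:
        try:
            new[key] = fs.info(fs.join(path, *key))
        except FileNotFoundError:
            continue  # missing on the remote: drop from the rebuilt index
    return new

fs = MockFS({"bucket/data/a.csv": 10})
idx = rebuild_sketch([("data", "a.csv"), ("data", "gone.csv")], "bucket", fs)
print(sorted(idx))  # [('data', 'a.csv')]
```

Note how the `FileNotFoundError` branch mirrors the real `_rebuild`: an output absent from the remote simply produces no entry, so `_merge_push_meta` later leaves that output's local metadata untouched.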