Implementation:Iterative Dvc SingleStageFile Dump
| Knowledge Sources | |
|---|---|
| Domains | Data_Versioning, Configuration_Management |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Concrete tool for serializing DVC stage metadata to .dvc YAML metafiles on disk, provided by the DVC library.
Description
The SingleStageFile.dump method in DVC's dvc/dvcfile.py module writes a stage's tracking metadata to a .dvc file. This is the final step in the dvc add workflow, where the computed content hash and output metadata are persisted as a lightweight YAML pointer file that can be committed to Git.
When its verify flag is set, the method first checks that the target path is a valid DVC filename (ending in .dvc) and is not git-ignored. It then delegates serialization to serialize.to_single_stage_file, which calls stage.dumpd() to produce a dictionary representation of the stage's state, including output paths, MD5 hashes, file sizes, and dependency information. If the stage was loaded from an existing .dvc file, that file's original YAML text (which may contain user comments or custom formatting) is re-parsed with a format-preserving parser (ruamel.yaml), the new values are merged into the preserved structure via apply_diff, and the merged structure is returned. For a new file, the plain dictionary is returned unchanged. In both cases, dump writes the result to disk with dump_yaml.
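The validation step can be sketched in plain Python. This is a simplified model, not DVC's code. `looks_like_metafile` and `validate_metafile_path` are hypothetical names, and the Git ignore lookup is replaced by an `is_ignored` callable supplied by the caller.

```python
from typing import Callable

METAFILE_SUFFIX = ".dvc"  # suffix required for single-stage metafiles


def looks_like_metafile(path: str) -> bool:
    """True if the path ends with the .dvc suffix."""
    return path.endswith(METAFILE_SUFFIX)


def validate_metafile_path(
    path: str, is_ignored: Callable[[str], bool], verify: bool = True
) -> None:
    """Raise ValueError unless path is a non-ignored .dvc file name.

    is_ignored is a hypothetical stand-in for a Git ignore lookup.
    With verify=False the checks are skipped, mirroring the verify flag.
    """
    if not verify:
        return
    if not looks_like_metafile(path):
        raise ValueError(f"'{path}' is not a valid .dvc file name")
    if is_ignored(path):
        raise ValueError(f"'{path}' is ignored by Git and would never be committed")
```

Rejecting ignored paths up front matters because a metafile that Git never sees defeats the purpose of writing a lightweight pointer for version control.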
After writing the file, dump registers the .dvc file path with the SCM context by calling self.repo.scm_context.track_file(self.relpath). This call only records the path. When the enclosing scm_context block exits, the file is either staged in Git automatically (when core.autostage is enabled) or listed in a suggested git add command for the user.
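The record-now, act-later behavior can be modeled with a small tracker. `FileTracker` is a hypothetical class, not DVC's SCMContext; it assumes that recorded paths are handled only when a surrounding session ends successfully, either by staging them (stood in for by a list) or by producing a `git add` hint.

```python
from contextlib import contextmanager
from typing import Iterator


class FileTracker:
    """Hypothetical model of deferred Git tracking for metafiles."""

    def __init__(self, autostage: bool = False) -> None:
        self.autostage = autostage
        self.files_to_track: set = set()
        self.staged: list = []  # paths a real implementation would `git add`
        self.hints: list = []   # commands suggested to the user instead

    def track_file(self, path: str) -> None:
        # Only remember the path; Git is not touched yet.
        self.files_to_track.add(path)

    @contextmanager
    def session(self) -> Iterator["FileTracker"]:
        yield self
        # Reached only if the with-block finished without raising.
        if not self.files_to_track:
            return
        paths = sorted(self.files_to_track)
        if self.autostage:
            self.staged.extend(paths)
        else:
            self.hints.append("git add " + " ".join(paths))
        self.files_to_track.clear()
```

Deferring the action lets one command (for example, adding several files) report a single combined `git add` suggestion instead of one per metafile.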
The to_single_stage_file function in dvc/stage/serialize.py handles the actual state-to-dictionary conversion. It calls stage.dumpd() to obtain the raw state and, only when existing YAML text is available (stored in stage._stage_text), applies the round-trip preservation logic before returning.
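The merge step keeps formatting because the previously parsed structure is updated in place rather than replaced, so its container objects (and whatever formatting data they carry) survive. The sketch below is a simplified analogue of that idea, not DVC's apply_diff. `CommentedDict`, `merge_into`, and `serialize_for_dump` are hypothetical, and the YAML parsing step is omitted by passing the already-parsed structure in directly.

```python
from typing import Any


class CommentedDict(dict):
    """Hypothetical dict carrying formatting data, standing in for the
    comment-aware mappings a round-trip YAML parser returns."""

    def __init__(self, *args: Any, comment: str = "", **kwargs: Any) -> None:
        super().__init__(*args, **kwargs)
        self.comment = comment


def _same_container(a: Any, b: Any) -> bool:
    # True when both values are dicts or both are lists.
    return (isinstance(a, dict) and isinstance(b, dict)) or (
        isinstance(a, list) and isinstance(b, list)
    )


def merge_into(new: Any, saved: Any) -> None:
    """Make `saved` equal to `new` while reusing saved's containers."""
    assert _same_container(new, saved)
    if isinstance(new, dict):
        for key, value in new.items():
            if key in saved and _same_container(value, saved[key]):
                merge_into(value, saved[key])  # recurse, keep the old object
            elif key not in saved or saved[key] != value:
                saved[key] = value
        for key in [k for k in saved if k not in new]:
            del saved[key]  # drop fields that no longer exist
    elif len(new) != len(saved):
        saved[:] = new  # list changed length: replace its contents
    else:
        for i, value in enumerate(new):
            if _same_container(value, saved[i]):
                merge_into(value, saved[i])
            elif saved[i] != value:
                saved[i] = value


def serialize_for_dump(state: dict, saved_state: Any = None) -> Any:
    """Fresh state for a new file; otherwise the old structure, updated."""
    if saved_state is None:
        return state
    merge_into(state, saved_state)
    return saved_state
```

After the merge, the returned object compares equal to the new state but is still the original top-level container, so existing keys keep their position and attached comments.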
Usage
Use SingleStageFile.dump when you need to persist a stage's metadata to a .dvc file. This is called internally at the end of dvc add (in the _add helper, via stage.dump()) and during dvc commit operations. It can also be used programmatically when building tools that modify DVC tracking metadata.
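The call chain can be illustrated with two stand-in classes. Both are hypothetical and far smaller than DVC's Stage and SingleStageFile. The sketch assumes only the flow described here: a caller invokes stage.dump(), the stage hands itself to its metafile's dump(), and keyword arguments such as with_files reach dumpd(). The `files` list is a simplified placeholder for per-file details.

```python
from typing import Any, Optional


class MiniStageFile:
    """Hypothetical metafile that records the payload instead of writing it."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.written: Optional[dict] = None

    def dump(self, stage: "MiniStage", **kwargs: Any) -> None:
        # Serialize via the stage, forwarding options such as with_files.
        self.written = stage.dumpd(**kwargs)


class MiniStage:
    """Hypothetical stage tracking one directory output."""

    def __init__(self, dvcfile: MiniStageFile, out_path: str, files: list) -> None:
        self.dvcfile = dvcfile
        self.out_path = out_path
        self.files = files

    def dumpd(self, with_files: bool = False) -> dict:
        out: dict = {"path": self.out_path}
        if with_files:
            out["files"] = list(self.files)  # per-file details only on request
        return {"outs": [out]}

    def dump(self, **kwargs: Any) -> None:
        # Callers such as an add or commit routine use this entry point.
        self.dvcfile.dump(self, **kwargs)
```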
Code Reference
Source Location
- Repository: DVC
- File: dvc/dvcfile.py (dump), dvc/stage/serialize.py (to_single_stage_file)
- Lines: L193-202 (dump), L200-215 (to_single_stage_file)
Signature
```python
class SingleStageFile(FileMixin):
    def dump(self, stage: "Stage", **kwargs) -> None:
        """Dumps given stage appropriately in the dvcfile.

        Validates the file path, serializes stage metadata to a dict,
        writes the YAML to disk, and registers the file for SCM tracking.
        """
        ...


def to_single_stage_file(stage: "Stage", **kwargs) -> dict:
    """Serialize a Stage to a dictionary suitable for writing to a .dvc file.

    If existing YAML text is available on the stage, uses round-trip
    preservation to maintain comments and formatting.

    Args:
        stage: The Stage object with populated outs (including their
            hash_info and meta) and deps.
        **kwargs: Forwarded to stage.dumpd().

    Returns:
        A dictionary ready for YAML serialization.
    """
    ...
```
Import
```python
from dvc.dvcfile import SingleStageFile
from dvc.stage.serialize import to_single_stage_file
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| self | SingleStageFile | Yes | The file abstraction representing the target .dvc file. Has repo (Repo instance), path (absolute path to the .dvc file), and verify (bool, whether to validate the filename) attributes. |
| stage | Stage | Yes | The Stage object containing the tracking metadata to serialize. Must have outs (list of Output objects with populated hash_info and meta), deps (list of Dependency objects), and a dumpd() method. |
| kwargs | dict | No | Additional keyword arguments passed through to stage.dumpd(), such as with_files (bool) to include per-file hash details for directory outputs. |
Outputs
| Name | Type | Description |
|---|---|---|
| (dump return) | None | No return value. Side effects: (1) a .dvc YAML file is written or updated at self.path containing the serialized stage metadata; (2) the file path is registered with repo.scm_context for Git staging. |
| (to_single_stage_file return) | dict | A dictionary representing the stage state, ready for YAML serialization. Contains keys such as outs (list of output dicts with md5, size, hash, and path fields), deps (if any), and md5 (stage-level hash, if set). When an existing file is updated, this is the format-preserving mapping parsed by ruamel.yaml. |
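Conditional keys such as "deps (if any)" follow from omitting empty values before serialization. The sketch below assumes that falsy fields are simply dropped; `build_state` is a hypothetical helper, and its key set is limited to md5, deps, and outs.

```python
from typing import Any


def build_state(md5: str, deps: list, outs: list) -> dict:
    """Assemble a stage dict, omitting empty fields (hypothetical helper)."""
    candidate: dict = {"md5": md5, "deps": deps, "outs": outs}
    # Keep only truthy values, preserving the key order above.
    return {key: value for key, value in candidate.items() if value}
```

For a typical dvc add result with no dependencies and no stage-level hash, only outs survives, which matches the minimal .dvc files this command produces.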
Usage Examples
Basic Usage
```python
from dvc.repo import Repo
from dvc.dvcfile import SingleStageFile
from dvc.stage.serialize import to_single_stage_file

# Run inside a DVC repository; assumes data.csv exists in the workspace.
repo = Repo()

# Create a single-stage (.dvc) stage that tracks data.csv
stage = repo.stage.create(
    single_stage=True,
    fname="data.csv.dvc",
    outs=["data.csv"],
)

# Compute and record the output's hash and size (out.add() is an alternative)
out = stage.outs[0]
out.save()

# Serialize to see what will be written
state_dict = to_single_stage_file(stage)
print(state_dict)
# Output, shown reformatted (md5 and size depend on the contents of data.csv):
# {
#     'outs': [
#         {
#             'md5': '<32-character hex digest>',
#             'size': <size in bytes>,
#             'hash': 'md5',
#             'path': 'data.csv',
#         }
#     ]
# }

# Write the .dvc file to disk. Calling stage.dump() is the more common route.
dvc_file = SingleStageFile(repo, stage.path)
dvc_file.dump(stage)
# Creates/updates data.csv.dvc and registers it for Git tracking
```
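The md5 and size fields of an outs entry can be reproduced for a single file with the standard library. This sketch assumes DVC 3.x behavior for `hash: md5` entries, where md5 is the plain MD5 digest of the file's bytes and path is relative to the directory holding the .dvc file. `file_entry` is a hypothetical helper and does not handle directory outputs.

```python
import hashlib
import os


def file_entry(path: str, root: str) -> dict:
    """Build an outs-style entry for one file (hypothetical helper).

    root is the directory containing the .dvc file; the recorded path is
    relative to it and uses forward slashes.
    """
    digest = hashlib.md5()
    with open(path, "rb") as fobj:
        # Hash in 1 MiB chunks so large files need not fit in memory.
        for chunk in iter(lambda: fobj.read(1 << 20), b""):
            digest.update(chunk)
    return {
        "md5": digest.hexdigest(),
        "size": os.path.getsize(path),
        "hash": "md5",
        "path": os.path.relpath(path, root).replace(os.sep, "/"),
    }
```

For an empty file this yields md5 d41d8cd98f00b204e9800998ecf8427e with size 0.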