Implementation:Iterative Dvc SingleStageFile Dump

From Leeroopedia


Knowledge Sources
Domains Data_Versioning, Configuration_Management
Last Updated 2026-02-10 00:00 GMT

Overview

Concrete tool for serializing DVC stage metadata to .dvc YAML metafiles on disk, provided by the DVC library.

Description

The SingleStageFile.dump method in DVC's dvc/dvcfile.py module writes a stage's tracking metadata to a .dvc file. This is the final step in the dvc add workflow, where the computed content hash and output metadata are persisted as a lightweight YAML pointer file that can be committed to Git.

When self.verify is set, the method first checks that the target path is a valid DVC filename (ending in .dvc and not git-ignored). It then delegates serialization to serialize.to_single_stage_file, which calls stage.dumpd() to produce a dictionary representation of the stage's state, including output paths, MD5 hashes, file sizes, and dependency information. If the stage was loaded from an existing .dvc file, that file's YAML text is re-parsed with a format-preserving parser (ruamel.yaml) and the new values are merged into the parsed structure via apply_diff, so user comments and custom formatting survive the update. For a new file there is no existing text, and the plain dictionary is used as-is. In both cases the result is written to disk with dump_yaml.
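
The filename check described above can be pictured with a minimal sketch. The names has_dvc_suffix, check_target, and the is_ignored callback are hypothetical and are not DVC's own helpers; the git-ignore test in particular is left to a caller-supplied function, because answering it for real requires access to the Git repository.

```python
from pathlib import PurePath
from typing import Callable

DVC_SUFFIX = ".dvc"  # suffix named in the description above


def has_dvc_suffix(path: str) -> bool:
    """Return True when the path carries the .dvc suffix."""
    return PurePath(path).suffix == DVC_SUFFIX


def check_target(
    path: str,
    is_ignored: Callable[[str], bool] = lambda p: False,
) -> None:
    """Raise ValueError if the path fails either check (hypothetical helper)."""
    if not has_dvc_suffix(path):
        raise ValueError(f"{path!r} does not end in {DVC_SUFFIX}")
    # is_ignored stands in for a real git-ignore lookup (an assumption).
    if is_ignored(path):
        raise ValueError(f"{path!r} is git-ignored")


check_target("data.csv.dvc")  # passes silently
```

Running the check before any serialization means a bad target path fails early, before anything is written to disk.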

After writing the file, dump registers the .dvc file path with the SCM context by calling self.repo.scm_context.track_file(self.relpath). The metafile is then either staged automatically (when DVC's core.autostage option is enabled) or listed in a hint that tells the user which files to git add before the next commit.
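
The two outcomes of registering a file (automatic staging versus guiding the user) can be pictured with a small stand-in. FileTracker, its autostage flag, and finish() are hypothetical names chosen for illustration; this is not DVC's SCM context, and the hint wording is an assumption.

```python
from typing import List, Optional, Set


class FileTracker:
    """Hypothetical stand-in that collects files and resolves them later."""

    def __init__(self, autostage: bool = False) -> None:
        self.autostage = autostage
        self.pending: Set[str] = set()
        self.staged: List[str] = []  # stands in for `git add` having run

    def track_file(self, path: str) -> None:
        """Remember a path; nothing is staged yet."""
        self.pending.add(path)

    def finish(self) -> Optional[str]:
        """Stage pending files, or return a hint naming them."""
        files = sorted(self.pending)
        self.pending.clear()
        if not files:
            return None
        if self.autostage:
            self.staged.extend(files)
            return None
        return "To track the changes with git, run: git add " + " ".join(files)


tracker = FileTracker(autostage=False)
tracker.track_file("data.csv.dvc")
hint = tracker.finish()
```

Collecting paths first and resolving them at the end lets one command that touches several metafiles produce a single staging step or a single hint.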

The to_single_stage_file function in dvc/stage/serialize.py handles the actual state-to-dictionary conversion. It calls stage.dumpd() to obtain the raw state, then optionally applies the round-trip preservation logic when existing YAML text is available (stored in stage._stage_text).
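
The branch on existing YAML text and the in-place merge can be sketched as follows. This is a simplified illustration and not DVC's apply_diff: merge_into and serialize are hypothetical names, the existing argument stands for an already-parsed mapping (the real code parses stage._stage_text with ruamel.yaml), and lists are replaced wholesale instead of being merged element by element.

```python
from collections.abc import MutableMapping
from typing import Optional


def merge_into(new: dict, existing: MutableMapping) -> None:
    """Mutate `existing` until it equals `new`, reusing nested mappings."""
    for key, value in new.items():
        current = existing.get(key)
        if isinstance(value, dict) and isinstance(current, MutableMapping):
            merge_into(value, current)  # recurse so the nested object survives
        elif key not in existing or current != value:
            existing[key] = value
    for key in list(existing):
        if key not in new:
            del existing[key]  # drop keys the new state no longer has


def serialize(state: dict, existing: Optional[MutableMapping]) -> MutableMapping:
    """Return `state` for a new file, else the updated existing structure."""
    if existing is None:
        return state
    merge_into(state, existing)
    return existing


existing = {"meta": {"owner": "alice"}, "md5": "old"}
nested = existing["meta"]
result = serialize({"meta": {"owner": "alice"}, "md5": "new"}, existing)
```

Mutating the parsed structure instead of replacing it is what makes the approach work with a round-trip parser: ruamel.yaml attaches comments and layout information to the container objects it returns, so those objects have to survive the update.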

Usage

Use SingleStageFile.dump when you need to persist a stage's metadata to a .dvc file. This is called internally at the end of dvc add (in the _add helper, via stage.dump()) and during dvc commit operations. It can also be used programmatically when building tools that modify DVC tracking metadata.

Code Reference

Source Location

  • Repository: DVC
  • File: dvc/dvcfile.py (dump), dvc/stage/serialize.py (to_single_stage_file)
  • Lines: L193-202 (dump), L200-215 (to_single_stage_file)

Signature

class SingleStageFile(FileMixin):
    def dump(self, stage: "Stage", **kwargs) -> None:
        """Dumps given stage appropriately in the dvcfile.

        Validates the file path, serializes stage metadata to a dict,
        writes the YAML to disk, and registers the file for SCM tracking.
        """
        ...


def to_single_stage_file(stage: "Stage", **kwargs) -> dict:
    """Serialize a Stage to a dictionary suitable for writing to a .dvc file.

    If existing YAML text is available on the stage, uses round-trip
    preservation to maintain comments and formatting.

    Args:
        stage: The Stage object whose outs (with hash_info) and deps are populated.
        **kwargs: Passed through to stage.dumpd().

    Returns:
        A dictionary ready for YAML serialization.
    """
    ...

Import

from dvc.dvcfile import SingleStageFile
from dvc.stage.serialize import to_single_stage_file

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| self | SingleStageFile | Yes | The file abstraction representing the target .dvc file. Has repo (Repo instance), path (absolute path to the .dvc file), and verify (bool, whether to validate filename) attributes. |
| stage | Stage | Yes | The Stage object containing the tracking metadata to serialize. Must have outs (list of Output objects with populated hash_info and meta), deps (list of Dependency objects), and dumpd() method. |
| kwargs | dict | No | Additional keyword arguments passed through to stage.dumpd(), such as with_files (bool) to include per-file hash details for directory outputs. |

Outputs

| Name | Type | Description |
|---|---|---|
| (dump return) | None | No return value. Side effects: (1) A .dvc YAML file is written or updated at self.path containing the serialized stage metadata. (2) The file path is registered with repo.scm_context for Git staging. |
| (to_single_stage_file return) | dict | A dictionary representing the stage state, ready for YAML serialization. Contains keys such as outs (list of output dicts with md5, size, hash, path fields), deps (if any), and md5 (stage-level hash, when set). |

Usage Examples

Basic Usage

from dvc.repo import Repo
from dvc.dvcfile import SingleStageFile
from dvc.stage.serialize import to_single_stage_file

repo = Repo()

# Create and populate a stage
stage = repo.stage.create(
    single_stage=True,
    fname="data.csv.dvc",
    outs=["data.csv"],
)

# Hash the output so that hash_info and meta are populated
# (out.add() is an alternative to out.save())
out = stage.outs[0]
out.save()

# Serialize to see what will be written
state_dict = to_single_stage_file(stage)
print(state_dict)
# Example output for an empty data.csv (values depend on the file contents):
# {
#     'outs': [
#         {
#             'md5': 'd41d8cd98f00b204e9800998ecf8427e',
#             'size': 0,
#             'hash': 'md5',
#             'path': 'data.csv'
#         }
#     ]
# }

# Write the .dvc file to disk
dvc_file = SingleStageFile(repo, "data.csv.dvc")
dvc_file.dump(stage)
# Creates/updates data.csv.dvc and registers it for git tracking
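
The md5 value in the example output can be cross-checked with the standard library. The sketch below assumes that, for a single file recorded with hash: md5, the field is the plain MD5 of the file's bytes; older DVC versions may normalise line endings in text files first, and directory outputs are hashed differently. stream_md5 and file_md5 are hypothetical helper names.

```python
import hashlib
import io
from typing import BinaryIO


def stream_md5(fobj: BinaryIO, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex MD5 of a binary stream, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: fobj.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def file_md5(path: str) -> str:
    """Return the hex MD5 of a file's raw bytes."""
    with open(path, "rb") as fobj:
        return stream_md5(fobj)


# The digest of empty input matches the value shown in the example output.
empty_digest = stream_md5(io.BytesIO(b""))
```

Because d41d8cd98f00b204e9800998ecf8427e is the MD5 of empty input, seeing it in a .dvc file is a quick sign that the tracked file had no content when it was hashed.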

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
