
Implementation:Iterative Dvc DataResolver Resolve One

From Leeroopedia


Knowledge Sources
Domains Pipeline_Management, Configuration_Parsing
Last Updated 2026-02-10 00:00 GMT

Overview

Concrete tool for resolving individual pipeline stage definitions from parameterized YAML configuration into executable stage dictionaries, provided by the DVC library.

Description

The DataResolver class in DVC's parsing module is the central engine for transforming raw dvc.yaml content into resolved stage definitions. It manages a Context object populated from variable sources (params.yaml and inline vars sections), wraps each stage definition into typed definition objects (EntryDefinition, ForeachDefinition, MatrixDefinition), and resolves them lazily on demand via the resolve_one() method.

The companion StageLoader.load_stage() class method takes the resolved dictionary output from DataResolver.resolve_one() and hydrates it into a full PipelineStage object, attaching dependencies, outputs, parameters, and lockfile checksums. Together, these two components form the complete loading path: from raw YAML to executable stage objects.

The DataResolver is instantiated per-ProjectFile (cached as a property) and operates relative to the working directory of the dvc.yaml file. It supports tracked variable usage, enabling DVC to report which variables were actually consumed during resolution.
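Conceptually, the core of resolution is substituting `${...}` placeholders from the Context into the raw stage definition. The sketch below illustrates that step in plain Python; it is a simplified stand-in for illustration only, not DVC's actual interpolation code (which also handles nested access, typing, and error reporting).

```python
import re

# Simplified sketch of the interpolation step resolve_one() performs:
# substitute ${name} placeholders in a raw stage definition using a
# context dict built from params.yaml / vars entries.
def interpolate(value, context):
    if isinstance(value, str):
        # Replace each ${name} with the corresponding context value.
        return re.sub(
            r"\$\{([\w.]+)\}",
            lambda m: str(context[m.group(1)]),
            value,
        )
    if isinstance(value, list):
        return [interpolate(v, context) for v in value]
    if isinstance(value, dict):
        return {k: interpolate(v, context) for k, v in value.items()}
    return value

context = {"learning_rate": 0.001, "epochs": 10}
stage = {"cmd": "python train.py --lr ${learning_rate} --epochs ${epochs}"}
resolved_cmd = interpolate(stage, context)["cmd"]
# "python train.py --lr 0.001 --epochs 10"
```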

Usage

Use DataResolver and StageLoader.load_stage() when you need to:

  • Resolve a single stage from a parameterized dvc.yaml file by name.
  • Expand foreach or matrix stages into individual resolved stage definitions.
  • Load a PipelineStage object with full dependency/output metadata and lockfile state.
  • Inspect which variables were used during stage resolution (via tracked_vars).

Code Reference

Source Location

  • Repository: DVC
  • File: dvc/parsing/__init__.py
  • Lines: L137-261 (DataResolver class)
  • File: dvc/stage/loader.py
  • Lines: L85-119 (StageLoader.load_stage)

Signature

class DataResolver:
    def __init__(self, repo: "Repo", wdir: str, d: dict) -> None:
        ...

    def resolve_one(self, name: str) -> "DictStrAny":
        ...

    def resolve(self) -> dict:
        """Used for testing purposes, otherwise use resolve_one()."""
        ...

    def has_key(self, key: str) -> bool:
        ...

    def get_keys(self) -> list[str]:
        ...

    def track_vars(self, name: str, vars_) -> None:
        ...


class StageLoader(Mapping):
    @classmethod
    def load_stage(
        cls,
        dvcfile: "ProjectFile",
        name: str,
        stage_data: dict,
        lock_data: Optional[dict] = None,
    ) -> "PipelineStage":
        ...

    @staticmethod
    def fill_from_lock(stage, lock_data=None) -> None:
        ...

Import

from dvc.parsing import DataResolver
from dvc.stage.loader import StageLoader

I/O Contract

Inputs

Name Type Required Description
repo Repo Yes The DVC repository instance providing filesystem access and configuration
wdir str Yes Working directory path (absolute or relative) for resolving file references
d dict Yes Raw parsed YAML dictionary from dvc.yaml containing stages, vars, artifacts, etc.
name str Yes The stage name to resolve (may include "@" separator for foreach/matrix-generated stages)
dvcfile ProjectFile Yes The DVC project file instance (for StageLoader.load_stage)
stage_data dict Yes Resolved stage definition dictionary (for StageLoader.load_stage)
lock_data Optional[dict] No Lockfile data for the stage, containing recorded checksums and param values

Outputs

Name Type Description
resolved_data DictStrAny Dictionary mapping stage name to its fully resolved definition with all interpolations expanded (from resolve_one)
stage PipelineStage Fully hydrated pipeline stage object with deps, outs, params, and lockfile state (from load_stage)
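The lock_data input feeds StageLoader.fill_from_lock, which matches recorded checksums to the stage's deps and outs by path. The hypothetical helper below sketches that merge under simplified assumptions (flat md5 entries, no nested stage sections); it is not DVC's actual fill_from_lock implementation.

```python
# Hypothetical helper sketching how lockfile data is merged into a
# resolved stage: checksums recorded in dvc.lock are matched to
# deps/outs entries by path.
def merge_lock(stage_data, lock_data):
    # Index every locked dep/out entry by its path.
    lock_index = {
        entry["path"]: entry
        for section in ("deps", "outs")
        for entry in lock_data.get(section, [])
    }
    merged = {"deps": [], "outs": []}
    for section in ("deps", "outs"):
        for path in stage_data.get(section, []):
            entry = {"path": path}
            # Attach the recorded md5 if the lockfile knows this path.
            if path in lock_index:
                entry["md5"] = lock_index[path].get("md5")
            merged[section].append(entry)
    return merged

stage_data = {"deps": ["data/prepared"], "outs": ["model/model.pkl"]}
lock_data = {
    "deps": [{"path": "data/prepared", "md5": "abc123.dir"}],
    "outs": [{"path": "model/model.pkl", "md5": "def456"}],
}
merged = merge_lock(stage_data, lock_data)
```

Entries with no lockfile match simply carry no checksum, which is why a stage loaded without lock_data reports everything as changed.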

Usage Examples

Basic Usage

from dvc.parsing import DataResolver
from dvc.stage.loader import StageLoader

# Given a DVC repo and raw dvc.yaml content:
raw_data = {
    "vars": [{"learning_rate": 0.001, "epochs": 10}],
    "stages": {
        "train": {
            "cmd": "python train.py --lr ${learning_rate} --epochs ${epochs}",
            "deps": ["data/prepared", "src/train.py"],
            "outs": ["model/model.pkl"],
            "params": ["learning_rate", "epochs"],
        }
    },
}

# Create the resolver
resolver = DataResolver(repo, wdir=".", d=raw_data)

# Resolve a single stage by name
resolved = resolver.resolve_one("train")
# resolved == {"train": {"cmd": "python train.py --lr 0.001 --epochs 10", ...}}

# Load the resolved data into a full PipelineStage
stage = StageLoader.load_stage(
    dvcfile=project_file,
    name="train",
    stage_data=resolved["train"],
    lock_data=lockfile_data.get("train", {}),
)
# stage.cmd == "python train.py --lr 0.001 --epochs 10"
# stage.deps contains Dependency objects with lockfile checksums
# stage.outs contains Output objects with lockfile hash_info

Foreach Expansion

# dvc.yaml with foreach
raw_data = {
    "stages": {
        "process": {
            "foreach": ["train", "test", "val"],
            "do": {
                "cmd": "python process.py ${item}",
                "deps": ["data/${item}.csv"],
                "outs": ["data/${item}_processed.csv"],
            },
        }
    },
}

resolver = DataResolver(repo, wdir=".", d=raw_data)

# Get all generated stage keys
keys = resolver.get_keys()
# keys == ["process@train", "process@test", "process@val"]

# Resolve one generated stage
resolved = resolver.resolve_one("process@train")
# resolved == {"process@train": {"cmd": "python process.py train", ...}}
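Matrix stages expand similarly, except the generated suffix after "@" is built from the cartesian product of the matrix axes. The sketch below shows only the naming scheme (values joined with "-"), assuming the ordering DVC's MatrixDefinition produces; it is not DVC's internal expansion code.

```python
from itertools import product

# Sketch of matrix-stage key generation: each combination of axis
# values becomes a generated stage name like "train@cnn-0.01".
def matrix_keys(stage_name, matrix):
    keys = []
    for combo in product(*matrix.values()):
        suffix = "-".join(str(v) for v in combo)
        keys.append(f"{stage_name}@{suffix}")
    return keys

keys = matrix_keys("train", {"model": ["cnn", "rnn"], "lr": [0.01, 0.1]})
# ["train@cnn-0.01", "train@cnn-0.1", "train@rnn-0.01", "train@rnn-0.1"]
```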

Related Pages

Implements Principle
