Implementation: DVC DataResolver resolve_one
| Knowledge Sources | |
|---|---|
| Domains | Pipeline_Management, Configuration_Parsing |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Concrete tool for resolving individual pipeline stage definitions from parameterized YAML configuration into executable stage dictionaries, provided by the DVC library.
Description
The DataResolver class in DVC's parsing module is the central engine for transforming raw dvc.yaml content into resolved stage definitions. It manages a Context object populated from variable sources (params.yaml and inline vars sections), wraps each stage definition into typed definition objects (EntryDefinition, ForeachDefinition, MatrixDefinition), and resolves them lazily on demand via the resolve_one() method.
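To make the interpolation step concrete, here is a minimal, self-contained sketch of how `${var}` placeholders in a stage definition can be expanded from a context of variables. This is an illustrative simplification, not DVC's actual implementation (DVC's Context supports nested access, typing, and error reporting beyond what is shown here):

```python
import re
from typing import Any

# Matches ${name} placeholders in strings.
PATTERN = re.compile(r"\$\{([^}]+)\}")

def interpolate(value: Any, context: dict) -> Any:
    """Recursively replace ${name} placeholders using values from context."""
    if isinstance(value, str):
        return PATTERN.sub(lambda m: str(context[m.group(1)]), value)
    if isinstance(value, dict):
        return {k: interpolate(v, context) for k, v in value.items()}
    if isinstance(value, list):
        return [interpolate(v, context) for v in value]
    return value

context = {"learning_rate": 0.001, "epochs": 10}
stage = {"cmd": "python train.py --lr ${learning_rate} --epochs ${epochs}"}
print(interpolate(stage, context))
# {'cmd': 'python train.py --lr 0.001 --epochs 10'}
```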
The companion StageLoader.load_stage() class method takes the resolved dictionary output from DataResolver.resolve_one() and hydrates it into a full PipelineStage object, attaching dependencies, outputs, parameters, and lockfile checksums. Together, these two components form the complete loading path from raw YAML to executable stage objects.
The DataResolver is instantiated per-ProjectFile (cached as a property) and operates relative to the working directory of the dvc.yaml file. It supports tracked variable usage, enabling DVC to report which variables were actually consumed during resolution.
Usage
Use DataResolver and StageLoader.load_stage() when you need to:
- Resolve a single stage from a parameterized dvc.yaml file by name.
- Expand foreach or matrix stages into individual resolved stage definitions.
- Load a PipelineStage object with full dependency/output metadata and lockfile state.
- Inspect which variables were used during stage resolution (via tracked_vars).
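The last point, tracked variable usage, can be illustrated with a small hypothetical sketch: a dict wrapper that records which keys are actually read during resolution. This is not DVC's Context class, only a conceptual analogue of the tracking behavior:

```python
# Hypothetical sketch: a context that records which variables were read,
# analogous in spirit to DVC's tracked-variable reporting.
class TrackingContext(dict):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.used: set = set()

    def __getitem__(self, key):
        # Record every key access so unused variables can be detected.
        self.used.add(key)
        return super().__getitem__(key)

ctx = TrackingContext({"lr": 0.01, "epochs": 5, "unused": 1})
_ = ctx["lr"]
print(sorted(ctx.used))  # ['lr']
```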
Code Reference
Source Location
- Repository: DVC
- File: dvc/parsing/__init__.py - Lines: L137-261 (DataResolver class)
- File: dvc/stage/loader.py - Lines: L85-119 (StageLoader.load_stage)
Signature
class DataResolver:
    def __init__(self, repo: "Repo", wdir: str, d: dict) -> None:
        ...

    def resolve_one(self, name: str) -> "DictStrAny":
        ...

    def resolve(self) -> dict:
        """Used for testing purposes, otherwise use resolve_one()."""
        ...

    def has_key(self, key: str) -> bool:
        ...

    def get_keys(self) -> list[str]:
        ...

    def track_vars(self, name: str, vars_) -> None:
        ...

class StageLoader(Mapping):
    @classmethod
    def load_stage(
        cls,
        dvcfile: "ProjectFile",
        name: str,
        stage_data: dict,
        lock_data: Optional[dict] = None,
    ) -> "PipelineStage":
        ...

    @staticmethod
    def fill_from_lock(stage, lock_data=None) -> None:
        ...
Import
from dvc.parsing import DataResolver
from dvc.stage.loader import StageLoader
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| repo | Repo | Yes | The DVC repository instance providing filesystem access and configuration |
| wdir | str | Yes | Working directory path (absolute or relative) for resolving file references |
| d | dict | Yes | Raw parsed YAML dictionary from dvc.yaml containing stages, vars, artifacts, etc. |
| name | str | Yes | The stage name to resolve (may include "@" separator for foreach/matrix-generated stages) |
| dvcfile | ProjectFile | Yes | The DVC project file instance (for StageLoader.load_stage) |
| stage_data | dict | Yes | Resolved stage definition dictionary (for StageLoader.load_stage) |
| lock_data | Optional[dict] | No | Lockfile data for the stage, containing recorded checksums and param values |
Outputs
| Name | Type | Description |
|---|---|---|
| resolved_data | DictStrAny | Dictionary mapping stage name to its fully resolved definition with all interpolations expanded (from resolve_one) |
| stage | PipelineStage | Fully hydrated pipeline stage object with deps, outs, params, and lockfile state (from load_stage) |
Usage Examples
Basic Usage
from dvc.parsing import DataResolver
from dvc.stage.loader import StageLoader

# Given a DVC repo and raw dvc.yaml content:
raw_data = {
    "vars": [{"learning_rate": 0.001, "epochs": 10}],
    "stages": {
        "train": {
            "cmd": "python train.py --lr ${learning_rate} --epochs ${epochs}",
            "deps": ["data/prepared", "src/train.py"],
            "outs": ["model/model.pkl"],
            "params": ["learning_rate", "epochs"],
        }
    },
}

# Create the resolver
resolver = DataResolver(repo, wdir=".", d=raw_data)

# Resolve a single stage by name
resolved = resolver.resolve_one("train")
# resolved == {"train": {"cmd": "python train.py --lr 0.001 --epochs 10", ...}}

# Load the resolved data into a full PipelineStage
stage = StageLoader.load_stage(
    dvcfile=project_file,
    name="train",
    stage_data=resolved["train"],
    lock_data=lockfile_data.get("train", {}),
)
# stage.cmd == "python train.py --lr 0.001 --epochs 10"
# stage.deps contains Dependency objects with lockfile checksums
# stage.outs contains Output objects with lockfile hash_info
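Conceptually, the lockfile hydration performed during load_stage (via fill_from_lock) attaches the checksums recorded in lock_data to the matching dependencies and outputs. The following is a simplified, hypothetical sketch of that idea using plain dictionaries, not the real Dependency/Output objects:

```python
# Hypothetical sketch of lockfile hydration: copy recorded checksums from
# lock data onto the matching dependency entries (simplified to dicts).
def attach_lock_checksums(stage: dict, lock_data: dict) -> dict:
    # Index the locked entries by path for quick lookup.
    locked = {entry["path"]: entry.get("md5") for entry in lock_data.get("deps", [])}
    for dep in stage.get("deps", []):
        dep["md5"] = locked.get(dep["path"])
    return stage

stage = {"deps": [{"path": "data/prepared"}, {"path": "src/train.py"}]}
lock = {"deps": [{"path": "data/prepared", "md5": "abc123"}]}
print(attach_lock_checksums(stage, lock))
```

A dependency without a recorded checksum ends up with `md5` set to `None`, mirroring the idea that a stage not yet in the lockfile has no saved state.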
Foreach Expansion
# dvc.yaml with foreach
raw_data = {
    "stages": {
        "process": {
            "foreach": ["train", "test", "val"],
            "do": {
                "cmd": "python process.py ${item}",
                "deps": ["data/${item}.csv"],
                "outs": ["data/${item}_processed.csv"],
            },
        }
    },
}

resolver = DataResolver(repo, wdir=".", d=raw_data)

# Get all generated stage keys
keys = resolver.get_keys()
# keys == ["process@train", "process@test", "process@val"]

# Resolve one generated stage
resolved = resolver.resolve_one("process@train")
# resolved == {"process@train": {"cmd": "python process.py train", ...}}
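The expansion above can be approximated in pure Python to show how each foreach item yields a concrete stage named with the "@" separator. This sketch only handles flat string/list values and is not DVC's implementation, which resolves entries lazily and supports nested items and matrix products:

```python
# Illustrative sketch (not DVC internals): expand a foreach stage into
# concrete entries named "<stage>@<item>", substituting ${item} throughout.
def expand_foreach(name: str, definition: dict) -> dict:
    expanded = {}
    for item in definition["foreach"]:
        resolved = {
            key: (value.replace("${item}", str(item))
                  if isinstance(value, str)
                  else [v.replace("${item}", str(item)) for v in value])
            for key, value in definition["do"].items()
        }
        expanded[f"{name}@{item}"] = resolved
    return expanded

stages = expand_foreach("process", {
    "foreach": ["train", "test", "val"],
    "do": {"cmd": "python process.py ${item}", "deps": ["data/${item}.csv"]},
})
print(list(stages))  # ['process@train', 'process@test', 'process@val']
print(stages["process@train"]["cmd"])  # python process.py train
```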