Implementation: DVC DataResolver resolve_one
| Knowledge Sources | |
|---|---|
| Domains | Pipeline_Management, Configuration_Parsing |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Concrete tool for resolving individual pipeline stage definitions from parameterized YAML configuration into executable stage dictionaries, provided by the DVC library.
Description
The DataResolver class in DVC's parsing module is the central engine for transforming raw dvc.yaml content into resolved stage definitions. It manages a Context object populated from variable sources (params.yaml and inline vars sections), wraps each stage definition into typed definition objects (EntryDefinition, ForeachDefinition, MatrixDefinition), and resolves them lazily on demand via the resolve_one() method.
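To make the interpolation step concrete, here is a minimal, self-contained sketch of how `${var}` placeholders in a stage definition can be expanded from a context of variables. This is an illustrative simplification, not DVC's actual implementation (DVC's Context supports nested access, typing, and error reporting beyond what is shown here):

```python
import re
from typing import Any

# Matches ${name} placeholders in strings.
PATTERN = re.compile(r"\$\{([^}]+)\}")

def interpolate(value: Any, context: dict) -> Any:
    """Recursively replace ${name} placeholders using values from context."""
    if isinstance(value, str):
        return PATTERN.sub(lambda m: str(context[m.group(1)]), value)
    if isinstance(value, dict):
        return {k: interpolate(v, context) for k, v in value.items()}
    if isinstance(value, list):
        return [interpolate(v, context) for v in value]
    return value

context = {"learning_rate": 0.001, "epochs": 10}
stage = {"cmd": "python train.py --lr ${learning_rate} --epochs ${epochs}"}
print(interpolate(stage, context))
# {'cmd': 'python train.py --lr 0.001 --epochs 10'}
```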
The companion StageLoader.load_stage() class method takes the resolved dictionary output from DataResolver.resolve_one() and hydrates it into a full PipelineStage object, attaching dependencies, outputs, parameters, and lockfile checksums. Together, these two components form the complete loading path from raw YAML to executable stage objects.
The DataResolver is instantiated per-ProjectFile (cached as a property) and operates relative to the working directory of the dvc.yaml file. It supports tracked variable usage, enabling DVC to report which variables were actually consumed during resolution.
Usage
Use DataResolver and StageLoader.load_stage() when you need to:
- Resolve a single stage from a parameterized dvc.yaml file by name.
- Expand foreach or matrix stages into individual resolved stage definitions.
- Load a PipelineStage object with full dependency/output metadata and lockfile state.
- Inspect which variables were used during stage resolution (via tracked_vars).
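The last point, tracked variable usage, can be illustrated with a small hypothetical sketch: a dict wrapper that records which keys are actually read during resolution. This is not DVC's Context class, only a conceptual analogue of the tracking behavior:

```python
# Hypothetical sketch: a context that records which variables were read,
# analogous in spirit to DVC's tracked-variable reporting.
class TrackingContext(dict):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.used: set = set()

    def __getitem__(self, key):
        # Record every key access so unused variables can be detected.
        self.used.add(key)
        return super().__getitem__(key)

ctx = TrackingContext({"lr": 0.01, "epochs": 5, "unused": 1})
_ = ctx["lr"]
print(sorted(ctx.used))  # ['lr']
```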
Code Reference
Source Location
- Repository: DVC
- File: dvc/parsing/__init__.py - Lines: L137-261 (DataResolver class)
- File: dvc/stage/loader.py - Lines: L85-119 (StageLoader.load_stage)
Signature
class DataResolver:
    def __init__(self, repo: "Repo", wdir: str, d: dict) -> None:
        ...

    def resolve_one(self, name: str) -> "DictStrAny":
        ...

    def resolve(self) -> dict:
        """Used for testing purposes, otherwise use resolve_one()."""
        ...

    def has_key(self, key: str) -> bool:
        ...

    def get_keys(self) -> list[str]:
        ...

    def track_vars(self, name: str, vars_) -> None:
        ...

class StageLoader(Mapping):
    @classmethod
    def load_stage(
        cls,
        dvcfile: "ProjectFile",
        name: str,
        stage_data: dict,
        lock_data: Optional[dict] = None,
    ) -> "PipelineStage":
        ...

    @staticmethod
    def fill_from_lock(stage, lock_data=None) -> None:
        ...
Import
from dvc.parsing import DataResolver
from dvc.stage.loader import StageLoader
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| repo | Repo | Yes | The DVC repository instance providing filesystem access and configuration |
| wdir | str | Yes | Working directory path (absolute or relative) for resolving file references |
| d | dict | Yes | Raw parsed YAML dictionary from dvc.yaml containing stages, vars, artifacts, etc. |
| name | str | Yes | The stage name to resolve (may include "@" separator for foreach/matrix-generated stages) |
| dvcfile | ProjectFile | Yes | The DVC project file instance (for StageLoader.load_stage) |
| stage_data | dict | Yes | Resolved stage definition dictionary (for StageLoader.load_stage) |
| lock_data | Optional[dict] | No | Lockfile data for the stage, containing recorded checksums and param values |
Outputs
| Name | Type | Description |
|---|---|---|
| resolved_data | DictStrAny | Dictionary mapping stage name to its fully resolved definition with all interpolations expanded (from resolve_one) |
| stage | PipelineStage | Fully hydrated pipeline stage object with deps, outs, params, and lockfile state (from load_stage) |
Usage Examples
Basic Usage
from dvc.parsing import DataResolver
from dvc.stage.loader import StageLoader

# Given a DVC repo and raw dvc.yaml content:
raw_data = {
    "vars": [{"learning_rate": 0.001, "epochs": 10}],
    "stages": {
        "train": {
            "cmd": "python train.py --lr ${learning_rate} --epochs ${epochs}",
            "deps": ["data/prepared", "src/train.py"],
            "outs": ["model/model.pkl"],
            "params": ["learning_rate", "epochs"],
        }
    },
}

# Create the resolver
resolver = DataResolver(repo, wdir=".", d=raw_data)

# Resolve a single stage by name
resolved = resolver.resolve_one("train")
# resolved == {"train": {"cmd": "python train.py --lr 0.001 --epochs 10", ...}}

# Load the resolved data into a full PipelineStage
stage = StageLoader.load_stage(
    dvcfile=project_file,
    name="train",
    stage_data=resolved["train"],
    lock_data=lockfile_data.get("train", {}),
)
# stage.cmd == "python train.py --lr 0.001 --epochs 10"
# stage.deps contains Dependency objects with lockfile checksums
# stage.outs contains Output objects with lockfile hash_info
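Conceptually, the lockfile hydration performed during load_stage (via fill_from_lock) attaches the checksums recorded in lock_data to the matching dependencies and outputs. The following is a simplified, hypothetical sketch of that idea using plain dictionaries, not the real Dependency/Output objects:

```python
# Hypothetical sketch of lockfile hydration: copy recorded checksums from
# lock data onto the matching dependency entries (simplified to dicts).
def attach_lock_checksums(stage: dict, lock_data: dict) -> dict:
    # Index the locked entries by path for quick lookup.
    locked = {entry["path"]: entry.get("md5") for entry in lock_data.get("deps", [])}
    for dep in stage.get("deps", []):
        dep["md5"] = locked.get(dep["path"])
    return stage

stage = {"deps": [{"path": "data/prepared"}, {"path": "src/train.py"}]}
lock = {"deps": [{"path": "data/prepared", "md5": "abc123"}]}
print(attach_lock_checksums(stage, lock))
```

A dependency without a recorded checksum ends up with `md5` set to `None`, mirroring the idea that a stage not yet in the lockfile has no saved state.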
Foreach Expansion
# dvc.yaml with foreach
raw_data = {
    "stages": {
        "process": {
            "foreach": ["train", "test", "val"],
            "do": {
                "cmd": "python process.py ${item}",
                "deps": ["data/${item}.csv"],
                "outs": ["data/${item}_processed.csv"],
            },
        }
    },
}

resolver = DataResolver(repo, wdir=".", d=raw_data)

# Get all generated stage keys
keys = resolver.get_keys()
# keys == ["process@train", "process@test", "process@val"]

# Resolve one generated stage
resolved = resolver.resolve_one("process@train")
# resolved == {"process@train": {"cmd": "python process.py train", ...}}
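The expansion above can be approximated in pure Python to show how each foreach item yields a concrete stage named with the "@" separator. This sketch only handles flat string/list values and is not DVC's implementation, which resolves entries lazily and supports nested items and matrix products:

```python
# Illustrative sketch (not DVC internals): expand a foreach stage into
# concrete entries named "<stage>@<item>", substituting ${item} throughout.
def expand_foreach(name: str, definition: dict) -> dict:
    expanded = {}
    for item in definition["foreach"]:
        resolved = {
            key: (value.replace("${item}", str(item))
                  if isinstance(value, str)
                  else [v.replace("${item}", str(item)) for v in value])
            for key, value in definition["do"].items()
        }
        expanded[f"{name}@{item}"] = resolved
    return expanded

stages = expand_foreach("process", {
    "foreach": ["train", "test", "val"],
    "do": {"cmd": "python process.py ${item}", "deps": ["data/${item}.csv"]},
})
print(list(stages))  # ['process@train', 'process@test', 'process@val']
print(stages["process@train"]["cmd"])  # python process.py train
```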