Implementation:Iterative Dvc Collect Plot Definitions
| Knowledge Sources | |
|---|---|
| Domains | Visualization, Configuration_Management |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Concrete tool for collecting and merging plot definitions from DVC pipeline configurations and output annotations, provided by the DVC library.
Description
The _collect_definitions function is the primary entry point for gathering all plot definitions from a DVC repository. It aggregates plot specifications from two sources: pipeline-level plot blocks defined in dvc.yaml files (via _collect_pipeline_files) and output-level plot annotations on tracked DVC outputs (via _collect_output_plots). It then handles bare filesystem targets that do not match any pipeline definition. All collected definitions are deep-merged using the dpath library, and user-provided property overrides are unioned onto each definition.
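The merge-then-override flow described above can be illustrated with a minimal, self-contained sketch. It does not call DVC or dpath: `deep_merge` and `apply_overrides` are hypothetical helpers standing in for the library's merge step, and the sample dictionaries are invented for illustration only.

```python
from copy import deepcopy


def deep_merge(dst: dict, src: dict) -> dict:
    """Recursively merge src into dst in place; src wins on leaf conflicts."""
    for key, value in src.items():
        if isinstance(value, dict) and isinstance(dst.get(key), dict):
            deep_merge(dst[key], value)
        else:
            dst[key] = deepcopy(value)
    return dst


def apply_overrides(definitions: dict, props: dict) -> dict:
    """Return a copy with props layered on top of every plot definition."""
    result = deepcopy(definitions)
    for config in result.values():
        data = config.get("data", {})
        for plot_id in list(data):
            data[plot_id] = {**data[plot_id], **props}
    return result


# Hypothetical definitions from the two sources named above.
pipeline_defs = {"dvc.yaml": {"data": {"plots/loss.csv": {"x": "step", "y": "loss"}}}}
output_defs = {"": {"data": {"plots/acc.csv": {}}}}

merged = deep_merge(deep_merge({}, pipeline_defs), output_defs)
final = apply_overrides(merged, {"template": "linear"})
```

In this sketch the override step runs last, so a user-supplied key replaces a key of the same name that came from either source.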
The companion function _collect_plots operates at the data source level, gathering the actual plot output objects and their associated display properties. It uses DVC's collect utility to find outputs matching the _is_plot filter and returns a dictionary mapping filesystem paths to their plot properties. This function is called during data source collection rather than definition collection, but forms an essential part of the overall plot gathering infrastructure.
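The filter-and-map idea behind this function can be sketched without DVC. Everything below is an assumption made for illustration: `FakeOutput`, `is_plot`, and `collect_plots` are hypothetical names, and the `plot` attribute is modeled as `False`, `True`, or a dict of display properties.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class FakeOutput:
    """Hypothetical stand-in for a tracked output."""
    path: str
    plot: Union[bool, dict] = False  # False, True, or display properties


def is_plot(out: FakeOutput) -> bool:
    """Keep only outputs that carry a plot annotation."""
    return bool(out.plot)


def collect_plots(outputs: list[FakeOutput]) -> dict[str, dict]:
    """Map each plot output's path to its display properties."""
    result: dict[str, dict] = {}
    for out in filter(is_plot, outputs):
        # A bare `True` annotation carries no properties, so use an empty dict.
        result[out.path] = dict(out.plot) if isinstance(out.plot, dict) else {}
    return result


outputs = [
    FakeOutput("plots/loss.csv", {"x": "step", "y": "loss"}),
    FakeOutput("plots/acc.csv", True),
    FakeOutput("model.pkl"),  # not a plot, so it is filtered out
]
plots = collect_plots(outputs)
```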
Usage
Use _collect_definitions when you need to build the complete registry of plot definitions for a given repository state. This is typically called once per revision during the plot visualization workflow. Use _collect_plots when you need to enumerate the actual data files that correspond to plot outputs, along with their display properties, for constructing lazy data source loaders.
Code Reference
Source Location
- Repository: DVC
- File: dvc/repo/plots/__init__.py
- Lines: L507-528 (_collect_definitions), L317-335 (_collect_plots)
Signature
@error_handler
def _collect_definitions(
    repo: "Repo",
    targets: list[str],
    props: Optional[dict] = None,
    onerror: Optional[Callable] = None,
    **kwargs,
) -> dict:
    ...


def _collect_plots(
    repo: "Repo",
    targets: Optional[list[str]] = None,
    recursive: bool = False,
) -> dict[str, dict]:
    ...
Import
from dvc.repo.plots import _collect_definitions
from dvc.repo.plots import _collect_plots
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| repo | Repo | Yes | The DVC repository instance providing access to the index, filesystem, and pipeline definitions. |
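| targets | list[str] | Yes (_collect_definitions); No (_collect_plots) | List of plot targets (file paths or plot IDs) used to filter definitions. An empty list collects all definitions. For _collect_plots the type is Optional[list[str]] and the default is None. |
| props | Optional[dict] | No | User-provided property overrides (e.g., template, x, y, title) to merge onto each definition. Defaults to None, which is treated as an empty dict. |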
| onerror | Optional[Callable] | No | Error handler callback invoked when individual definition resolution fails. Receives the error and context. |
| recursive | bool | No | (For _collect_plots) Whether to recursively search for plot outputs in subdirectories. Defaults to False. |
Outputs
| Name | Type | Description |
|---|---|---|
| result | dict | For _collect_definitions: a nested dictionary keyed by config file path (empty string for non-pipeline sources), containing data sub-dictionaries mapping plot IDs to their merged property dicts. Wrapped in an error_handler that may include an error key on failure. |
| result | dict[str, dict] | For _collect_plots: a dictionary mapping DVC filesystem paths to their plot display properties (e.g., template, x, y, x_label, y_label). Paths without explicit properties map to empty dicts. |
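As a rough illustration of how the nested definitions result might be consumed, the sketch below flattens it into `(config_path, plot_id, props)` triples. It assumes the shape described in the Outputs table; `iter_plot_definitions` and the sample data are hypothetical and not part of DVC.

```python
from typing import Iterator


def iter_plot_definitions(definitions: dict) -> Iterator[tuple[str, str, dict]]:
    """Yield (config_path, plot_id, props) triples from a nested definitions dict."""
    for config_path, config in definitions.items():
        # Entries without a "data" sub-dictionary contribute nothing.
        for plot_id, props in config.get("data", {}).items():
            yield config_path, plot_id, props


# Hypothetical result: one pipeline file plus one non-pipeline source ("").
sample = {
    "dvc.yaml": {"data": {"plots/loss.csv": {"x": "step", "y": "loss"}}},
    "": {"data": {"plots/acc.csv": {}}},
}
rows = list(iter_plot_definitions(sample))
```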
Usage Examples
Basic Usage
from dvc.repo import Repo
from dvc.repo.plots import _collect_definitions, _collect_plots

# Open a DVC repository
repo = Repo()

# Collect all plot definitions with no target filtering
definitions = _collect_definitions(
    repo,
    targets=[],
    props={"template": "linear"},
    onerror=None,
)
# definitions structure:
# {
#     "dvc.yaml": {
#         "data": {
#             "plots/loss.csv": {"template": "linear", "x": "step", "y": "loss"}
#         }
#     }
# }

# Collect plot output files and their properties
plots = _collect_plots(repo, targets=None, recursive=False)
# plots structure:
# {
#     "plots/loss.csv": {"x": "step", "y": "loss"},
#     "plots/acc.csv": {}
# }
Targeted Collection
from dvc.repo import Repo
from dvc.repo.plots import _collect_definitions
repo = Repo()
# Collect definitions for specific targets with user overrides
definitions = _collect_definitions(
repo,
targets=["plots/loss.csv", "plots/accuracy.csv"],
props={"x": "epoch", "y_label": "Value", "title": "Training Metrics"},
)
# User-provided props are merged onto each matching definition,
# overriding any defaults from dvc.yaml
With Error Handling
from dvc.repo import Repo
from dvc.repo.plots import _collect_definitions, onerror_collect

repo = Repo()
result = {}

definitions = _collect_definitions(
    repo,
    targets=[],
    props={},
    onerror=lambda *args, **kwargs: onerror_collect(result, *args, **kwargs),
)

# Check whether the callback recorded a failure
if "error" in result:
    print(f"Some definitions failed to collect: {result['error']}")
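The general pattern of a decorator that wraps a collector's result and routes failures to an `onerror` callback can be sketched as follows. This is an illustrative pattern only, not DVC's implementation: `collecting_error_handler`, `record_error`, and `load_definitions` are hypothetical names, and the callback signature is an assumption.

```python
from functools import wraps
from typing import Callable, Optional


def collecting_error_handler(func: Callable) -> Callable:
    """Hypothetical decorator: store results under "data", pass failures to onerror."""
    @wraps(func)
    def wrapper(*args, onerror: Optional[Callable] = None, **kwargs) -> dict:
        result: dict = {}
        try:
            value = func(*args, **kwargs)
            if value:
                result["data"] = value
        except Exception as exc:
            # Without a callback the failure is dropped and {} is returned.
            if onerror is not None:
                onerror(result, exc)
        return result
    return wrapper


def record_error(result: dict, exc: Exception) -> None:
    """Hypothetical callback that stores the exception on the result."""
    result["error"] = exc


@collecting_error_handler
def load_definitions(fail: bool) -> dict:
    if fail:
        raise ValueError("malformed plot definition")
    return {"plots/loss.csv": {"x": "step"}}


ok = load_definitions(False, onerror=record_error)
failed = load_definitions(True, onerror=record_error)
```

With this design a caller inspects the returned dict for an `error` key instead of wrapping each call in its own `try` block.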