
Implementation:Iterative Dvc Collect Plot Definitions

From Leeroopedia
Revision as of 15:18, 16 February 2026 by Admin (auto-imported from implementations/Iterative_Dvc_Collect_Plot_Definitions.md)


Knowledge Sources
Domains: Visualization, Configuration_Management
Last Updated: 2026-02-10 00:00 GMT

Overview

Concrete tool for collecting and merging plot definitions from DVC pipeline configurations and output annotations, provided by the DVC library.

Description

The _collect_definitions function is the primary entry point for gathering all plot definitions from a DVC repository. It aggregates plot specifications from two sources: pipeline-level plot blocks defined in dvc.yaml files (via _collect_pipeline_files) and output-level plot annotations on tracked DVC outputs (via _collect_output_plots). It then handles bare filesystem targets that do not match any pipeline definition. All collected definitions are deep-merged using the dpath library, and user-provided property overrides are merged onto each definition, taking precedence over values defined in dvc.yaml.
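The merge-then-override flow described above can be pictured with a small sketch. This is not DVC's code: deep_merge is a hypothetical pure-Python stand-in for the dpath merge, collect_definitions_sketch is an illustrative name, and the sample dictionaries are made up to match the result shape documented on this page.

```python
from copy import deepcopy


def deep_merge(dst: dict, src: dict) -> dict:
    """Recursively merge src into dst in place; src wins on scalar conflicts.

    Hypothetical stand-in for the dpath-based merge mentioned above.
    """
    for key, value in src.items():
        if isinstance(value, dict) and isinstance(dst.get(key), dict):
            deep_merge(dst[key], value)
        else:
            dst[key] = deepcopy(value)
    return dst


def collect_definitions_sketch(pipeline_defs: dict, output_defs: dict, props: dict) -> dict:
    """Merge two definition sources, then overlay user props on each plot."""
    result: dict = {}
    deep_merge(result, pipeline_defs)
    deep_merge(result, output_defs)
    for config in result.values():
        for plot_id, plot_props in config.get("data", {}).items():
            # User-provided props take precedence over collected values.
            config["data"][plot_id] = {**plot_props, **props}
    return result


# Made-up inputs: one pipeline-level definition, one output-level definition.
pipeline_defs = {"dvc.yaml": {"data": {"plots/loss.csv": {"x": "step", "y": "loss"}}}}
output_defs = {"": {"data": {"plots/acc.csv": {}}}}

merged = collect_definitions_sketch(pipeline_defs, output_defs, {"template": "linear"})
```

In this sketch the inputs are copied rather than mutated, so the same source dictionaries can be merged again with different overrides.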

The companion function _collect_plots operates at the data source level, gathering the actual plot output objects and their associated display properties. It uses DVC's collect utility to find outputs matching the _is_plot filter and returns a dictionary mapping filesystem paths to their plot properties. This function is called during data source collection rather than definition collection, but forms an essential part of the overall plot gathering infrastructure.
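As a rough illustration of the filter-and-map step described above, the sketch below uses hypothetical stand-ins: FakeOutput replaces a tracked DVC output, is_plot replaces the _is_plot filter, and collect_plots_sketch is an illustrative name. It only mirrors the documented result shape (path mapped to properties) and makes no claim about DVC's internals.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeOutput:
    """Hypothetical stand-in for a tracked DVC output."""

    path: str
    plot: bool = False
    props: dict = field(default_factory=dict)


def is_plot(out: FakeOutput) -> bool:
    """Stand-in for the _is_plot filter: keep only outputs marked as plots."""
    return out.plot


def collect_plots_sketch(
    outputs: list[FakeOutput],
    targets: Optional[list[str]] = None,
) -> dict[str, dict]:
    """Map each plot output path to its display properties."""
    result: dict[str, dict] = {}
    for out in outputs:
        if not is_plot(out):
            continue
        if targets and out.path not in targets:
            continue
        # Outputs without explicit properties map to an empty dict.
        result[out.path] = dict(out.props)
    return result


outputs = [
    FakeOutput("plots/loss.csv", plot=True, props={"x": "step", "y": "loss"}),
    FakeOutput("plots/acc.csv", plot=True),
    FakeOutput("model.pkl"),
]

all_plots = collect_plots_sketch(outputs)
only_loss = collect_plots_sketch(outputs, targets=["plots/loss.csv"])
```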

Usage

Use _collect_definitions when you need to build the complete registry of plot definitions for a given repository state. This is typically called once per revision during the plot visualization workflow. Use _collect_plots when you need to enumerate the actual data files that correspond to plot outputs, along with their display properties, for constructing lazy data source loaders.
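One way to picture the lazy data source loaders mentioned above is to pair each collected path with a deferred callable that reads the file only when invoked. The sketch below assumes a made-up in-memory store in place of the DVC filesystem; load_rows, build_lazy_sources, and the "data_source" key are hypothetical names chosen for illustration.

```python
from functools import partial

calls = []  # records which paths were actually read


def load_rows(store: dict, path: str) -> list:
    """Hypothetical loader: 'reads' a file from an in-memory store."""
    calls.append(path)
    return store[path]


def build_lazy_sources(plots: dict, store: dict) -> dict:
    """Attach a deferred loader to every collected plot path."""
    return {
        path: {"props": props, "data_source": partial(load_rows, store, path)}
        for path, props in plots.items()
    }


# Made-up file contents and a result shaped like the _collect_plots output.
store = {
    "plots/loss.csv": [{"step": 0, "loss": 1.0}],
    "plots/acc.csv": [{"step": 0, "acc": 0.5}],
}
plots = {"plots/loss.csv": {"x": "step", "y": "loss"}, "plots/acc.csv": {}}

sources = build_lazy_sources(plots, store)

# Nothing is read until a loader is called; only the requested file is loaded.
loss_rows = sources["plots/loss.csv"]["data_source"]()
```

Deferring the read this way means a caller that renders a single plot never pays for loading the others.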

Code Reference

Source Location

  • Repository: DVC
  • File: dvc/repo/plots/__init__.py
  • Lines: L507-528 (_collect_definitions), L317-335 (_collect_plots)

Signature

@error_handler
def _collect_definitions(
    repo: "Repo",
    targets: list[str],
    props: Optional[dict] = None,
    onerror: Optional[Callable] = None,
    **kwargs,
) -> dict:
    ...


def _collect_plots(
    repo: "Repo",
    targets: Optional[list[str]] = None,
    recursive: bool = False,
) -> dict[str, dict]:
    ...

Import

from dvc.repo.plots import _collect_definitions
from dvc.repo.plots import _collect_plots

I/O Contract

Inputs

Name | Type | Required | Description
--- | --- | --- | ---
repo | Repo | Yes | The DVC repository instance providing access to the index, filesystem, and pipeline definitions.
targets | list[str] | Yes | List of plot targets (file paths or plot IDs) used to filter definitions. An empty list collects all definitions. For _collect_plots the parameter is Optional[list[str]] and defaults to None.
props | Optional[dict] | No | (For _collect_definitions) User-provided property overrides (e.g., template, x, y, title) to merge onto each definition. Defaults to None, which is treated as no overrides.
onerror | Optional[Callable] | No | (For _collect_definitions) Error handler callback invoked when individual definition resolution fails. Receives the error and context.
recursive | bool | No | (For _collect_plots) Whether to recursively search for plot outputs in subdirectories. Defaults to False.

Outputs

Name | Type | Description
--- | --- | ---
result | dict | For _collect_definitions: a nested dictionary keyed by config file path (empty string for non-pipeline sources), containing data sub-dictionaries mapping plot IDs to their merged property dicts. The function is wrapped by the error_handler decorator, so the returned dict may include an error key on failure.
result | dict[str, dict] | For _collect_plots: a dictionary mapping DVC filesystem paths to their plot display properties (e.g., template, x, y, x_label, y_label). Paths without explicit properties map to empty dicts.
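To make the nested result shape concrete, the sketch below flattens a definitions dictionary into (config path, plot ID, properties) triples. The sample dictionary is made up to follow the description above, and iter_plot_definitions is a hypothetical helper, not part of DVC.

```python
from typing import Iterator


def iter_plot_definitions(definitions: dict) -> Iterator[tuple[str, str, dict]]:
    """Yield (config_path, plot_id, props) for every collected definition."""
    for config_path, config in definitions.items():
        # Skip non-dict entries, such as an error value stored at the top level.
        if not isinstance(config, dict):
            continue
        for plot_id, plot_props in config.get("data", {}).items():
            yield config_path, plot_id, plot_props


# Made-up result following the shape described above; the empty-string key
# holds definitions that do not come from a pipeline file.
definitions = {
    "dvc.yaml": {"data": {"plots/loss.csv": {"template": "linear", "x": "step", "y": "loss"}}},
    "": {"data": {"plots/acc.csv": {}}},
}

rows = list(iter_plot_definitions(definitions))
```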

Usage Examples

Basic Usage

from dvc.repo import Repo
from dvc.repo.plots import _collect_definitions, _collect_plots

# Open a DVC repository
repo = Repo()

# Collect all plot definitions with no target filtering
definitions = _collect_definitions(
    repo,
    targets=[],
    props={"template": "linear"},
    onerror=None,
)
# definitions structure:
# {
#     "dvc.yaml": {
#         "data": {
#             "plots/loss.csv": {"template": "linear", "x": "step", "y": "loss"}
#         }
#     }
# }

# Collect plot output files and their properties
plots = _collect_plots(repo, targets=None, recursive=False)
# plots structure:
# {
#     "plots/loss.csv": {"x": "step", "y": "loss"},
#     "plots/acc.csv": {}
# }

Targeted Collection

from dvc.repo import Repo
from dvc.repo.plots import _collect_definitions

repo = Repo()

# Collect definitions for specific targets with user overrides
definitions = _collect_definitions(
    repo,
    targets=["plots/loss.csv", "plots/accuracy.csv"],
    props={"x": "epoch", "y_label": "Value", "title": "Training Metrics"},
)
# User-provided props are merged onto each matching definition,
# overriding any defaults from dvc.yaml

With Error Handling

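from dvc.repo import Repo
from dvc.repo.plots import _collect_definitions, onerror_collect

repo = Repo()

# Assumption: the error_handler wrapper calls onerror with its own result dict
# and the exception, so onerror_collect can be passed directly. Per the I/O
# contract above, the returned dict may then contain an "error" key.
definitions = _collect_definitions(
    repo,
    targets=[],
    props={},
    onerror=onerror_collect,
)

if "error" in definitions:
    print(f"Some definitions failed to collect: {definitions['error']}")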
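The error-wrapping behavior noted in the I/O contract can be pictured as a decorator that converts exceptions into an error entry on the returned dict. The sketch below is an illustrative pattern only, under the assumption that the callback receives the partial result and the exception; error_handler_sketch, collect_into, and flaky_collect are hypothetical names, and re-raising when no callback is supplied is a choice made for this sketch rather than a statement about DVC.

```python
from functools import wraps


def error_handler_sketch(func):
    """Wrap a collector so failures are reported through an onerror callback."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        onerror = kwargs.get("onerror")
        result: dict = {}
        try:
            value = func(*args, **kwargs)
            if value:
                result["data"] = value
        except Exception as exc:
            if onerror is None:
                raise  # sketch choice: surface the error when nobody handles it
            onerror(result, exc)
        return result

    return wrapper


def collect_into(result: dict, exc: Exception) -> None:
    """Hypothetical callback: store the exception under an 'error' key."""
    result["error"] = exc


@error_handler_sketch
def flaky_collect(fail: bool, onerror=None) -> dict:
    """Made-up collector that either fails or returns one definition."""
    if fail:
        raise ValueError("bad plot definition")
    return {"plots/loss.csv": {"x": "step"}}


ok = flaky_collect(False, onerror=collect_into)
failed = flaky_collect(True, onerror=collect_into)
```

With this pattern, callers inspect the returned dict for an error entry instead of wrapping every collection call in try/except.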

Related Pages

Implements Principle
