

Workflow:Iterative Dvc Plot Visualization

From Leeroopedia


Domains: Visualization, MLOps, Metrics
Last Updated: 2026-02-10 10:30 GMT

Overview

End-to-end process for generating visual comparisons of metrics, parameters, and data across DVC experiments and Git revisions using Vega-Lite based plot rendering.

Description

This workflow covers DVC's plotting system, which transforms structured data files (CSV, TSV, JSON, YAML) and images into interactive Vega-Lite visualizations. Plots can compare data across multiple Git revisions and experiments, enabling visual analysis of training metrics, parameter effects, and model performance over time. The system collects plot definitions from `dvc.yaml`, resolves data sources from the workspace or Git history, and renders them using configurable Vega-Lite templates.

Goal: Interactive HTML visualizations comparing metrics and data across experiments and revisions.

Scope: From plot definition in `dvc.yaml` through data collection to rendered Vega-Lite output.

Strategy: Multi-threaded data collection across revisions with Vega-Lite template-based rendering and support for both tabular data and image comparison.

Usage

Execute this workflow when:

  • You want to visualize training loss curves across multiple experiment runs
  • You need to compare model performance metrics between Git revisions
  • You want to display confusion matrices, ROC curves, or other evaluation plots
  • You need to generate an HTML report comparing experiments side by side
  • You want to track how metrics change over the course of a training run

Execution Steps

Step 1: Collect Plot Definitions

DVC scans the pipeline definition (`dvc.yaml`) and tracked outputs to identify all plot sources. Plot definitions specify the data file, axis mappings, template, and display properties. Plots can be defined explicitly in the top-level `plots` section of `dvc.yaml`, or implicitly through stage outputs listed under a stage's `plots` field, which DVC flags internally as plot outputs.

Key considerations:

  • Plot definitions support `x`, `y`, `x_label`, `y_label`, `title`, and `template` properties
  • Multiple data files can be combined into a single plot
  • Directory targets are recursively expanded to find all plottable files
  • Targets narrow which plots are rendered: pass them as positional arguments to `dvc plots show`, or through the `--targets` option of `dvc plots diff`
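The collection step can be sketched in plain Python. This is not DVC's internal code: `PLOTS_SECTION`, `TRACKED_FILES`, `PlotDefinition`, `expand`, and `collect_definitions` are hypothetical names, and the dict stands in for an already-parsed top-level `plots` section of `dvc.yaml`. The sketch shows property filtering, recursive directory expansion, and target filtering.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for a parsed top-level `plots` section of dvc.yaml.
PLOTS_SECTION = {
    "evaluation/loss.csv": {"x": "step", "y": "loss", "title": "Training loss"},
    "evaluation/roc": {"template": "linear", "x": "fpr", "y": "tpr"},
}

# Hypothetical list of files tracked in the workspace.
TRACKED_FILES = [
    "evaluation/loss.csv",
    "evaluation/roc/train.json",
    "evaluation/roc/test.json",
    "model.pkl",
]

# Display properties this sketch keeps from each definition.
PROPERTY_KEYS = {"x", "y", "x_label", "y_label", "title", "template"}


@dataclass
class PlotDefinition:
    source: str
    props: dict = field(default_factory=dict)


def expand(target: str, files: list[str]) -> list[str]:
    """Return the file itself, or every file anywhere under it if it is a directory."""
    if target in files:
        return [target]
    prefix = target.rstrip("/") + "/"
    return sorted(f for f in files if f.startswith(prefix))


def collect_definitions(section: dict, files: list[str], targets=None) -> list[PlotDefinition]:
    definitions = []
    for plot_id, props in section.items():
        if targets and plot_id not in targets:
            continue  # only render the requested plots
        clean = {k: v for k, v in props.items() if k in PROPERTY_KEYS}
        for source in expand(plot_id, files):
            # Each expanded file inherits the directory-level properties.
            definitions.append(PlotDefinition(source, dict(clean)))
    return definitions
```

With no targets, the `evaluation/roc` directory expands to both JSON files; passing `targets=["evaluation/loss.csv"]` yields a single definition.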

Step 2: Resolve Data Sources Across Revisions

For each specified Git revision (or the current workspace), DVC resolves the data sources by switching the repository context to that revision. Data loading is deferred using callable objects to enable parallel execution. The revision list defaults to the current workspace or can be expanded to include branches, tags, and experiment refs.

Key considerations:

  • When no revisions are given, `dvc plots diff` compares the current workspace against HEAD
  • Multiple revisions can be specified for multi-version comparison
  • Each revision's data is loaded independently to prevent cross-contamination
  • The brancher utility handles transparent Git revision switching
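Deferred, per-revision loading can be illustrated with `functools.partial`: each (revision, path) pair gets a zero-argument callable that a worker can invoke later. The sketch assumes an in-memory `SNAPSHOTS` dict in place of Git history, and `read_file`, `resolve_revisions`, and `plan_loads` are hypothetical helpers, not DVC functions. The revision defaults are simplified to the two cases described above.

```python
from functools import partial
from typing import Callable

# Hypothetical snapshot store: revision name -> {path: file contents}.
# In DVC the contents would come from Git history or the workspace.
SNAPSHOTS = {
    "workspace": {"loss.csv": "step,loss\n0,0.9\n1,0.5\n"},
    "HEAD": {"loss.csv": "step,loss\n0,1.0\n1,0.7\n"},
}


def read_file(rev: str, path: str) -> str:
    # Each call reads only from its own revision, so data never mixes across revisions.
    return SNAPSHOTS[rev][path]


def resolve_revisions(revs=None, diff: bool = False) -> list[str]:
    """Simplified defaults: the workspace alone, or HEAD vs. workspace for a diff."""
    if revs:
        return list(revs)
    return ["HEAD", "workspace"] if diff else ["workspace"]


def plan_loads(revs: list[str], paths: list[str]) -> dict[tuple[str, str], Callable[[], str]]:
    # Nothing is read here; every value is a callable to be executed later.
    return {(rev, path): partial(read_file, rev, path) for rev in revs for path in paths}
```

`partial` binds `rev` and `path` at creation time, which avoids the late-binding pitfall of lambdas built inside a comprehension.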

Step 3: Load and Parse Data

Data files are loaded and parsed according to their format. Supported formats include CSV, TSV, JSON, YAML, and image files (PNG, JPG, SVG). Tabular data is parsed into records; image files are base64-encoded for embedding. Data loading happens in parallel across up to 16 worker threads with progress reporting.

Key considerations:

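  • CSV and TSV files are parsed assuming the first row is a header; setting the `header` plot property to false makes DVC use column indices as field names instead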
  • JSON files can contain arrays of records or nested structures
  • YAML files support structured metric data
  • Image files are converted to base64 data URIs for HTML embedding
  • Files that fail to parse are reported via error callbacks without aborting
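A minimal sketch of parallel parsing with error callbacks, using only the standard library. `parse`, `load_all`, and `MAX_WORKERS` are hypothetical names; YAML is omitted because it needs a third-party parser, and JSON handling is simplified to "a list of records, or a single object wrapped in a list". Images become base64 data URIs as described above.

```python
import base64
import csv
import io
import json
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 16  # mirrors the "up to 16 worker threads" figure above

IMAGE_MIME = {"png": "image/png", "jpg": "image/jpeg", "jpeg": "image/jpeg", "svg": "image/svg+xml"}


def parse(path: str, raw: bytes):
    suffix = path.rsplit(".", 1)[-1].lower()
    if suffix in ("csv", "tsv"):
        delimiter = "," if suffix == "csv" else "\t"
        # Assumes the first row is a header row.
        return list(csv.DictReader(io.StringIO(raw.decode()), delimiter=delimiter))
    if suffix == "json":
        data = json.loads(raw)
        return data if isinstance(data, list) else [data]
    if suffix in IMAGE_MIME:
        encoded = base64.b64encode(raw).decode("ascii")
        return f"data:{IMAGE_MIME[suffix]};base64,{encoded}"
    raise ValueError(f"unsupported format: {path}")


def load_all(files: dict[str, bytes], on_error) -> dict:
    results = {}
    with ThreadPoolExecutor(max_workers=min(MAX_WORKERS, len(files) or 1)) as pool:
        futures = {pool.submit(parse, path, raw): path for path, raw in files.items()}
        for future, path in futures.items():
            try:
                results[path] = future.result()
            except Exception as exc:
                # Report the failure and keep going instead of aborting the batch.
                on_error(path, exc)
    return results
```

A malformed JSON file or an unknown extension ends up in the error callback, while every other file is still returned.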

Step 4: Convert to Vega-Lite Format

Parsed data is transformed into Vega-Lite compatible format by the converter layer. Tabular data is mapped to Vega data arrays with field assignments based on the plot definition. The converter handles axis configuration, field renaming, multi-revision overlays, and data filtering. Image data is formatted into a side-by-side comparison layout.

Key considerations:

  • The Vega converter maps `x` and `y` properties to Vega encoding channels
  • Multi-revision data is combined with a revision field for color-coded overlays
  • Linear, log, and categorical scales are supported
  • The converter auto-detects field types (quantitative, temporal, nominal)
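The conversion can be sketched as flattening every revision's records into one table with a revision column, then mapping `x`/`y` to Vega-Lite encoding channels. The field name `rev`, the `line` mark, and the simple type heuristic (all numbers means quantitative, all ISO dates means temporal, otherwise nominal) are assumptions of this sketch, not DVC's exact converter logic.

```python
from datetime import datetime


def _is_number(v) -> bool:
    try:
        float(v)
        return True
    except (TypeError, ValueError):
        return False


def _is_date(v) -> bool:
    try:
        datetime.fromisoformat(str(v))
        return True
    except ValueError:
        return False


def infer_type(values) -> str:
    """Heuristic field-type guess for a Vega-Lite encoding channel."""
    values = list(values)
    if values and all(_is_number(v) for v in values):
        return "quantitative"
    if values and all(_is_date(v) for v in values):
        return "temporal"
    return "nominal"


def to_vega_lite(data_by_rev: dict, x: str, y: str, x_label=None, y_label=None) -> dict:
    # Tag every record with its revision so revisions can share one chart.
    values = [dict(row, rev=rev) for rev, rows in data_by_rev.items() for row in rows]
    return {
        "data": {"values": values},
        "mark": "line",
        "encoding": {
            "x": {"field": x, "type": infer_type(r[x] for r in values), "title": x_label or x},
            "y": {"field": y, "type": infer_type(r[y] for r in values), "title": y_label or y},
            # One color per revision produces the overlay.
            "color": {"field": "rev", "type": "nominal"},
        },
    }
```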

Step 5: Apply Vega-Lite Templates

Converted data is merged with a Vega-Lite template specification to produce the final visualization. DVC ships with built-in templates for common plot types (linear, confusion matrix, scatter) and supports custom templates. The template defines the visual encoding, mark type, and interactive features.

Key considerations:

  • Built-in templates include: linear, confusion, scatter, and smooth
  • Custom templates can be specified as file paths in plot definitions
  • Templates are Vega-Lite JSON specifications with DVC-specific placeholder fields
  • The `dvc plots modify` command changes template and property assignments for existing plots
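Template application amounts to substituting placeholder anchors inside a Vega-Lite spec. The anchor strings below (`<DVC_METRIC_DATA>`, `<DVC_METRIC_X>`, and so on) are modeled on DVC's template placeholders but should be treated as assumptions; check the templates shipped with your DVC version. `LINEAR_TEMPLATE` and `fill_template` are hypothetical names.

```python
# A tiny template in the style of a built-in "linear" template.
LINEAR_TEMPLATE = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": "<DVC_METRIC_DATA>"},
    "title": "<DVC_METRIC_TITLE>",
    "mark": "line",
    "encoding": {
        "x": {"field": "<DVC_METRIC_X>", "type": "quantitative", "title": "<DVC_METRIC_X_LABEL>"},
        "y": {"field": "<DVC_METRIC_Y>", "type": "quantitative", "title": "<DVC_METRIC_Y_LABEL>"},
        "color": {"field": "rev", "type": "nominal"},
    },
}


def fill_template(template, anchors: dict):
    """Return a new spec where every string exactly matching an anchor is replaced."""
    if isinstance(template, dict):
        return {k: fill_template(v, anchors) for k, v in template.items()}
    if isinstance(template, list):
        return [fill_template(v, anchors) for v in template]
    if isinstance(template, str) and template in anchors:
        return anchors[template]
    return template
```

Replacing values in the parsed structure, rather than doing string replacement on raw JSON text, lets the data anchor receive a real list of records without any escaping. A custom template file would simply be loaded with `json.loads` and passed through the same function.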

Step 6: Render Output

The final Vega-Lite specifications are serialized to JSON and optionally rendered to an HTML file. The HTML output includes the Vega-Embed library for interactive viewing with zoom, pan, and data inspection. Multiple plots are arranged in a single report page.

Key considerations:

  • The `--show-vega` flag outputs raw Vega-Lite JSON instead of HTML
  • The `--open` flag automatically opens the rendered HTML in the default browser
  • The `--out` flag specifies a custom output directory
  • The `--json` flag prints plot definitions and rendered data as structured JSON for programmatic use
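A minimal sketch of the HTML report: each spec gets its own `<div>` and a `vegaEmbed` call, with the Vega, Vega-Lite, and Vega-Embed libraries loaded from a CDN. The CDN URLs, the `plot{i}` div ids, and the `render_html` helper are assumptions of this sketch; DVC's actual page template differs.

```python
import html
import json

# CDN script URLs are an assumption for this sketch; pin versions as needed.
SCRIPTS = [
    "https://cdn.jsdelivr.net/npm/vega@5",
    "https://cdn.jsdelivr.net/npm/vega-lite@5",
    "https://cdn.jsdelivr.net/npm/vega-embed@6",
]


def render_html(specs: dict) -> str:
    head = "\n".join(f'<script src="{url}"></script>' for url in SCRIPTS)
    blocks, calls = [], []
    for i, (name, spec) in enumerate(specs.items()):
        div_id = f"plot{i}"
        blocks.append(f'<h2>{html.escape(name)}</h2>\n<div id="{div_id}"></div>')
        # Escape "</" so a string inside the spec cannot close the <script> tag early.
        payload = json.dumps(spec).replace("</", "<\\/")
        # vegaEmbed compiles the Vega-Lite spec and renders it into the div.
        calls.append(f'vegaEmbed("#{div_id}", {payload});')
    body = "\n".join(blocks)
    script = "\n".join(calls)
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        f"{head}\n</head>\n<body>\n{body}\n<script>\n{script}\n</script>\n</body>\n</html>\n"
    )
```

Writing the returned string to `index.html` inside the chosen output directory gives a single report page; printing `json.dumps(spec)` instead corresponds to the raw-Vega output mode.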
