Workflow: Iterative DVC Plot Visualization

Knowledge Sources	DVC, DVC Docs, DVC Plots Reference
Domains	Visualization, MLOps, Metrics
Last Updated	2026-02-10 10:30 GMT

Overview

End-to-end process for generating visual comparisons of metrics, parameters, and data across DVC experiments and Git revisions using Vega-Lite-based plot rendering.

Description

This workflow covers DVC's plotting system, which transforms structured data files (CSV, TSV, JSON, YAML) and images into interactive Vega-Lite visualizations. Plots can compare data across multiple Git revisions and experiments, enabling visual analysis of training metrics, parameter effects, and model performance over time. The system collects plot definitions from `dvc.yaml`, resolves data sources from the workspace or Git history, and renders them using configurable Vega-Lite templates.

Goal: Interactive HTML visualizations comparing metrics and data across experiments and revisions.

Scope: From plot definition in `dvc.yaml` through data collection to rendered Vega-Lite output.

Strategy: Multi-threaded data collection across revisions with Vega-Lite template-based rendering and support for both tabular data and image comparison.

Usage

Execute this workflow when:

You want to visualize training loss curves across multiple experiment runs
You need to compare model performance metrics between Git revisions
You want to display confusion matrices, ROC curves, or other evaluation plots
You need to generate an HTML report comparing experiments side by side
You want to track how metrics change over the course of a training run

Execution Steps

Step 1: Collect Plot Definitions

DVC scans the pipeline definition (`dvc.yaml`) and tracked outputs to identify all plot sources. Plot definitions specify the data file, axis mappings, template, and display properties. Plots can be defined explicitly in the top-level `plots` section of `dvc.yaml` or implicitly through stage outputs that are flagged as plots (listed under a stage's `plots` field).

Key considerations:

Plot definitions support `x`, `y`, `x_label`, `y_label`, `title`, and `template` properties
Multiple data files can be combined into a single plot
Directory targets are recursively expanded to find all plottable files
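Targets filter which plots to render: `dvc plots show` takes them as positional arguments, while `dvc plots diff` takes them through the `--targets` flag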
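
The collection step can be sketched as follows. This is a minimal illustration, not DVC's implementation: the dictionary mirrors what a `plots` entry in `dvc.yaml` might look like, and the file names, the `PLOTTABLE_SUFFIXES` set, and the helpers `expand_target` and `filter_definitions` are all hypothetical names introduced for this example.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical set of extensions treated as plottable in this sketch.
PLOTTABLE_SUFFIXES = {
    ".csv", ".tsv", ".json", ".yaml", ".yml", ".png", ".jpg", ".jpeg", ".svg",
}

# Python-dict mirror of a `plots` entry in dvc.yaml (file names are made up).
plot_definitions = {
    "training_loss": {
        "x": "step",
        "y": {"logs/train.csv": "loss", "logs/val.csv": "loss"},
        "x_label": "Step",
        "y_label": "Loss",
        "title": "Training vs. validation loss",
        "template": "linear",
    }
}


def expand_target(target: Path) -> list[Path]:
    """Return the plottable files under `target`, recursing into directories."""
    if target.is_file():
        return [target] if target.suffix.lower() in PLOTTABLE_SUFFIXES else []
    return sorted(
        p
        for p in target.rglob("*")
        if p.is_file() and p.suffix.lower() in PLOTTABLE_SUFFIXES
    )


def filter_definitions(definitions: dict, targets: list[str] | None) -> dict:
    """Keep only the named plots; no targets means keep every definition."""
    if not targets:
        return dict(definitions)
    return {name: d for name, d in definitions.items() if name in targets}
```

Note how the single `training_loss` definition combines two data files under `y`, which corresponds to the "multiple data files in a single plot" case above.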

Step 2: Resolve Data Sources Across Revisions

For each specified Git revision (or the current workspace), DVC resolves the data sources by switching the repository context to that revision. Data loading is deferred using callable objects to enable parallel execution. The revision list defaults to the current workspace or can be expanded to include branches, tags, and experiment refs.

Key considerations:

When no revisions are given, the `dvc plots diff` command compares the current workspace against HEAD
Multiple revisions can be specified for multi-version comparison
Each revision's data is loaded independently to prevent cross-contamination
The brancher utility handles transparent Git revision switching
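
The idea of deferring loads behind callables can be illustrated with a small sketch. Everything here is an assumption made for illustration: `FAKE_HISTORY` stands in for file contents at each revision, and `read_at_revision` and `collect_sources` are hypothetical helpers, not DVC functions.

```python
from functools import partial

# Hypothetical in-memory stand-in for file contents at each revision.
FAKE_HISTORY = {
    "workspace": {"metrics.csv": "step,loss\n0,0.9\n1,0.5\n"},
    "HEAD": {"metrics.csv": "step,loss\n0,1.0\n1,0.7\n"},
}


def read_at_revision(rev: str, path: str) -> str:
    """Stand-in for reading `path` as it exists at revision `rev`."""
    return FAKE_HISTORY[rev][path]


def collect_sources(revs: list, paths: list) -> dict:
    """Map each revision to {path: zero-argument loader}; nothing is read yet."""
    return {
        rev: {path: partial(read_at_revision, rev, path) for path in paths}
        for rev in revs
    }


sources = collect_sources(["workspace", "HEAD"], ["metrics.csv"])

# The read only happens when the callable is invoked, so a later stage is
# free to schedule these calls in parallel.
head_text = sources["HEAD"]["metrics.csv"]()
```

Because each loader is bound to its own revision, the data for one revision cannot leak into another.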

Step 3: Load and Parse Data

Data files are loaded and parsed according to their format. Supported formats include CSV, TSV, JSON, YAML, and image files (PNG, JPG, SVG). Tabular data is parsed into records; image files are base64-encoded for embedding. Data loading happens in parallel across up to 16 worker threads with progress reporting.

Key considerations:

CSV and TSV files are assumed to have a header row unless the `--no-header` flag is passed
JSON files can contain arrays of records or nested structures
YAML files support structured metric data
Image files are converted to base64 data URIs for HTML embedding
Files that fail to parse are reported via error callbacks without aborting
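
A minimal sketch of this loading stage, using only the Python standard library, might look like the following. It is not DVC's code: `parse_payload` and `load_all` are hypothetical names, YAML is omitted because it would need a third-party parser, and only PNG is handled among the image formats.

```python
import base64
import csv
import io
import json
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 16  # upper bound quoted in the text above


def parse_payload(path: str, raw: bytes):
    """Parse raw bytes according to the file extension (subset of formats)."""
    suffix = path.rsplit(".", 1)[-1].lower()
    if suffix in ("csv", "tsv"):
        delimiter = "\t" if suffix == "tsv" else ","
        text = io.StringIO(raw.decode("utf-8"))
        return list(csv.DictReader(text, delimiter=delimiter))
    if suffix == "json":
        return json.loads(raw)
    if suffix == "png":
        encoded = base64.b64encode(raw).decode("ascii")
        return "data:image/png;base64," + encoded
    raise ValueError(f"unsupported format: {path}")


def _load_and_parse(path: str, load):
    """Invoke the deferred loader, then parse what it returned."""
    return parse_payload(path, load())


def load_all(loaders: dict, on_error) -> dict:
    """Run zero-argument loaders in parallel; report failures via on_error."""
    results = {}
    if not loaders:
        return results
    workers = min(MAX_WORKERS, len(loaders))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            path: pool.submit(_load_and_parse, path, load)
            for path, load in loaders.items()
        }
        for path, future in futures.items():
            try:
                results[path] = future.result()
            except Exception as exc:  # one bad file must not abort the rest
                on_error(path, exc)
    return results


errors = []
loaders = {
    "loss.csv": lambda: b"step,loss\n0,0.9\n1,0.5\n",
    "acc.json": lambda: b'[{"step": 0, "acc": 0.4}]',
    "broken.json": lambda: b"{not json",
}
parsed = load_all(loaders, on_error=lambda path, exc: errors.append(path))
```

The malformed `broken.json` ends up in `errors` while the other two files are still parsed, which is the non-aborting behaviour described above. Note that `csv.DictReader` yields string values, so numeric conversion is left to a later stage in this sketch.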

Step 4: Convert to Vega-Lite Format

Parsed data is transformed into Vega-Lite compatible format by the converter layer. Tabular data is mapped to Vega data arrays with field assignments based on the plot definition. The converter handles axis configuration, field renaming, multi-revision overlays, and data filtering. Image data is formatted into a side-by-side comparison layout.

Key considerations:

The Vega converter maps `x` and `y` properties to Vega encoding channels
Multi-revision data is combined with a revision field for color-coded overlays
Linear, log, and categorical scales are supported
The converter auto-detects field types (quantitative, temporal, nominal)
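
The multi-revision overlay can be sketched as a flattening step that tags every record with its revision. This is an illustrative assumption about the shape of the data, not DVC's converter: `to_datapoints` and `encoding_for` are hypothetical helpers, the revision field is named `rev` here for the example, and both axes are hard-coded as quantitative rather than auto-detected.

```python
def to_datapoints(records_by_rev: dict, x: str, y: str) -> list:
    """Flatten per-revision records into one list tagged with a `rev` field."""
    datapoints = []
    for rev, records in records_by_rev.items():
        for record in records:
            if x not in record or y not in record:
                continue  # skip rows that lack a requested field
            datapoints.append({x: record[x], y: record[y], "rev": rev})
    return datapoints


def encoding_for(x: str, y: str) -> dict:
    """Vega-Lite encoding block: x/y channels plus color keyed on revision."""
    return {
        "x": {"field": x, "type": "quantitative"},
        "y": {"field": y, "type": "quantitative"},
        "color": {"field": "rev", "type": "nominal"},
    }


records_by_rev = {
    "workspace": [{"step": 0, "loss": 0.9, "lr": 0.01}, {"step": 1, "loss": 0.5}],
    "HEAD": [{"step": 0, "loss": 1.0}, {"epoch": 3}],
}
datapoints = to_datapoints(records_by_rev, x="step", y="loss")
encoding = encoding_for("step", "loss")
```

Mapping `rev` to the `color` channel is what produces one color-coded line per revision when the data is rendered.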

Step 5: Apply Vega-Lite Templates

Converted data is merged with a Vega-Lite template specification to produce the final visualization. DVC ships with built-in templates for common plot types (such as linear, smooth, scatter, and confusion matrix) and supports custom templates. The template defines the visual encoding, mark type, and interactive features.

Key considerations:

Built-in templates include `linear`, `simple`, `scatter`, `smooth`, `confusion`, `confusion_normalized`, `bar_horizontal`, and `bar_horizontal_sorted`
Custom templates can be specified as file paths in plot definitions
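Templates are Vega-Lite JSON specifications with DVC-specific placeholder fields (anchors such as `<DVC_METRIC_DATA>`, `<DVC_METRIC_X>`, and `<DVC_METRIC_Y>`) that are replaced with the plot's data and properties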
The `dvc plots modify` command changes template and property assignments for existing plots
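
Template filling can be pictured as a recursive substitution over the template's JSON tree. The sketch below is an assumption about how such a merge could work, not DVC's implementation: the `TEMPLATE` dictionary is a made-up minimal line chart, and `fill_template` is a hypothetical helper that replaces any string value exactly equal to a known anchor.

```python
import json

# Made-up minimal template; anchors follow the <DVC_METRIC_...> naming pattern.
TEMPLATE = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": "<DVC_METRIC_DATA>"},
    "title": "<DVC_METRIC_TITLE>",
    "mark": "line",
    "encoding": {
        "x": {
            "field": "<DVC_METRIC_X>",
            "type": "quantitative",
            "title": "<DVC_METRIC_X_LABEL>",
        },
        "y": {
            "field": "<DVC_METRIC_Y>",
            "type": "quantitative",
            "title": "<DVC_METRIC_Y_LABEL>",
        },
        "color": {"field": "rev", "type": "nominal"},
    },
}


def fill_template(node, values: dict):
    """Return a copy of `node` with anchor strings replaced by their values."""
    if isinstance(node, dict):
        return {key: fill_template(value, values) for key, value in node.items()}
    if isinstance(node, list):
        return [fill_template(item, values) for item in node]
    if isinstance(node, str) and node in values:
        return values[node]
    return node


spec = fill_template(
    TEMPLATE,
    {
        "<DVC_METRIC_DATA>": [{"step": 0, "loss": 0.9, "rev": "workspace"}],
        "<DVC_METRIC_TITLE>": "Training loss",
        "<DVC_METRIC_X>": "step",
        "<DVC_METRIC_Y>": "loss",
        "<DVC_METRIC_X_LABEL>": "Step",
        "<DVC_METRIC_Y_LABEL>": "Loss",
    },
)
spec_json = json.dumps(spec, indent=2)
```

Working on the parsed tree rather than on the raw JSON text means the data array is inserted as a real list, so no quoting or escaping issues arise.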

Step 6: Render Output

The final Vega-Lite specifications are serialized to JSON and optionally rendered to an HTML file. The HTML output includes the Vega-Embed library for interactive viewing with zoom, pan, and data inspection. Multiple plots are arranged in a single report page.

Key considerations:

The `--show-vega` flag outputs raw Vega-Lite JSON instead of HTML
The `--open` flag automatically opens the rendered HTML in the default browser
The `-o`/`--out` flag specifies a custom output directory (the default is `dvc_plots`)
With the `--json` flag, plot definitions and rendered data are returned as structured JSON for programmatic use
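
To make the HTML rendering step concrete, here is a small sketch that wraps several Vega-Lite specifications in one page and hands each to Vega-Embed. It is not the HTML that DVC produces: `render_html` is a hypothetical helper, and the CDN URLs and major versions in `SCRIPT_URLS` are assumptions chosen for the example.

```python
import json
from html import escape

# Assumed CDN locations for this sketch; pin versions as appropriate.
SCRIPT_URLS = [
    "https://cdn.jsdelivr.net/npm/vega@5",
    "https://cdn.jsdelivr.net/npm/vega-lite@5",
    "https://cdn.jsdelivr.net/npm/vega-embed@6",
]


def render_html(specs: dict, title: str = "DVC Plots") -> str:
    """Build one HTML page with a heading and an embedded chart per spec."""
    script_tags = "\n".join(f'<script src="{url}"></script>' for url in SCRIPT_URLS)
    sections, embed_calls = [], []
    for index, (name, spec) in enumerate(specs.items()):
        div_id = f"plot-{index}"
        sections.append(f'<h2>{escape(name)}</h2>\n<div id="{div_id}"></div>')
        # Escape "</" so data containing "</script>" cannot end the script tag.
        spec_json = json.dumps(spec).replace("</", "<\\/")
        embed_calls.append(f'vegaEmbed("#{div_id}", {spec_json});')
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        f"<title>{escape(title)}</title>\n{script_tags}\n</head>\n<body>\n"
        + "\n".join(sections)
        + "\n<script>\n"
        + "\n".join(embed_calls)
        + "\n</script>\n</body>\n</html>\n"
    )


page = render_html({"loss <train>": {"mark": "line"}, "acc": {"mark": "point"}})
```

The returned string can be written to an `index.html` file inside the chosen output directory and opened in a browser.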
