Principle:Iterative Dvc Experiment Comparison

From Leeroopedia
Revision as of 17:11, 16 February 2026 by Admin (talk | contribs) (Auto-imported from principles/Iterative_Dvc_Experiment_Comparison.md)


Knowledge Sources
Domains Experiment_Management, Data_Analysis
Last Updated 2026-02-10 00:00 GMT

Overview

Experiment comparison is the systematic collection, aggregation, and tabular presentation of metrics, parameters, and dependency states across multiple experiment revisions for analysis and ranking.

Description

Running experiments produces a wealth of data (metrics such as accuracy and loss, parameters such as learning rate and batch size, and dependency hashes that track data provenance) scattered across multiple Git revisions and ref namespaces. Without a structured comparison mechanism, practitioners must manually check out individual experiments and inspect their outputs, a process that does not scale beyond a handful of runs. Experiment comparison solves this by collecting data from all specified experiments and presenting it in a unified table in which each row represents an experiment and each column represents a metric, parameter, or dependency.

The comparison process involves two main phases. Collection traverses the experiment ref namespace and workspace to gather ExpState objects, each containing the metrics, parameters, timestamps, and dependency information for a single experiment revision. The collection phase supports filtering by branches, tags, commit ranges, and queue status (queued, failed, workspace). Tabulation transforms the collected states into a structured table: it first normalizes column names across experiments (different experiments may track different metrics), then applies fill values for missing data, and optionally sorts the rows by a specified metric or parameter.

The tabular output is designed for both human consumption (rendered as a terminal table or markdown) and programmatic use (exportable as CSV or dictionary). The column structure is dynamic: it adapts to the set of metrics and parameters present across the compared experiments, automatically adding columns for new metrics and using fill values (typically "-") for experiments that lack a particular metric.
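The dynamic column structure and fill-value behavior can be illustrated with a minimal sketch that uses only the Python standard library. The row data, the `union_columns` and `to_csv` helpers, and the `"Experiment"` column name are assumptions made for illustration; they are not part of any real comparison API.

```python
import csv
import io

FILL_VALUE = "-"  # assumed placeholder, matching the fill value mentioned above


def union_columns(rows):
    """Return column names in first-seen order across all rows."""
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    return columns


def to_csv(rows, fill_value=FILL_VALUE):
    """Render rows as CSV text, filling cells that a row does not define."""
    columns = union_columns(rows)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, restval=fill_value, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical experiment rows; the second run tracks an extra metric.
rows = [
    {"Experiment": "exp-a", "accuracy": 0.91, "lr": 0.01},
    {"Experiment": "exp-b", "accuracy": 0.93, "lr": 0.001, "f1": 0.88},
]
csv_text = to_csv(rows)
print(csv_text)
```

The `f1` column appears because one experiment reports it, and the experiment that lacks it receives the fill value rather than raising an error.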

Usage

Use experiment comparison when:

  • You have completed multiple experiment runs and need to identify the best-performing configuration
  • You want to understand how parameter changes correlate with metric changes
  • You need to generate a report or table of experiment results for documentation or review
  • You are performing hyperparameter search and need to rank results by a target metric
  • You need to audit the dependency state across experiments to verify data provenance

Reach for this technique whenever the number of experiments exceeds what can be tracked mentally, or when formal comparison criteria must be applied.

Theoretical Basis

Experiment comparison follows a collect-normalize-tabulate pipeline:

function compare_experiments(repo, revisions, filters, sort_by=None, sort_order="asc"):
    # Phase 1: Collection
    exp_states = []
    for rev in resolve_revisions(revisions, filters):
        state = load_exp_state(rev)
        # state contains: metrics{path: {name: value}},
        #                  params{path: {name: value}},
        #                  deps{name: hash}, timestamp, name
        exp_states.append(state)

    # Phase 2: Column Name Resolution
    all_metric_names = collect_unique_names(exp_states, "metrics")
    all_param_names = collect_unique_names(exp_states, "params")
    all_dep_names = collect_unique_names(exp_states, "deps")

    # Handle name collisions across files
    headers = resolve_ambiguous_names(
        all_metric_names, all_param_names, all_dep_names
    )
    # If "accuracy" appears in both metrics.json and eval.json,
    # columns become "metrics.json:accuracy" and "eval.json:accuracy"

    # Phase 3: Tabulation
    table = TabularData(columns=headers, fill_value="-")
    for state in exp_states:
        row = build_row(state, headers, fill_value="-")
        table.append(row)

    # Optional: Sort by metric
    if sort_by:
        table.sort(key=sort_by, order=sort_order)

    return table
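
A runnable approximation of the pipeline above might look like the following. It is a simplified sketch, not the actual implementation: the `ExpState` dataclass here is a hypothetical stand-in with only a few fields, the `flatten` helper assumes that names are unique across files (so disambiguation is skipped), and placing rows with missing sort values last is a design choice made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ExpState:
    """Simplified stand-in for a collected experiment state (hypothetical fields)."""
    name: str
    metrics: dict = field(default_factory=dict)  # {file path: {name: value}}
    params: dict = field(default_factory=dict)   # {file path: {name: value}}


def flatten(state):
    """Merge metrics and params into one {column: value} mapping.

    Assumes names are unique across files; disambiguation is out of scope here.
    """
    cells = {}
    for section in (state.metrics, state.params):
        for values in section.values():
            cells.update(values)
    return cells


def compare_experiments(states, sort_by=None, descending=False, fill_value="-"):
    """Resolve column names, build one row per state, then optionally sort."""
    flat = [(state.name, flatten(state)) for state in states]

    # Column resolution: union of names in first-seen order.
    columns = []
    for _, cells in flat:
        for name in cells:
            if name not in columns:
                columns.append(name)

    # Tabulation: one row per experiment, padded with the fill value.
    rows = [
        [name] + [cells.get(col, fill_value) for col in columns]
        for name, cells in flat
    ]

    if sort_by is not None:
        index = columns.index(sort_by) + 1  # +1 skips the name column
        present = [row for row in rows if row[index] != fill_value]
        missing = [row for row in rows if row[index] == fill_value]
        present.sort(key=lambda row: row[index], reverse=descending)
        rows = present + missing  # rows without the value go last

    return ["Experiment"] + columns, rows


states = [
    ExpState("exp-a", {"metrics.json": {"accuracy": 0.91}}, {"params.yaml": {"lr": 0.01}}),
    ExpState("exp-b", {"metrics.json": {"accuracy": 0.95, "loss": 0.2}}, {"params.yaml": {"lr": 0.001}}),
    ExpState("exp-c", {}, {"params.yaml": {"lr": 0.1}}),
]
header, table = compare_experiments(states, sort_by="accuracy", descending=True)
for row in table:
    print(row)
```

Separating rows with a missing sort value before sorting avoids comparing numbers with the string fill value, which would raise a `TypeError` in Python.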

Column name disambiguation is a key aspect of the normalization phase. When a metric name like "loss" appears in multiple parameter or metric files, the system prefixes it with the file path to avoid ambiguity. The algorithm counts occurrences of each name across all files; names that appear exactly once are used as-is, while names that appear in multiple files are qualified with their file path:

function normalize_headers(names_by_file, global_name_count):
    headers = []
    for file_path in names_by_file:
        for name in names_by_file[file_path]:
            if global_name_count[name] == 1:
                headers.append(name)
            else:
                headers.append(file_path + ":" + name)
    return headers
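
The same disambiguation rule can be sketched in Python. This version derives the global name count internally with `collections.Counter` instead of taking it as an argument, and it assumes each name occurs at most once within a single file; the file names and metric names are illustrative only.

```python
from collections import Counter


def normalize_headers(names_by_file):
    """Qualify a name with its file path only when it occurs in several files."""
    counts = Counter(
        name for names in names_by_file.values() for name in names
    )
    headers = []
    for file_path, names in names_by_file.items():
        for name in names:
            if counts[name] == 1:
                headers.append(name)  # unique name: use it as-is
            else:
                headers.append(f"{file_path}:{name}")  # ambiguous: add the path
    return headers


# Hypothetical input: "accuracy" is tracked in two different files.
names_by_file = {
    "metrics.json": ["accuracy", "loss"],
    "eval.json": ["accuracy", "f1"],
}
headers = normalize_headers(names_by_file)
print(headers)
# ['metrics.json:accuracy', 'loss', 'eval.json:accuracy', 'f1']
```

Because Python dictionaries preserve insertion order, the resulting headers follow the order in which files and names were supplied.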

The hierarchical structure of the comparison output reflects the experiment lineage: baseline commits form top-level rows, with their derived experiments nested beneath. This tree structure makes it easy to see which experiments belong to which baseline and to compare siblings derived from the same starting point.
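
As a rough illustration of this nesting, the sketch below groups experiments under their baseline and renders them as indented lines. The dictionary layout, the `baseline` key, and both helper functions are hypothetical and chosen only to show the tree shape.

```python
def group_by_baseline(experiments):
    """Map each baseline revision to the experiments derived from it."""
    tree = {}
    for exp in experiments:
        tree.setdefault(exp["baseline"], []).append(exp["name"])
    return tree


def render_tree(tree):
    """Return text lines: baselines flush left, derived experiments beneath."""
    lines = []
    for baseline, names in tree.items():
        lines.append(baseline)
        for i, name in enumerate(names):
            # The last sibling gets a closing branch marker.
            branch = "└── " if i == len(names) - 1 else "├── "
            lines.append(branch + name)
    return lines


# Hypothetical experiments and the baselines they were derived from.
experiments = [
    {"name": "exp-a", "baseline": "main"},
    {"name": "exp-b", "baseline": "main"},
    {"name": "exp-c", "baseline": "feature"},
]
lines = render_tree(group_by_baseline(experiments))
print("\n".join(lines))
```

Siblings such as `exp-a` and `exp-b` end up adjacent under `main`, which is what makes side-by-side comparison of runs from the same starting point straightforward.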

Key theoretical properties:

  1. Schema flexibility: The table schema adapts dynamically to the union of all metrics and parameters across experiments
  2. Missing data handling: Experiments that lack a particular metric receive a configurable fill value rather than causing errors
  3. Stable ordering: Column order follows the insertion order of metric and parameter files, providing consistency across repeated comparisons
  4. Composability: The tabular output can be filtered, sorted, projected, and exported in multiple formats
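
The composability property can be illustrated with two small operations on a header-plus-rows table. This is a sketch under assumed conventions (a list of column names, a list of row lists, and `"-"` as the fill value); `project` and `filter_rows` are hypothetical helpers, not functions from a real library.

```python
def project(header, rows, keep):
    """Keep only the named columns, in the order given by `keep`."""
    indexes = [header.index(name) for name in keep]
    return list(keep), [[row[i] for i in indexes] for row in rows]


def filter_rows(header, rows, column, predicate, fill_value="-"):
    """Keep rows whose cell in `column` is present and satisfies `predicate`."""
    i = header.index(column)
    return [row for row in rows if row[i] != fill_value and predicate(row[i])]


# Hypothetical comparison table with fill values for missing cells.
header = ["Experiment", "accuracy", "lr", "loss"]
rows = [
    ["exp-a", 0.91, 0.01, "-"],
    ["exp-b", 0.95, 0.001, 0.2],
    ["exp-c", "-", 0.1, "-"],
]

# Filter first, then project: each step returns plain data the next can consume.
good = filter_rows(header, rows, "accuracy", lambda value: value >= 0.9)
small_header, small_rows = project(header, good, ["Experiment", "accuracy"])
print(small_header, small_rows)
```

Because each operation takes and returns the same plain structures, the steps can be chained in any order before the result is exported.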

Related Pages

Implemented By
