Principle:Iterative Dvc Execution Order Planning

Knowledge Sources	DVC Documentation
Domains	Pipeline_Management, Graph_Algorithms
Last Updated	2026-02-10 00:00 GMT

Overview

Execution order planning is the process of deriving a valid stage execution sequence from a pipeline dependency graph using depth-first search post-order traversal, ensuring that all dependencies are processed before their dependents.

Description

Given a directed acyclic graph (DAG) of pipeline stages, the execution order planning principle determines the sequence in which stages should be evaluated during pipeline reproduction. The fundamental requirement is that every stage must be executed only after all of its dependencies have been processed. This is a classic topological ordering problem, and DFS post-order traversal provides a natural solution.

The principle extends beyond simple full-pipeline ordering to support three critical modes of operation. Targeted reproduction allows the user to specify a subset of stages to reproduce; the planner must include all transitive dependencies of the target stages. Downstream propagation reverses the direction -- given a changed stage, it determines all stages that depend on it (directly or transitively) and includes them in the execution plan. Full pipeline mode selects entire weakly connected components containing the target stages, ensuring complete pipeline consistency.

An important refinement is the handling of frozen stages. A frozen stage is one that the user has explicitly marked as not to be re-executed. The planner disconnects frozen stages from their dependencies in the graph before planning, effectively treating them as fixed boundary conditions. This allows downstream stages to still be reproduced while respecting the user's intent that certain stages should not change.

Usage

Execution order planning should be used whenever:

A pipeline needs to be partially reproduced -- only the target stages and their upstream dependencies should be included.
Downstream impact analysis is needed -- when a data source or early stage changes, all affected downstream stages must be identified and re-executed in the correct order.
Frozen stages must be respected -- the plan must correctly handle stages that should not be re-executed, disconnecting them from their dependency chains.
Pipeline-level reproduction is requested -- the entire pipeline containing the target stage should be re-evaluated.

Theoretical Basis

The execution order planning algorithm builds on three graph operations:

PROCEDURE GetActiveGraph(graph):
    // Remove edges from frozen stages to their dependencies
    // This disconnects frozen stages, preventing them from being
    // re-evaluated even if their dependencies change
    active_graph = COPY(graph)
    FOR EACH stage IN graph:
        IF stage.frozen:
            REMOVE_OUTGOING_EDGES(active_graph, stage)
    RETURN active_graph

PROCEDURE GetSubgraph(graph, target_stages, pipeline_mode, downstream):
    IF pipeline_mode AND target_stages:
        // Include entire pipelines containing the target stages
        pipelines = WEAKLY_CONNECTED_COMPONENTS(graph)
        selected = [p FOR p IN pipelines IF ANY(node IN p FOR node IN target_stages)]
        RETURN COMPOSE(selected)

    IF NOT target_stages:
        RETURN graph

    IF downstream:
        // Reverse the graph so DFS from targets finds dependents
        reversed_graph = REVERSE(graph)
    ELSE:
        reversed_graph = graph

    // Collect all reachable nodes from targets
    nodes = []
    FOR EACH target IN target_stages:
        nodes.EXTEND(DFS_POSTORDER(reversed_graph, target))

    RETURN graph.SUBGRAPH(nodes)

PROCEDURE PlanRepro(graph, target_stages, pipeline_mode, downstream):
    subgraph = GetSubgraph(graph, target_stages, pipeline_mode, downstream)
    RETURN DFS_POSTORDER(subgraph)
    // Returns stages in order: dependencies first, dependents last

The key insight is the use of DFS post-order traversal for the final ordering. In a post-order traversal, a node is visited only after all nodes reachable from it have been visited. In the dependency graph where edges point from dependents to dependencies, this means dependencies appear before dependents in the output list -- exactly the execution order we need.

For downstream mode, the graph is conceptually reversed before DFS, so that starting from a given stage, the traversal discovers all stages that transitively depend on it. The resulting list, when combined with the original graph's subgraph view, gives the correct execution order for the downstream cascade.

Properties of the planning algorithm:

Correctness: DFS post-order on a DAG produces a valid topological ordering, guaranteeing that no stage is executed before its dependencies.
Minimality: Only stages reachable from the targets (via DFS) are included, avoiding unnecessary recomputation.
Uniqueness deduplication: When multiple targets share common ancestors, the algorithm naturally deduplicates via the subgraph node set.

Related Pages

Implemented By

Implementation:Iterative_Dvc_Plan_Repro

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment