Heuristic:Mlflow Mlflow Nested Run Organization

From Leeroopedia
Knowledge Sources
Domains Experiment_Tracking, Best_Practices
Last Updated 2026-02-13 20:00 GMT

Overview

Organization pattern for hierarchical experiment tracking using nested MLflow runs with thread-local run stack management.

Description

MLflow maintains a thread-local stack of active runs that supports hierarchical (parent-child) run organization. Nested runs are useful for hyperparameter sweeps, cross-validation folds, or multi-stage training pipelines where each sub-task should be grouped under a parent run. The run stack is strictly thread-local, meaning each thread has its own isolated stack of active runs.
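
The per-thread isolation described above can be illustrated without MLflow. The following is a minimal sketch that uses Python's standard `threading.local`; the helper names `run_stack` and `observe_from_worker` are hypothetical and are not part of MLflow. It shows that runs pushed on one thread are not visible from another.

```python
import threading

# Hypothetical stand-in for a per-thread run stack; this is not MLflow's code.
_local = threading.local()


def run_stack():
    """Return the calling thread's stack, creating it on first use."""
    if not hasattr(_local, "stack"):
        _local.stack = []
    return _local.stack


def observe_from_worker():
    """Return a copy of the stack as seen by a freshly started thread."""
    seen = []
    worker = threading.Thread(target=lambda: seen.append(list(run_stack())))
    worker.start()
    worker.join()
    return seen[0]


run_stack().append("parent-run")  # pushed on the current thread
run_stack().append("child-run")   # nested under the parent

main_view = list(run_stack())
worker_view = observe_from_worker()

print(main_view)    # ['parent-run', 'child-run']
print(worker_view)  # []
```

The worker thread starts with an empty stack even though the starting thread holds two entries, which is the same reason an active run started in one thread is not the active run in another.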

Usage

Use this heuristic when you need to organize related runs hierarchically, for example a hyperparameter search where each trial is a child run, or a pipeline with distinct training stages. It also matters when working with multi-threaded training code, because the active run is not shared across threads.

The Insight (Rule of Thumb)

  • Action: Use `mlflow.start_run(nested=True)` or `mlflow.start_run(parent_run_id=...)` to create child runs.
  • Value: Parent run must be in ACTIVE state. Only one non-nested run per thread is allowed without calling `end_run()` first.
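  • Trade-off: Because run stacks are thread-local, `mlflow.active_run()` returns only the run started in the current thread. Multi-threaded code therefore needs explicit run management, since a run started in one thread is not visible as the active run in another.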

Pattern:

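import mlflow

with mlflow.start_run() as parent_run:
    mlflow.log_param("search_space", "lr=[1e-4, 1e-2]")
    for lr in [1e-4, 1e-3, 1e-2]:
        # Each trial becomes a child run grouped under parent_run
        with mlflow.start_run(nested=True) as child_run:
            mlflow.log_param("lr", lr)
            # train model...
            acc = 0.0  # placeholder: replace with the accuracy from training
            mlflow.log_metric("accuracy", acc)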

Reasoning

The thread-local run stack design ensures thread safety but creates constraints that developers need to understand:

Thread-local run stack from `mlflow/tracking/fluent.py:120-124`:

_active_run_stack = ThreadLocalVariable(default_factory=lambda: [])
_last_active_run_id = ThreadLocalVariable(default_factory=lambda: None)
_last_logged_model_id = ThreadLocalVariable(default_factory=lambda: None)

Parent run validation from `mlflow/tracking/fluent.py:325-374`:

def start_run(
    run_id=None, experiment_id=None, run_name=None,
    nested=False, parent_run_id=None, ...
):
    # Parent run must be in ACTIVE state
    # parent_run_id must match current active run if specified explicitly
    # Only one non-nested run per thread without end_run()

Thread-safety limitation documented in docstring at `mlflow/tracking/fluent.py:650-677`:

# active_run() is thread-local and returns only the active run
# in the current thread. If a run is started in a different thread,
# this API will not retrieve that run.

Cross-thread run access (internal) from `mlflow/tracking/fluent.py:743-754`:

def _get_latest_active_run():
    all_active_runs = [
        run for run_stack in _active_run_stack.get_all_thread_values().values()
        for run in run_stack
    ]
    if all_active_runs:
        return max(all_active_runs, key=lambda run: run.info.start_time)
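
The cross-thread lookup above depends on a container that keeps one value per thread and can also list every thread's value. The sketch below models that idea with the standard library only. `PerThreadValue`, `Run`, `push`, and `latest_active_run` are hypothetical names for illustration, and the class is an assumption about the general shape of such a container, not a copy of MLflow's `ThreadLocalVariable`.

```python
import threading
from collections import namedtuple

# Hypothetical minimal run record; real MLflow runs carry far more state.
Run = namedtuple("Run", ["run_id", "start_time"])


class PerThreadValue:
    """Toy per-thread container that can also expose every thread's value."""

    def __init__(self, default_factory):
        self._default_factory = default_factory
        self._values = {}  # thread ident -> value
        self._lock = threading.Lock()

    def get(self):
        """Return the calling thread's value, creating it on first use."""
        tid = threading.get_ident()
        with self._lock:
            if tid not in self._values:
                self._values[tid] = self._default_factory()
            return self._values[tid]

    def get_all_thread_values(self):
        """Return a snapshot mapping thread ident to that thread's value."""
        with self._lock:
            return dict(self._values)


stacks = PerThreadValue(default_factory=list)


def push(run):
    stacks.get().append(run)


def latest_active_run():
    """Pick the most recently started run across all threads' stacks."""
    all_runs = [
        run
        for stack in stacks.get_all_thread_values().values()
        for run in stack
    ]
    return max(all_runs, key=lambda r: r.start_time) if all_runs else None


push(Run("main-run", start_time=100))
worker = threading.Thread(target=push, args=(Run("worker-run", 200),))
worker.start()
worker.join()

own_ids = [r.run_id for r in stacks.get()]
latest = latest_active_run()

print(own_ids)        # ['main-run']
print(latest.run_id)  # worker-run
```

The current thread still sees only its own stack, while the cross-thread helper finds the run with the latest start time regardless of which thread started it.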
