
Heuristic:Dotnet Machinelearning AutoML SMAC Dimension Limit

From Leeroopedia




Knowledge Sources
Domains AutoML, Optimization
Last Updated 2026-02-09 11:00 GMT

Overview

SMAC tuner effectiveness degrades above 20 hyperparameter dimensions; use pipeline proposer cost initialization to favor cheap trainers during exploration.

Description

ML.NET's AutoML uses SMAC (Sequential Model-based Algorithm Configuration) with Expected Improvement as the acquisition function. The SMAC tuner works well when the search space has 20 or fewer dimensions. Beyond this, the random forest surrogate model becomes unreliable. Additionally, the pipeline proposer uses initialization costs to bias early exploration toward cheaper trainers, and adds epsilon to all probabilities to prevent degenerate sampling where promising but underexplored pipelines would never be tried.
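To make the acquisition step concrete, Expected Improvement can be computed in closed form from a surrogate's predicted mean and standard deviation at a candidate point. This is a minimal Python sketch of the standard EI formula for minimization, not ML.NET code:

```python
import math

def expected_improvement(mu, sigma, best):
    """EI for minimization: expected improvement over `best` at a
    candidate whose surrogate prediction is N(mu, sigma^2)."""
    if sigma == 0.0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))    # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf

# A candidate predicted far better than the incumbent has EI close to
# the predicted gap; one predicted far worse has EI close to zero.
print(expected_improvement(0.5, 0.1, 1.0))  # ~0.5
print(expected_improvement(2.0, 0.1, 1.0))  # ~0.0
```

SMAC evaluates this quantity under its random forest surrogate, which is exactly why surrogate quality (and thus dimensionality) matters: a poor mean/variance estimate makes EI rankings meaningless.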

Usage

Use this heuristic when configuring AutoML experiments or designing custom search spaces. Keep the total number of hyperparameters (across all pipeline stages) at or below 20 for SMAC to work effectively. For larger spaces, consider reducing the pipeline alternatives or fixing some hyperparameters.
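A quick way to sanity-check a search space before running an experiment is to tally the tunable hyperparameters per stage. The stage names and counts below are hypothetical, not ML.NET API identifiers:

```python
# Hypothetical hyperparameter counts per pipeline stage.
stage_dims = {
    "featurizer": 2,   # e.g. n-gram length, hashing bits
    "normalizer": 1,   # e.g. fix-zero flag
    "fast_tree": 5,    # e.g. trees, leaves, learning rate, ...
    "light_gbm": 7,
}
total = sum(stage_dims.values())
print(total)  # 15
if total > 20:
    print("warning: SMAC surrogate quality degrades above 20 dimensions; "
          "fix some hyperparameters or drop pipeline alternatives")
```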

The Insight (Rule of Thumb)

SMAC Dimension Limit:

  • Action: Keep total hyperparameter dimensions <= 20 for SMAC tuner.
  • Value: 20 dimensions maximum for effective optimization.
  • Trade-off: Fewer dimensions = better surrogate model quality but less exploration. More dimensions = SMAC degrades to near-random search.

Pipeline Cost Initialization:

  • Action: Set lower initialization costs for cheaper trainers.
  • Value: Cost values determine exploration priority at startup.
  • Trade-off: Lower cost = trainer explored earlier (good for quick baselines). Higher cost = trainer deferred (good for expensive but potentially better trainers).
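The idea behind cost initialization can be sketched as inverse-cost weighting of the initial trainer-selection probabilities. This is an illustrative sketch; the trainer names and cost values are made up, and it is not the PipelineProposer's actual update rule:

```python
# Hypothetical initialization costs: smaller cost -> explored earlier.
init_cost = {"linear": 1.0, "fast_tree": 5.0, "light_gbm": 20.0}

weights = {name: 1.0 / cost for name, cost in init_cost.items()}
total = sum(weights.values())
probs = {name: w / total for name, w in weights.items()}

# The cheap linear trainer dominates the first sampling rounds,
# giving a quick baseline before expensive trainers are tried.
print(probs)
```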

Probability Smoothing:

  • Action: Add `double.Epsilon` to all sampling probabilities.
  • Value: Prevents zero probability for any pipeline configuration.
  • Trade-off: Negligible computational cost; prevents pathological cases where valid configurations are permanently excluded from search.
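A minimal Python sketch of the smoothing step, using `math.ulp(0.0)` (the smallest positive double) as the analogue of C#'s `double.Epsilon`:

```python
import math

EPS = math.ulp(0.0)  # smallest positive double, like C#'s double.Epsilon

def smooth(probabilities):
    """Add a tiny epsilon so no configuration has exactly zero mass,
    then renormalize."""
    shifted = [p + EPS for p in probabilities]
    total = sum(shifted)
    return [p / total for p in shifted]

# The third pipeline has never succeeded, so its raw probability is 0;
# after smoothing it can still (very rarely) be sampled.
probs = smooth([0.7, 0.3, 0.0])
print(all(p > 0 for p in probs))  # True
```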

Graceful Error Handling:

  • Action: Skip failing trial configurations instead of aborting the experiment.
  • Value: Catch both individual exceptions and AggregateExceptions from parallel training.
  • Trade-off: May miss the optimal configuration if it consistently fails, but the experiment continues rather than crashing.
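The skip-and-continue pattern can be sketched as follows. This is illustrative Python, not the Experiment.cs control flow; in .NET the parallel case additionally requires catching AggregateException:

```python
def run_experiment(configs, train):
    """Try each configuration; log and skip failures instead of aborting."""
    results = {}
    for name, cfg in configs.items():
        try:
            results[name] = train(cfg)
        except Exception as ex:
            # A single failing trial should not end the whole experiment.
            print(f"trial {name!r} skipped: {ex}")
    return results

def train(cfg):
    if cfg.get("learning_rate", 0.1) <= 0:
        raise ValueError("learning rate must be positive")
    return 1.0 - cfg["learning_rate"]  # toy 'metric'

configs = {"good": {"learning_rate": 0.2}, "bad": {"learning_rate": -1.0}}
results = run_experiment(configs, train)
print(results)
```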

Reasoning

SMAC builds a random forest to model the hyperparameter-to-performance mapping. In high dimensions (above roughly 20), the forest needs exponentially more training data to stay accurate. Under typical AutoML time budgets (minutes to hours), the surrogate never sees enough trials to fit well, so SMAC performs no better than random search.

The epsilon-smoothed probabilities prevent the mathematical impossibility of sampling from a distribution where some entries are exactly zero, which can occur when a trainer has never been successful in any previous trial.

Code Evidence

SMAC dimension limit from `src/Microsoft.ML.AutoML/Tuner/SmacTuner.cs:20`:

/// Expected Improvement as acquisition function. In practice, smac works well on
/// search space which dimension is no larger than 20.

Cost initialization from `src/Microsoft.ML.AutoML/Tuner/PipelineProposer.cs:48`:

// this cost is used to initialize eci when started, the smaller the number,
// the less cost this trainer will use at start

Probability smoothing from `src/Microsoft.ML.AutoML/Tuner/PipelineProposer.cs:109`:

// Therefore, we need to make sure none of the probabilities is zero,
// and we can do that by adding a very small number (double.epsilon) to each

Parallel exception handling from `src/Microsoft.ML.AutoML/Experiment/Experiment.cs:206-208`:

// For some trainers, like FastTree, because training is done in parallel,
// the exception surfaces as an AggregateException and misses the first catch block.
catch (Exception ex) when (aggregateTrainingStopManager.IsStopTrainingRequested() == false)

Minimum sample size from `src/Microsoft.ML.AutoML/AutoMLExperiment/IDatasetManager.cs:85`:

// take at least 10 rows to avoid empty dataset
