

Principle:Dotnet Machinelearning Sweepable Pipeline Definition

From Leeroopedia


Knowledge Sources
Domains: Machine_Learning, AutoML
Last Updated: 2026-02-09 00:00 GMT

Overview

AutoML pipeline structure search uses a symbolic algebra to define the space of possible ML pipelines, enabling automated discovery of both the best algorithm and its optimal hyperparameter configuration.

Description

A sweepable pipeline represents not a single fixed ML pipeline, but an entire search space of pipelines. Traditional ML workflows require the practitioner to manually select a specific algorithm and its hyperparameters. Sweepable pipelines instead encode a declarative specification of all candidate pipelines that an AutoML engine should explore.

The search space is defined through two algebraic operators:

  • + (OneOf): Represents alternative estimators at a given pipeline stage. For example, FastTree + LightGBM + LogisticRegression means the AutoML engine should try each of these trainers and determine which performs best.
  • * (Concatenate): Represents sequential pipeline steps. For example, Featurizer * Trainer means data flows through the featurizer first, then into the trainer.
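To make the two operators concrete, here is a minimal Python sketch of the algebra. It is not the ML.NET API: the classes `Node`, `Estimator`, `OneOf`, `Concat` and the method `expand` are hypothetical names chosen for illustration. `+` collects alternatives into a `OneOf`, `*` chains stages into a `Concat`, and `expand` lists every concrete pipeline (structure only, no hyperparameters) that the expression denotes.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import product


class Node:
    """Base class for a hypothetical sweepable-pipeline expression node."""

    def __add__(self, other: Node) -> OneOf:
        # '+' (OneOf): alternatives at one stage; nested OneOfs are flattened
        return OneOf(_flatten(self, OneOf) + _flatten(other, OneOf))

    def __mul__(self, other: Node) -> Concat:
        # '*' (Concatenate): sequential stages; nested Concats are flattened
        return Concat(_flatten(self, Concat) + _flatten(other, Concat))

    def expand(self) -> list[tuple[str, ...]]:
        """List every concrete pipeline (sequence of estimator names) this node denotes."""
        raise NotImplementedError


def _flatten(node: Node, kind: type) -> tuple[Node, ...]:
    # Splice the children of a same-kind node so a + b + c is a single choice point
    return node.children if isinstance(node, kind) else (node,)


@dataclass(frozen=True)
class Estimator(Node):
    name: str

    def expand(self) -> list[tuple[str, ...]]:
        return [(self.name,)]


@dataclass(frozen=True)
class OneOf(Node):
    children: tuple[Node, ...]

    def expand(self) -> list[tuple[str, ...]]:
        # Algorithm selection: candidates are the union of each alternative's pipelines
        return [p for child in self.children for p in child.expand()]


@dataclass(frozen=True)
class Concat(Node):
    children: tuple[Node, ...]

    def expand(self) -> list[tuple[str, ...]]:
        # Pipeline composition: pick one candidate per stage and join them in order
        parts = [child.expand() for child in self.children]
        return [tuple(step for piece in combo for step in piece) for combo in product(*parts)]


trainers = Estimator("FastTree") + Estimator("LightGBM") + Estimator("LogisticRegression")
pipeline = Estimator("Featurizer") * trainers
print(pipeline.expand())
# [('Featurizer', 'FastTree'), ('Featurizer', 'LightGBM'), ('Featurizer', 'LogisticRegression')]
```

One consequence of reusing arithmetic operators is that ordinary precedence applies: `*` binds tighter than `+`, so `Featurizer * FastTree + LightGBM` means `(Featurizer * FastTree) + LightGBM`, which offers LightGBM with no featurizer in front of it. Parenthesize the alternatives, as in `Featurizer * (FastTree + LightGBM)`.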

Each estimator within the pipeline carries an associated SearchSpace that defines the ranges and distributions of its tunable hyperparameters (learning rate, number of leaves, regularization weight, etc.). Combining structural alternatives (which algorithms) with parametric ranges (which hyperparameter values) yields a large search space that mixes discrete choices with continuous and integer ranges.
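A per-estimator search space can be sketched as a mapping from hyperparameter name to a sampling range. The sketch below assumes two hypothetical range types, `FloatRange` (optionally log-uniform, a common choice for learning rates and regularization weights) and `IntRange`; the parameter names and bounds are illustrative and are not ML.NET's defaults or its SearchSpace API. `sample_config` draws one configuration lambda from an estimator's space Lambda_i.

```python
from __future__ import annotations

import math
import random
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class FloatRange:
    low: float
    high: float
    log: bool = False  # sample uniformly in log space, common for learning rates

    def sample(self, rng: random.Random) -> float:
        if self.log:
            value = math.exp(rng.uniform(math.log(self.low), math.log(self.high)))
        else:
            value = rng.uniform(self.low, self.high)
        # Clamp to guard against floating-point rounding at the bounds
        return min(max(value, self.low), self.high)


@dataclass(frozen=True)
class IntRange:
    low: int
    high: int  # inclusive upper bound

    def sample(self, rng: random.Random) -> int:
        return rng.randint(self.low, self.high)


Range = Union[FloatRange, IntRange]

# Hypothetical per-estimator search spaces; ranges are illustrative, not ML.NET defaults.
SEARCH_SPACES: dict[str, dict[str, Range]] = {
    "FastTree": {
        "learning_rate": FloatRange(1e-3, 1.0, log=True),
        "number_of_leaves": IntRange(4, 1024),
    },
    "LogisticRegression": {
        "l2_regularization": FloatRange(1e-4, 10.0, log=True),
    },
}


def sample_config(space: dict[str, Range], rng: random.Random) -> dict[str, float | int]:
    """Draw one configuration lambda from an estimator's space Lambda_i."""
    return {name: dist.sample(rng) for name, dist in space.items()}


rng = random.Random(42)
print(sample_config(SEARCH_SPACES["FastTree"], rng))
```

Sampling in log space spreads trials evenly across orders of magnitude (0.001 to 0.01 gets as many draws as 0.1 to 1.0), which is usually what one wants for scale-type hyperparameters.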

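This algebraic approach enables compositional pipeline construction. A featurizer pipeline that handles text, numeric, and categorical columns can be combined with a trainer pipeline that offers multiple classification algorithms. The resulting space is the product of the per-stage choices, where each stage contributes its alternative estimators together with their hyperparameter ranges, and the AutoML engine searches within that space. Because a trainer's hyperparameters only matter when that trainer is selected, the space is conditional rather than a flat Cartesian product of every algorithm with every hyperparameter.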

Usage

Use sweepable pipeline definitions when you want an AutoML system to search over both the choice of algorithm and the hyperparameter configuration. This is appropriate when you do not have strong prior knowledge about which algorithm will perform best on your data, or when you want to systematically benchmark multiple approaches. For production scenarios where the algorithm is already known, a fixed (non-sweepable) pipeline is more efficient, since it trains one model rather than many candidates.

Theoretical Basis

The sweepable pipeline formalism maps onto Combined Algorithm Selection and Hyperparameter optimization (CASH), introduced by Thornton et al. (2013). The CASH problem defines:

Given: a set of algorithms A = {A_1, ..., A_k}
       each A_i with hyperparameter space Lambda_i
       a dataset D and metric m

Find:  A* in A, lambda* in Lambda_{A*}
       such that m(A*(lambda*, D_train), D_val) is optimal
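When every Lambda_i is discretized to a finite list, the CASH objective can be solved by brute force. The sketch below assumes a hypothetical `evaluate(algorithm, config)` callable standing in for m(A_i(lambda, D_train), D_val); in practice it would train and score a model, while the example uses a synthetic objective so no training happens. Exhaustive search is only a reference point here, not how AutoML tuners operate at scale.

```python
from __future__ import annotations

from typing import Any, Callable, Mapping, Sequence

Config = Mapping[str, Any]


def solve_cash(
    spaces: Mapping[str, Sequence[Config]],
    evaluate: Callable[[str, Config], float],
    higher_is_better: bool = True,
) -> tuple[str, Config, float]:
    """Exhaustively solve CASH over finite (discretized) hyperparameter spaces.

    spaces maps each algorithm name A_i to a finite list of configurations taken
    from Lambda_i. evaluate(A_i, lam) stands in for m(A_i(lam, D_train), D_val).
    Returns (A*, lambda*, score); ties keep the first candidate found.
    """
    def better(a: float, b: float) -> bool:
        return a > b if higher_is_better else a < b

    best: tuple[str, Config, float] | None = None
    for algorithm, configs in spaces.items():   # algorithm selection over A
        for config in configs:                   # hyperparameter search over Lambda_i
            score = evaluate(algorithm, config)
            if best is None or better(score, best[2]):
                best = (algorithm, config, score)
    if best is None:
        raise ValueError("search space is empty")
    return best


# Synthetic objective for illustration only (no model is trained)
def toy_metric(algorithm: str, config: Config) -> float:
    return -(config["x"] - 1.0) ** 2 + (0.5 if algorithm == "B" else 0.0)


print(solve_cash({"A": [{"x": 0.0}, {"x": 1.0}], "B": [{"x": 2.0}]}, toy_metric))
# ('A', {'x': 1.0}, 0.0)
```

The outer loop is the algorithm-selection half of CASH and the inner loop is the hyperparameter half; the cost is the total number of (algorithm, configuration) pairs, which is why practical tuners sample instead of enumerating.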

The algebraic operators map directly to this formalism:

OneOf(A_1, A_2, ..., A_k)     = algorithm selection (choose A*)
Concat(Step_1, Step_2, ...)   = pipeline composition (sequential stages)
SearchSpace(A_i)              = hyperparameter space Lambda_i

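For a single OneOf stage, the search space is the union of all (algorithm, hyperparameter) combinations, where A_i and Lambda_i range over that stage's alternatives:

S_j = Union over i of {A_i} x Lambda_i

Concatenating stages takes the product of their spaces, because one choice is made at every stage:

S = S_1 x S_2 x ... x S_n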
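The "union within a stage, product across stages" structure can be checked by counting. The sketch below assumes deliberately tiny, hypothetical hyperparameter grids (real continuous ranges make S infinite) and uses hypothetical helper names `stage_space` and `pipeline_space`. Within a stage the sizes add; across stages they multiply.

```python
from __future__ import annotations

from itertools import product
from typing import Any


def stage_space(alternatives: dict[str, list[dict[str, Any]]]) -> list[tuple[str, dict[str, Any]]]:
    """One OneOf stage: Union over i of {A_i} x Lambda_i (discretized)."""
    return [(algorithm, config) for algorithm, grid in alternatives.items() for config in grid]


def pipeline_space(*stages: dict[str, list[dict[str, Any]]]) -> list[tuple[tuple[str, dict[str, Any]], ...]]:
    """Concatenated stages: the product of the per-stage spaces (one pick per stage)."""
    return list(product(*(stage_space(stage) for stage in stages)))


# Hypothetical, deliberately tiny grids so the counts can be checked by hand
featurizers = {"TextFeaturizer": [{"ngram_length": 1}, {"ngram_length": 2}]}
trainers = {
    "FastTree": [{"number_of_leaves": 16}, {"number_of_leaves": 64}],
    "LightGBM": [{"number_of_leaves": 31}],
    "LogisticRegression": [{"l2_regularization": r} for r in (0.1, 1.0, 10.0)],
}

print(len(stage_space(trainers)))                   # 2 + 1 + 3 = 6
print(len(pipeline_space(featurizers, trainers)))   # 2 * 6 = 12
```

Note that the trainer stage has 6 points, not 3 x (2 + 1 + 3): a LogisticRegression regularization value is never paired with a FastTree leaf count, which is the conditional structure the union expresses.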

A tuner (e.g., Bayesian optimization, random search) samples from S, trains a model for each sample, evaluates on a validation set, and iterates. The sweepable pipeline provides the structured definition of S that the tuner navigates.
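The sample-train-evaluate loop can be sketched with the simplest tuner named above, random search. In this sketch each algorithm's space is represented by a hypothetical sampler function, and `evaluate` is a hypothetical callable that would train on D_train and score on D_val. A Bayesian optimization tuner would replace the two sampling lines with a model that proposes the next candidate from past scores; this sketch does not do that.

```python
from __future__ import annotations

import random
from typing import Any, Callable, Dict

Config = Dict[str, Any]
Sampler = Callable[[random.Random], Config]


def random_search(
    spaces: Dict[str, Sampler],
    evaluate: Callable[[str, Config], float],
    budget: int,
    seed: int = 0,
) -> tuple[str, Config, float]:
    """Random-search tuner over S: sample, train/evaluate, keep the best, repeat."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    rng = random.Random(seed)
    algorithms = sorted(spaces)  # fixed order so a given seed is reproducible
    best: tuple[str, Config, float] | None = None
    for _ in range(budget):
        algorithm = rng.choice(algorithms)   # structural choice: pick A_i uniformly
        config = spaces[algorithm](rng)      # parametric choice: lambda ~ Lambda_i
        score = evaluate(algorithm, config)  # stands in for training on D_train, scoring on D_val
        if best is None or score > best[2]:
            best = (algorithm, config, score)
    return best
```

Sampling the algorithm first and then its hyperparameters follows the union structure of S directly. Picking algorithms uniformly is itself a design choice: it gives each algorithm an equal share of the budget regardless of how large its hyperparameter space is.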

Related Pages

Implemented By
