Workflow:Dotnet Machinelearning AutoML Experiment

From Leeroopedia
Revision as of 11:01, 16 February 2026 by Admin (Auto-imported from workflows/Dotnet_Machinelearning_AutoML_Experiment.md)


Knowledge Sources
Domains: Machine_Learning, AutoML, Hyperparameter_Optimization
Last Updated: 2026-02-09 12:00 GMT

Overview

End-to-end process for automated model selection and hyperparameter optimization using ML.NET's AutoML framework to find the best-performing model without manual algorithm tuning.

Description

This workflow outlines the procedure for using ML.NET's AutoML capabilities to automatically discover the best machine learning pipeline for a given task. AutoML eliminates the need for manual algorithm selection and hyperparameter tuning by systematically exploring combinations of trainers (FastTree, LightGBM, SDCA, etc.) and their hyperparameters using configurable search strategies. The framework supports binary classification, multiclass classification, regression, ranking, and time series forecasting tasks. It uses a SweepablePipeline abstraction with symbolic operators (+ for OneOf, * for Concatenate) and Bayesian optimization or portfolio-based tuning to efficiently navigate the search space.

Usage

Execute this workflow when you have a labeled dataset and want to quickly identify the best-performing model without manually experimenting with different algorithms and hyperparameters. This is particularly useful for baseline model development, rapid prototyping, or when you lack domain expertise in choosing the right algorithm for your data characteristics.

Execution Steps

Step 1: Initialize MLContext and Load Data

Create an MLContext instance and load the training dataset into an IDataView. Split the data into training and validation sets. The data schema should have a clearly defined label column whose type matches the ML task: Boolean for binary classification, a Key type for multiclass (typically mapped from a String label), and Single for regression.

Key considerations:

  • AutoML scores candidate models on held-out data, so provide either a separate validation set or enough rows for cross-validation
  • Ensure the label column is correctly typed for the intended task
  • Large datasets may benefit from sampling to speed up the search process
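
As a minimal sketch of this step, assuming a comma-separated file with a header row, one text feature, and a Boolean label: the ReviewRow type, its column layout, and the 20% hold-out fraction are hypothetical choices for illustration, not part of the workflow itself.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical input row: one text feature and a Boolean label
// (the label type expected for binary classification).
public class ReviewRow
{
    [LoadColumn(0)]
    public string Text { get; set; }

    [LoadColumn(1)]
    public bool Label { get; set; }
}

public static class DataLoading
{
    // Loads a CSV file into an IDataView and holds out 20% of the rows.
    public static DataOperationsCatalog.TrainTestData LoadAndSplit(MLContext ctx, string csvPath)
    {
        IDataView data = ctx.Data.LoadFromTextFile<ReviewRow>(
            csvPath, separatorChar: ',', hasHeader: true);

        // TrainSet is used to fit candidate models, TestSet to score them.
        return ctx.Data.TrainTestSplit(data, testFraction: 0.2);
    }
}
```

A call such as DataLoading.LoadAndSplit(new MLContext(seed: 0), "reviews.csv") would return both splits; fixing the seed keeps the split repeatable between runs.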

Step 2: Define Sweepable Pipeline

Construct a SweepablePipeline that defines the space of possible ML pipelines to explore. Use the symbolic pipeline algebra: the + operator creates a OneOf choice between alternatives (e.g., FastTree + LightGBM + SDCA means "try any one of these"), and the * operator concatenates sequential pipeline stages. AutoML provides convenience methods like context.Auto().BinaryClassification() that return pre-configured sweepable pipelines with all available trainers and default search spaces.

Key considerations:

  • The + operator (OneOf) defines alternative trainers or transforms to choose from
  • The * operator (Concatenate) chains pipeline stages sequentially
  • Default search spaces are defined in JSON configuration files with hyperparameter ranges
  • Custom search spaces can be defined using SearchSpace attributes or the fluent API
  • Pre-configured portfolios provide known-good hyperparameter starting points for Bayesian optimization
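
One way to build such a pipeline is sketched below, assuming a binary classification task whose label column is named "Label". The PipelineFactory class name is hypothetical. The sketch composes the stages with Append and uses the built-in trainer set with its default search spaces; custom search spaces are not shown.

```csharp
using Microsoft.ML;
using Microsoft.ML.AutoML;

public static class PipelineFactory
{
    // Builds "featurize, then try one of the default binary trainers".
    public static SweepablePipeline Build(MLContext ctx, IDataView trainData)
    {
        // Tell the featurizer which column is the label so it is not
        // turned into a feature. "Label" is an assumed column name.
        var columns = new ColumnInformation { LabelColumnName = "Label" };

        return ctx.Auto()
            // Infers per-column transforms and writes a "Features" column.
            .Featurizer(trainData, columnInformation: columns)
            // Appends the pre-configured choice among binary trainers.
            .Append(ctx.Auto().BinaryClassification(
                labelColumnName: "Label",
                featureColumnName: "Features"));
    }
}
```

The returned pipeline is only a description of the search space; nothing is trained until an experiment runs it.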

Step 3: Configure Experiment

Create an AutoMLExperiment with context.Auto().CreateExperiment(), attach the sweepable pipeline, and configure the experiment using the fluent API. Set the training time budget, evaluation metric (AUC, Accuracy, R-Squared, etc.), dataset splitting strategy (train-test or cross-validation), and tuning strategy. Optionally attach a monitor for progress reporting.

Key considerations:

  • Training time budget controls how long AutoML searches; longer budgets explore more configurations
  • Choose evaluation metrics appropriate to the task and business requirements
  • Cross-validation with N folds provides more robust metric estimates but increases total training time, since each trial trains N models
  • The tuner strategy (GridSearch, RandomSearch, Bayesian) affects search efficiency
  • A monitor callback enables real-time progress reporting during the search
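
The configuration described above could look roughly like the following sketch. The 60-second budget, the AUC metric, and the "Label" column name are illustrative assumptions, and ExperimentFactory is a hypothetical helper name.

```csharp
using Microsoft.ML;
using Microsoft.ML.AutoML;

public static class ExperimentFactory
{
    // Wires pipeline, metric, time budget, and data into one experiment.
    public static AutoMLExperiment Configure(
        MLContext ctx,
        SweepablePipeline pipeline,
        DataOperationsCatalog.TrainTestData split)
    {
        return ctx.Auto().CreateExperiment()
            .SetPipeline(pipeline)
            // Metric the tuner optimizes; AUC is one option for binary tasks.
            .SetBinaryClassificationMetric(
                BinaryClassificationMetric.AreaUnderRocCurve,
                labelColumn: "Label")
            // Search budget in seconds (illustrative value).
            .SetTrainingTimeInSeconds(60)
            // Train-test strategy: fit on TrainSet, score on TestSet.
            // Assumption: passing a single IDataView plus a fold count to
            // SetDataset selects cross-validation instead.
            .SetDataset(split);
    }
}
```

Because no tuner is set explicitly here, the experiment falls back to its default tuning strategy.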

Step 4: Run Experiment

Execute the experiment by calling Run() or its asynchronous counterpart RunAsync(). AutoML iteratively proposes hyperparameter configurations from the tuner, builds concrete ML.NET training pipelines, trains models, evaluates them against the chosen metric, and updates the tuner with results. The framework automatically tracks the best model and parameters found so far.

Key considerations:

  • Each trial trains a complete pipeline (transforms + trainer) with a specific parameter configuration
  • The tuner learns from previous trial results to propose increasingly better configurations
  • Early stopping may terminate poorly-performing trials to save compute budget
  • All trial results are logged for post-analysis
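
A sketch of running an already configured experiment with console progress reporting follows. ConsoleMonitor and ExperimentRunner are hypothetical names; the monitor simply prints what each IMonitor callback receives and does not affect the search.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.ML.AutoML;

// Minimal progress reporter: callbacks fire as trials start, finish, or fail.
public class ConsoleMonitor : IMonitor
{
    public void ReportRunningTrial(TrialSettings settings) =>
        Console.WriteLine($"Trial {settings.TrialId} started");

    public void ReportCompletedTrial(TrialResult result) =>
        Console.WriteLine(
            $"Trial {result.TrialSettings.TrialId}: metric={result.Metric:F4}, " +
            $"{result.DurationInMilliseconds:F0} ms");

    public void ReportBestTrial(TrialResult result) =>
        Console.WriteLine($"New best: trial {result.TrialSettings.TrialId}");

    public void ReportFailTrial(TrialSettings settings, Exception exception = null) =>
        Console.WriteLine($"Trial {settings.TrialId} failed: {exception?.Message}");
}

public static class ExperimentRunner
{
    // Attaches the monitor, runs the search, and returns the best trial.
    public static async Task<TrialResult> RunWithProgressAsync(AutoMLExperiment experiment)
    {
        experiment.SetMonitor(new ConsoleMonitor());
        return await experiment.RunAsync();
    }
}
```

The returned TrialResult corresponds to the best trial found within the configured budget.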

Step 5: Extract Best Model and Metrics

After the experiment completes, retrieve the best-performing model and its associated metrics and hyperparameters. The best model is a fully trained ITransformer that can be used directly for predictions or saved for deployment.

Key considerations:

  • The best model includes the complete pipeline (feature engineering + trainer)
  • Review the trial history to understand which algorithms and parameters performed best
  • Compare the best model's metrics against business requirements before deployment
  • Consider retraining the best configuration on the full dataset (train + validation) for deployment
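
To illustrate, the sketch below reads the winning trial and re-scores its model on data that was not used during the search. It assumes a binary classification task with a "Label" column; BestModelReport is a hypothetical helper, and the non-calibrated evaluator is used because not every candidate trainer is assumed to emit a probability column.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;

public static class BestModelReport
{
    // Prints the winning configuration and evaluates it on held-out data.
    public static ITransformer Inspect(
        MLContext ctx,
        TrialResult best,
        SweepablePipeline pipeline,
        IDataView holdout)
    {
        Console.WriteLine(
            $"Best trial {best.TrialSettings.TrialId}, search metric {best.Metric:F4}");

        // Renders the concrete pipeline chosen by this trial's parameters.
        Console.WriteLine(pipeline.ToString(best.TrialSettings.Parameter));

        // The model contains the full pipeline (transforms + trainer).
        ITransformer model = best.Model;
        IDataView scored = model.Transform(holdout);

        BinaryClassificationMetrics metrics =
            ctx.BinaryClassification.EvaluateNonCalibrated(scored, labelColumnName: "Label");
        Console.WriteLine(
            $"Hold-out AUC {metrics.AreaUnderRocCurve:F4}, accuracy {metrics.Accuracy:F4}");

        return model;
    }
}
```

Scoring on rows the search never saw gives a less optimistic estimate than the metric reported by the search itself, which was used to pick the winner.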

Step 6: Save and Deploy Best Model

Persist the best model to disk and deploy it for inference. The deployment process is identical to manually trained models: use PredictionEngine for real-time predictions or Transform() for batch scoring.

Key considerations:

  • The saved model is self-contained with all transforms and the trained algorithm
  • Document which AutoML configuration produced the model for reproducibility
  • Monitor model performance in production and re-run AutoML periodically with new data
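
Persisting and reloading the model might be sketched as follows. ReviewRow and ReviewPrediction are hypothetical input and output types for a binary text classifier, and the file path is supplied by the caller.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical input type; must match the columns the model was trained on.
public class ReviewRow
{
    public string Text { get; set; }
    public bool Label { get; set; }
}

// Hypothetical output type mapped from the model's prediction columns.
public class ReviewPrediction
{
    [ColumnName("PredictedLabel")]
    public bool PredictedLabel { get; set; }

    public float Score { get; set; }
}

public static class Deployment
{
    // Writes the model and its input schema to a single file.
    public static void Save(MLContext ctx, ITransformer model, DataViewSchema inputSchema, string path) =>
        ctx.Model.Save(model, inputSchema, path);

    // Loads the saved model and scores one example.
    public static ReviewPrediction PredictOne(MLContext ctx, string path, string text)
    {
        ITransformer loaded = ctx.Model.Load(path, out DataViewSchema _);
        using var engine = ctx.Model.CreatePredictionEngine<ReviewRow, ReviewPrediction>(loaded);
        return engine.Predict(new ReviewRow { Text = text });
    }
}
```

PredictionEngine is not thread-safe, so a service handling concurrent requests would need one engine per thread or a pooled alternative. For batch scoring, calling Transform() on the loaded model avoids creating an engine at all.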

Execution Diagram

GitHub URL

Workflow Repository