Principle: .NET Machine Learning Binary Classification Training
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning, Classification, Supervised Learning |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Binary classification is a supervised learning task that assigns each input instance to one of exactly two classes based on learned decision boundaries derived from labeled training data.
Description
Binary classification is one of the most common machine learning tasks. Given a set of labeled examples where each label is either positive or negative (1 or 0, true or false), the goal is to learn a function f(x) that maps input feature vectors to class predictions. The learned function generalizes to unseen examples by capturing patterns in the training data.
Two prominent algorithmic families for binary classification are:
- Gradient boosted decision trees (e.g., FastTree, LightGBM): Build an ensemble of shallow decision trees sequentially, where each tree corrects the errors of the previous ensemble. These methods are highly effective on tabular data with mixed feature types.
- Stochastic dual coordinate ascent (SDCA) for logistic regression: An optimization algorithm that solves the logistic regression objective by iterating over dual variables. SDCA is efficient for large, sparse datasets and converges to a linear decision boundary.
The estimator-transformer pattern cleanly separates the configuration of a training algorithm (hyperparameters, column names) from its execution (fitting on data). An estimator encapsulates the algorithm configuration and implements a Fit method that, given training data, produces a transformer. The transformer is the trained model that can score new data.
This separation enables:
- Pipeline composition: chain transforms and trainers into a single estimator pipeline.
- Reproducibility: the same estimator with the same data produces the same model.
- Serialization: transformers can be saved and loaded independently of their estimator.
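The estimator-transformer split can be sketched in a few lines. The following is an illustrative analogy in Python, not the library's actual API: the class names and the trivial mean-threshold "trainer" are invented for this example, but the shape of the pattern (configuration object whose fit produces an immutable scoring object) is the one described above.

```python
class MeanThresholdEstimator:
    """Estimator: holds only configuration and knows how to fit."""
    def __init__(self, feature_column):
        self.feature_column = feature_column  # hyperparameter/column config

    def fit(self, rows):
        # "Training": compute the mean of the feature as a decision threshold.
        values = [row[self.feature_column] for row in rows]
        threshold = sum(values) / len(values)
        return MeanThresholdTransformer(self.feature_column, threshold)


class MeanThresholdTransformer:
    """Transformer: the trained model; scores unseen rows."""
    def __init__(self, feature_column, threshold):
        self.feature_column = feature_column
        self.threshold = threshold

    def transform(self, rows):
        return [1 if row[self.feature_column] > self.threshold else 0
                for row in rows]


train = [{"x": 1.0}, {"x": 3.0}]
model = MeanThresholdEstimator("x").fit(train)    # estimator -> transformer
print(model.transform([{"x": 0.5}, {"x": 5.0}]))  # -> [0, 1]
```

Because the transformer carries only the fitted state (here, the threshold), it can be serialized and reloaded without any reference to the estimator that produced it.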
Usage
Use binary classification when the target variable has exactly two classes. Choose gradient boosted trees (FastTree or LightGBM) as a strong default for tabular data with moderate to large feature counts. Choose SDCA logistic regression when you need a linear model for interpretability, or when dealing with very high-dimensional sparse features (e.g., bag-of-words text features).
Theoretical Basis
Logistic regression models the log-odds of the positive class as a linear function of features:
P(y=1|x) = sigma(w^T x + b) = 1 / (1 + exp(-(w^T x + b)))
Loss = -sum_i [ y_i * log(P_i) + (1 - y_i) * log(1 - P_i) ] (cross-entropy)
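The two formulas above translate directly into code. A minimal pure-Python sketch (function names are ours, no library assumed):

```python
import math

def sigmoid(z):
    """sigma(z) = 1 / (1 + exp(-z))"""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """P(y=1|x) = sigma(w^T x + b)"""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def cross_entropy(w, b, X, y):
    """Loss = -sum_i [ y_i log P_i + (1 - y_i) log(1 - P_i) ]"""
    total = 0.0
    for x_i, y_i in zip(X, y):
        p = predict_proba(w, b, x_i)
        total -= y_i * math.log(p) + (1 - y_i) * math.log(1 - p)
    return total
```

With zero weights every prediction is sigma(0) = 0.5, so each example contributes log 2 ≈ 0.693 to the loss; training drives the weights toward values that shrink this sum.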
SDCA minimizes a regularized loss by iterating over training examples and updating dual variables:
minimize (1/n) * sum_i loss(w^T x_i, y_i) + (lambda/2) * ||w||^2
For each example i:
    delta_alpha_i = argmax improvement in dual objective
    alpha_i += delta_alpha_i
    w += (delta_alpha_i / (lambda * n)) * x_i
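The loop above can be made concrete. For logistic loss the dual-maximizing step has no closed form and is solved approximately in practice, so this sketch substitutes squared loss, where delta_alpha_i does have a closed form; the alpha/w bookkeeping is exactly the scheme shown above. This is our own illustrative Python, not the library implementation:

```python
import random

def sdca_ridge(X, y, lam, epochs=50, seed=0):
    """SDCA sketch with squared loss (closed-form dual step).
    Invariant maintained throughout: w = (1/(lam*n)) * sum_i alpha_i * x_i."""
    n, d = len(X), len(X[0])
    alpha = [0.0] * n
    w = [0.0] * d
    rng = random.Random(seed)
    for _ in range(epochs):
        for i in rng.sample(range(n), n):  # one pass in random order
            x_i = X[i]
            margin = sum(wj * xj for wj, xj in zip(w, x_i))  # w^T x_i
            sq_norm = sum(xj * xj for xj in x_i)
            # delta_alpha_i = argmax improvement in dual objective
            # (closed form for squared loss)
            delta = (y[i] - margin - alpha[i]) / (1.0 + sq_norm / (lam * n))
            alpha[i] += delta                                # alpha_i update
            scale = delta / (lam * n)
            w = [wj + scale * xj for wj, xj in zip(w, x_i)]  # w update
    return w
```

On data generated as y = 2x with small lambda, this converges to w ≈ 2 within a few passes. Note that each update touches only one example and the running weight vector, which is what makes the method cheap on large, sparse datasets.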
Gradient boosted trees fit an additive model of trees:
F_0(x) = initial prediction (e.g., log-odds of positive class)
For m = 1 to M:
    r_i = -dLoss/dF_{m-1}(x_i)          // pseudo-residuals (negative gradient)
    h_m = fit regression tree to {(x_i, r_i)}
    F_m(x) = F_{m-1}(x) + eta * h_m(x)  // eta = learning rate
Prediction: P(y=1|x) = sigma(F_M(x))
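The boosting recipe above can be demonstrated end to end on a toy problem. This sketch is deliberately simplified: one-dimensional features, depth-1 regression stumps, and mean-residual leaf values (production gradient boosting libraries use deeper trees and more refined Newton-style leaf estimates), but the F_0 / pseudo-residual / eta * h_m structure matches the pseudocode:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(x, r):
    """Best single-split regression stump on 1-D inputs (minimizes SSE)."""
    best = None
    for t in sorted(set(x)):
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((ri - lmean) ** 2 for ri in left)
               + sum((ri - rmean) ** 2 for ri in right))
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda xi: lmean if xi <= t else rmean

def boost(x, y, n_trees=20, eta=0.3):
    p = sum(y) / len(y)
    f0 = math.log(p / (1 - p))          # F_0: log-odds of positive class
    F = [f0] * len(x)
    trees = []
    for _ in range(n_trees):
        # pseudo-residuals for log loss: r_i = y_i - sigma(F_{m-1}(x_i))
        r = [yi - sigmoid(Fi) for yi, Fi in zip(y, F)]
        h = fit_stump(x, r)             # h_m: fit tree to residuals
        trees.append(h)
        F = [Fi + eta * h(xi) for Fi, xi in zip(F, x)]  # F_m = F_{m-1} + eta*h_m
    def predict_proba(xi):
        # P(y=1|x) = sigma(F_M(x))
        return sigmoid(f0 + eta * sum(h(xi) for h in trees))
    return predict_proba

model = boost([0, 1, 2, 3, 4, 5], [0, 0, 0, 1, 1, 1])
```

After 20 rounds the ensemble separates the two classes: points below the split get probabilities well under 0.5 and points above get probabilities well over 0.5, with the gap widening as trees are added, which is the eta-vs-M trade-off discussed next.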
Key hyperparameters include numberOfLeaves (tree complexity), learningRate (step size), and numberOfTrees (ensemble size). Smaller learning rates with more trees typically yield better generalization.