
Principle:Scikit-learn Model Training

From Leeroopedia


sources: Bishop, C.M. (2006). Pattern Recognition and Machine Learning, Springer; scikit-learn documentation: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
domains: Machine_Learning, Optimization, Statistics
last_updated: 2026-02-08 15:00 GMT

Overview

An optimization process that adjusts model parameters to minimize a loss function on training data.

Description

Model training (also called fitting) is the core computational step in supervised learning. Given a training set of n labeled examples {(𝐱ᵢ, yᵢ)}ᵢ₌₁ⁿ, the training process searches for parameter values that minimize a chosen loss function over the training data, subject to optional regularization constraints.

In scikit-learn, training is triggered by calling the fit(X, y) method on an instantiated estimator. This method:

  1. Validates and preprocesses the input data (type checking, dtype conversion, sparse format handling).
  2. Executes the optimization algorithm specified by the estimator's hyperparameters.
  3. Stores the learned parameters as instance attributes with trailing underscores (e.g., coef_, intercept_).
  4. Returns self, enabling method chaining.
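These steps can be seen end to end with a toy dataset; a minimal sketch (the choice of LogisticRegression and the data below are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy binary classification data
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

# fit() validates the input, runs the solver, stores the learned
# parameters, and returns the estimator itself.
clf = LogisticRegression().fit(X, y)

# Learned parameters are exposed as trailing-underscore attributes.
print(clf.coef_.shape)       # weight matrix: (1, n_features) in the binary case
print(clf.intercept_.shape)  # bias term: (1,)
```

Because fit returns self, instantiation and training can be chained in a single expression, as above.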

The specific optimization strategy depends on the estimator and its configuration. Common approaches include:

  • Gradient-based optimization -- Iterative methods such as L-BFGS, Newton-CG, and stochastic average gradient (SAG/SAGA) that use gradient information to find the loss minimum.
  • Coordinate descent -- Used by the liblinear solver for L1-regularized problems.
  • Closed-form solutions -- Some estimators (e.g., ordinary least squares) compute parameters directly via matrix algebra.
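The strategy is typically selected through the `solver` hyperparameter; a brief sketch contrasting the approaches above (the toy data is illustrative, and the closed-form case uses LinearRegression, whose least-squares fit has no iterative loop):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

# Gradient-based iterative optimization (L-BFGS).
lbfgs = LogisticRegression(solver="lbfgs").fit(X, y)

# Coordinate descent via liblinear, which supports L1 penalties.
lib = LogisticRegression(solver="liblinear", penalty="l1").fit(X, y)

# Closed-form: ordinary least squares is solved directly
# via matrix algebra rather than an iterative optimizer.
ols = LinearRegression().fit(X, y.astype(float))
print(ols.coef_)  # slope of the least-squares line
```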

Regularization is a technique applied during training to prevent overfitting by penalizing large parameter values. Common regularization forms include L2 (ridge), L1 (lasso), and Elastic-Net (a combination of L1 and L2).
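The three penalty forms map directly onto LogisticRegression's `penalty` hyperparameter; a sketch with illustrative toy data (note that L1 and Elastic-Net require a compatible solver such as liblinear or saga):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0, 1.0], [1.0, 0.5], [2.0, 0.1], [3.0, 0.0]])
y = np.array([0, 0, 1, 1])

# L2 (ridge): shrinks all weights toward zero.
l2 = LogisticRegression(penalty="l2", C=1.0).fit(X, y)

# L1 (lasso): can drive some weights exactly to zero (sparsity).
l1 = LogisticRegression(penalty="l1", solver="liblinear", C=1.0).fit(X, y)

# Elastic-Net: L1/L2 mix controlled by l1_ratio; requires saga.
enet = LogisticRegression(penalty="elasticnet", solver="saga",
                          l1_ratio=0.5, C=1.0, max_iter=5000).fit(X, y)
```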

Usage

Use model training when:

  • Fitting a model to labeled data -- The standard supervised learning workflow requires calling fit(X_train, y_train) on the training subset.
  • Retraining after hyperparameter changes -- After modifying hyperparameters via set_params, the model must be re-fitted.
  • Warm starting -- Some estimators support warm_start=True, allowing training to resume from previously learned parameters rather than starting from scratch.
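The retraining and warm-start patterns above can be sketched together; SGDClassifier is used here only as an example of an estimator that supports warm_start (the data is illustrative):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

# warm_start=True: each call to fit() resumes from the previously
# learned coefficients instead of re-initializing them.
clf = SGDClassifier(warm_start=True, random_state=0)
clf.fit(X, y)
first_coef = clf.coef_.copy()
clf.fit(X, y)  # continues optimizing from first_coef

# Changing hyperparameters via set_params does not retrain;
# the model must be re-fitted for the change to take effect.
clf.set_params(alpha=0.01)
clf.fit(X, y)
```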

Theoretical Basis

Maximum Likelihood Estimation

For classification with logistic regression, training corresponds to maximum likelihood estimation (MLE). The model assumes that the probability of class k given features 𝐱 follows the softmax (multinomial) or sigmoid (binary) function of a linear combination of features.

In the binary case, the model estimates:

P(y = 1 | 𝐱) = σ(𝐰ᵀ𝐱 + b) = 1 / (1 + e^(−(𝐰ᵀ𝐱 + b)))

where σ is the logistic sigmoid function, 𝐰 is the weight vector, and b is the bias (intercept).
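This relationship can be verified numerically: computing the sigmoid by hand from the learned weights reproduces the estimator's predict_proba output. A sketch with illustrative toy data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
clf = LogisticRegression().fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# P(y = 1 | x) computed directly from the learned weights...
manual = sigmoid(X @ w + b)

# ...matches the positive-class column of predict_proba.
assert np.allclose(manual, clf.predict_proba(X)[:, 1])
```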

Loss Minimization

MLE is equivalent to minimizing the negative log-likelihood, which for logistic regression yields the logistic loss (also called cross-entropy loss or log loss):

ℒ(𝐰, b) = −(1/n) Σᵢ₌₁ⁿ [ yᵢ log(p̂ᵢ) + (1 − yᵢ) log(1 − p̂ᵢ) ]

where p̂ᵢ = σ(𝐰ᵀ𝐱ᵢ + b) is the predicted probability for sample i.
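Evaluating this formula by hand on a fitted model reproduces scikit-learn's log_loss metric; a sketch with illustrative toy data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
clf = LogisticRegression().fit(X, y)

# Predicted probability of the positive class for each sample.
p_hat = clf.predict_proba(X)[:, 1]

# Negative log-likelihood averaged over samples, per the formula above.
manual = -np.mean(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))

# Matches scikit-learn's built-in metric.
assert np.isclose(manual, log_loss(y, p_hat))
```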

Regularized Objective

With regularization, the objective becomes:

min over 𝐰, b of  (1 / (2C)) [ (1 − α) ‖𝐰‖₂² + α ‖𝐰‖₁ ] + ℒ(𝐰, b)

where C is the inverse regularization strength and α is the L1 ratio (Elastic-Net mixing parameter). Setting α=0 yields pure L2 regularization; α=1 yields pure L1 regularization.
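Because C is the inverse regularization strength, smaller C means a heavier penalty and therefore smaller learned weights; a sketch of this effect on illustrative toy data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Small C: strong regularization, weights shrunk toward zero.
strong = LogisticRegression(C=0.01).fit(X, y)

# Large C: weak regularization, weights closer to the unpenalized MLE.
weak = LogisticRegression(C=100.0).fit(X, y)

assert abs(strong.coef_[0, 0]) < abs(weak.coef_[0, 0])
```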
