Principle: Scikit-learn Model Training
| Field | Value |
|---|---|
| sources | Bishop, C.M. (2006). Pattern Recognition and Machine Learning, Springer; scikit-learn documentation: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression |
| domains | Machine_Learning, Optimization, Statistics |
| last_updated | 2026-02-08 15:00 GMT |
Overview
An optimization process that adjusts model parameters to minimize a loss function on training data.
Description
Model training (also called fitting) is the core computational step in supervised learning. Given a training set of labeled examples $\{(x_i, y_i)\}_{i=1}^{n}$, the training process searches for parameter values that minimize a chosen loss function over the training data, subject to optional regularization constraints.
In scikit-learn, training is triggered by calling the fit(X, y) method on an instantiated estimator. This method:
- Validates and preprocesses the input data (type checking, dtype conversion, sparse format handling).
- Executes the optimization algorithm specified by the estimator's hyperparameters.
- Stores the learned parameters as instance attributes with trailing underscores (e.g., `coef_`, `intercept_`).
- Returns `self`, enabling method chaining.
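A minimal sketch of this fit/attribute convention, using a small synthetic dataset (the data here is illustrative, not from the source):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

# fit() validates X/y, runs the solver, and returns self,
# so instantiation and training can be chained on one line.
clf = LogisticRegression().fit(X, y)

# Learned parameters are stored with trailing underscores.
print(clf.coef_.shape)       # (1, 1): one weight per feature
print(clf.intercept_.shape)  # (1,): one bias term
```

Because `fit` returns the estimator itself, the result of `LogisticRegression().fit(X, y)` can be assigned or passed along directly.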
The specific optimization strategy depends on the estimator and its configuration. Common approaches include:
- Gradient-based optimization -- Iterative methods such as L-BFGS, Newton-CG, and stochastic average gradient (SAG/SAGA) that use gradient information to find the loss minimum.
- Coordinate descent -- Used by the liblinear solver for L1-regularized problems.
- Closed-form solutions -- Some estimators (e.g., ordinary least squares) compute parameters directly via matrix algebra.
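As a sketch of these strategies, the same logistic-regression problem can be fitted with several iterative solvers, alongside a closed-form ordinary least squares fit (the dataset is synthetic and chosen only for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

rng = np.random.RandomState(0)
X = rng.randn(100, 3)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Iterative solvers converge to (approximately) the same optimum,
# since the regularized logistic loss is convex.
coefs = {}
for solver in ("lbfgs", "newton-cg", "saga", "liblinear"):
    clf = LogisticRegression(solver=solver, max_iter=1000).fit(X, y)
    coefs[solver] = clf.coef_.ravel()
assert np.allclose(coefs["lbfgs"], coefs["newton-cg"], atol=1e-2)

# LinearRegression uses a direct least-squares solve (no iterations),
# so it recovers the true coefficients of a noiseless linear target.
reg = LinearRegression().fit(X, X @ np.array([1.0, 2.0, 3.0]))
print(np.round(reg.coef_, 3))  # [1. 2. 3.]
```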
Regularization is a technique applied during training to prevent overfitting by penalizing large parameter values. Common regularization forms include L2 (ridge), L1 (lasso), and Elastic-Net (a combination of L1 and L2).
Usage
Use model training when:
- Fitting a model to labeled data -- The standard supervised learning workflow requires calling `fit(X_train, y_train)` on the training subset.
- Retraining after hyperparameter changes -- After modifying hyperparameters via `set_params`, the model must be re-fitted.
- Warm starting -- Some estimators support `warm_start=True`, allowing training to resume from previously learned parameters rather than starting from scratch.
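The retraining and warm-start workflows above can be sketched as follows (the data is synthetic; `warm_start` is supported by `LogisticRegression` with the lbfgs/newton-cg/sag/saga solvers, among other estimators):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.randn(100, 5)
y = (X[:, 0] > 0).astype(int)

clf = LogisticRegression(C=1.0).fit(X, y)

# set_params changes hyperparameters but does NOT update the fit;
# fit() must be called again for the change to take effect.
clf.set_params(C=0.01)
clf.fit(X, y)

# With warm_start=True, successive fit() calls reuse the previous
# coefficients as the starting point instead of reinitializing.
warm = LogisticRegression(warm_start=True, max_iter=20)
warm.fit(X, y)  # first fit: starts from scratch
warm.fit(X, y)  # second fit: continues from the learned coef_
```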
Theoretical Basis
Maximum Likelihood Estimation
For classification with logistic regression, training corresponds to maximum likelihood estimation (MLE). The model assumes that the probability of class $y$ given features $x$ follows the softmax (multinomial) or sigmoid (binary) function of a linear combination of features.
In the binary case, the model estimates:

$$\hat{P}(y = 1 \mid x) = \sigma(w^\top x + b) = \frac{1}{1 + \exp\left(-(w^\top x + b)\right)}$$

where $\sigma(\cdot)$ is the logistic sigmoid function, $w$ is the weight vector, and $b$ is the bias (intercept).
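This correspondence can be checked directly: a sketch that recovers $w$ and $b$ from `coef_` and `intercept_` and verifies that the sigmoid of the linear score reproduces `predict_proba` (synthetic data for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
clf = LogisticRegression().fit(X, y)

z = X @ clf.coef_.ravel() + clf.intercept_   # linear score w^T x + b
p_manual = 1.0 / (1.0 + np.exp(-z))          # logistic sigmoid
p_sklearn = clf.predict_proba(X)[:, 1]       # P(y = 1 | x)

assert np.allclose(p_manual, p_sklearn)
```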
Loss Minimization
MLE is equivalent to minimizing the negative log-likelihood, which for logistic regression yields the logistic loss (also called cross-entropy loss or log loss):

$$L(w, b) = -\sum_{i=1}^{n} \left[ y_i \log \hat{p}_i + (1 - y_i) \log(1 - \hat{p}_i) \right]$$

where $\hat{p}_i = \sigma(w^\top x_i + b)$ is the predicted probability for sample $i$.
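A small sketch computing this negative log-likelihood by hand and checking it against `sklearn.metrics.log_loss` (which reports the mean over samples rather than the sum); the probabilities are made-up values for illustration:

```python
import numpy as np
from sklearn.metrics import log_loss

y = np.array([0, 1, 1, 0])
p = np.array([0.1, 0.8, 0.7, 0.3])  # predicted P(y = 1) per sample

# Sum of per-sample cross-entropy terms.
nll = -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# log_loss normalizes by n by default, so compare against the mean.
assert np.isclose(nll / len(y), log_loss(y, p))
```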
Regularized Objective
With regularization, the objective becomes:

$$\min_{w, b} \; \frac{1 - \rho}{2} \|w\|_2^2 + \rho \|w\|_1 + C \sum_{i=1}^{n} \left[ -y_i \log \hat{p}_i - (1 - y_i) \log(1 - \hat{p}_i) \right]$$

where $C$ is the inverse regularization strength and $\rho$ is the L1 ratio (Elastic-Net mixing parameter). Setting $\rho = 0$ yields pure L2 regularization; $\rho = 1$ yields pure L1 regularization.
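A sketch verifying the two endpoints of the mixing parameter: with the `saga` solver and a tight tolerance, `l1_ratio=0` should reproduce `penalty="l2"` and `l1_ratio=1` should reproduce `penalty="l1"` (synthetic data, tolerances chosen for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.randn(150, 4)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

def fitted_coef(**kwargs):
    """Fit with the saga solver and return the flattened weights."""
    clf = LogisticRegression(solver="saga", max_iter=20000,
                             tol=1e-8, C=0.5, **kwargs)
    return clf.fit(X, y).coef_.ravel()

# rho = 0 (pure L2) and rho = 1 (pure L1) match the dedicated penalties.
assert np.allclose(fitted_coef(penalty="elasticnet", l1_ratio=0.0),
                   fitted_coef(penalty="l2"), atol=1e-3)
assert np.allclose(fitted_coef(penalty="elasticnet", l1_ratio=1.0),
                   fitted_coef(penalty="l1"), atol=1e-3)
```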