Principle:Rapidsai Cuml Linear Model Fitting
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Linear_Models, Optimization |
| Last Updated | 2026-02-08 12:00 GMT |
Overview
Linear model fitting solves regularized linear optimization problems, which combine an L1, L2, or ElasticNet penalty with a squared-error or logistic loss, using iterative solvers such as coordinate descent, quasi-Newton methods, or stochastic gradient descent.
Description
Linear models form the backbone of many supervised learning tasks. They express the predicted output as a linear combination of input features, optionally transformed through a link function (e.g., the logistic sigmoid for classification). The model parameters (weights) are estimated by minimizing a loss function, typically augmented with a regularization term to prevent overfitting.
Loss Functions:
- Squared error loss (used by Lasso, ElasticNet, Ridge, SGD regressors): Measures the mean squared difference between predicted and actual values. Suitable for regression tasks.
- Logistic loss (used by Logistic Regression, SGD classifiers): The negative log-likelihood of the Bernoulli model for binary classification; its multinomial (softmax) generalization covers multiclass problems. The logistic function maps the linear predictor to a probability in (0, 1).
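The two losses above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not cuML code: the function names are hypothetical, the factor of 1/2 in the squared error is an assumed convention, and the binary logistic loss assumes labels in {0, 1}.

```python
import math

def squared_error(y, y_hat):
    """Half squared error for one sample (the 1/2 factor is an assumed convention)."""
    return 0.5 * (y - y_hat) ** 2

def sigmoid(z):
    """Logistic sigmoid, branched so that exp() never overflows for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def logistic_loss(y, z):
    """Negative Bernoulli log-likelihood for a label y in {0, 1} and linear predictor z.

    Uses the identity -[y*log(s) + (1-y)*log(1-s)] = log(1 + exp(z)) - y*z,
    with log(1 + exp(z)) evaluated as max(z, 0) + log1p(exp(-|z|)) for stability.
    """
    return max(z, 0.0) + math.log1p(math.exp(-abs(z))) - y * z

def mean_loss(loss, ys, preds):
    """Average a per-sample loss over a dataset."""
    return sum(loss(y, p) for y, p in zip(ys, preds)) / len(ys)

# A confident correct prediction costs little; a confident wrong one costs a lot.
low = logistic_loss(1, 4.0)
high = logistic_loss(0, 4.0)
```

Writing the logistic loss in terms of the linear predictor rather than the probability avoids taking the logarithm of a value that has already rounded to 0 or 1.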
Regularization Penalties:
- L2 (Ridge): Adds the squared L2 norm of the weight vector to the loss. Shrinks all coefficients toward zero but does not produce exact zeros. Controlled by a regularization strength parameter.
- L1 (Lasso): Adds the L1 norm of the weight vector. Encourages sparsity by driving some coefficients exactly to zero, effectively performing feature selection.
- ElasticNet: A convex combination of L1 and L2 penalties, controlled by a mixing parameter (l1_ratio). Balances sparsity induction with coefficient stability.
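As a rough illustration of how the mixing parameter interpolates between the two penalties, the sketch below evaluates the penalty for a small weight vector. The function name is hypothetical, and the factor of 1/2 on the squared L2 term is an assumed convention (it only rescales the L2 strength).

```python
def elastic_net_penalty(w, alpha, l1_ratio):
    """alpha * (l1_ratio * ||w||_1 + 0.5 * (1 - l1_ratio) * ||w||_2^2).

    The 0.5 factor on the L2 term is an assumed convention, not taken from the text.
    """
    l1 = sum(abs(wj) for wj in w)
    l2_sq = sum(wj * wj for wj in w)
    return alpha * (l1_ratio * l1 + 0.5 * (1.0 - l1_ratio) * l2_sq)

w = [0.5, -2.0, 0.0]
lasso_term = elastic_net_penalty(w, alpha=0.1, l1_ratio=1.0)  # pure L1
ridge_term = elastic_net_penalty(w, alpha=0.1, l1_ratio=0.0)  # pure L2
mixed_term = elastic_net_penalty(w, alpha=0.1, l1_ratio=0.5)  # halfway between the two
```

Because the penalty is linear in l1_ratio, the mixed value always lies between the pure L1 and pure L2 values for the same alpha.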
Solvers:
- Coordinate Descent (CD): Optimizes one coefficient at a time while holding the others fixed. Especially efficient for L1-penalized squared-error problems because each one-dimensional subproblem has a closed-form solution given by soft-thresholding. Sweeps over the coefficients repeatedly until convergence.
- Quasi-Newton (L-BFGS / OWL-QN): Uses approximate second-order curvature information, built from recent gradients, to converge in far fewer iterations than plain gradient descent. L-BFGS is well-suited for smooth (L2-penalized) problems; the OWL-QN (Orthant-Wise Limited-memory Quasi-Newton) variant handles the non-smooth L1 penalty.
- Stochastic Gradient Descent (SGD): Updates weights using the gradient computed on a single mini-batch of data at each iteration. Scales well to very large datasets because each iteration touches only a small fraction of the data.
Usage
Linear model fitting is the right choice when:
- The relationship between features and the target is approximately linear (or can be made so with feature engineering).
- Interpretability of coefficients is important, as each weight directly quantifies the marginal effect of the corresponding feature.
- Feature selection is desired (use L1 or ElasticNet regularization).
- The dataset is large enough that scalability matters, in which case mini-batch SGD or GPU-accelerated solvers can substantially reduce training time.
- A baseline model is needed before exploring more complex nonlinear methods.
For classification tasks, Logistic Regression with L2 or ElasticNet regularization is a strong default. For regression tasks with many correlated features, ElasticNet combines the sparsity of Lasso with the stability of Ridge.
Theoretical Basis
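The general regularized linear model objective is:
$$\min_{w} \; \frac{1}{n} \sum_{i=1}^{n} L(y_i, w^\top x_i) + \alpha \left( \rho \, \|w\|_1 + \frac{1 - \rho}{2} \, \|w\|_2^2 \right)$$
where $L$ is the loss function, $\alpha$ is the regularization strength, and $\rho$ is the L1 ratio ($\rho = 1$ gives Lasso, $\rho = 0$ gives Ridge, values in between give ElasticNet).
Logistic loss, for a label $y \in \{0, 1\}$ and linear predictor $z = w^\top x$:
$$L(y, z) = -\left[ y \log \sigma(z) + (1 - y) \log\left(1 - \sigma(z)\right) \right]$$
where $\sigma(z) = 1 / (1 + e^{-z})$ is the logistic sigmoid.
Squared error loss, for a target $y$ and prediction $z = w^\top x$:
$$L(y, z) = \frac{1}{2} (y - z)^2$$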
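A short worked derivation may help connect these losses to the gradient-based updates that follow. It assumes the usual forms: the binary logistic loss with labels in {0, 1}, and the squared error with a factor of 1/2. Writing $z = w^\top x$ for the linear predictor, both derivatives reduce to "prediction minus target":

```latex
\begin{aligned}
\text{Logistic:}\quad
  L(y, z) &= -\bigl[\, y \log \sigma(z) + (1 - y) \log(1 - \sigma(z)) \,\bigr]
           = \log\bigl(1 + e^{z}\bigr) - y z, \\
  \frac{\partial L}{\partial z} &= \frac{e^{z}}{1 + e^{z}} - y = \sigma(z) - y, \\[4pt]
\text{Squared error:}\quad
  L(y, z) &= \tfrac{1}{2} (y - z)^2, \qquad
  \frac{\partial L}{\partial z} = z - y, \\[4pt]
\text{Chain rule:}\quad
  \nabla_{w} L(y, w^\top x) &= \frac{\partial L}{\partial z} \, x .
\end{aligned}
```

The per-sample gradient is therefore the input vector scaled by the prediction error, which is the quantity averaged over a mini-batch in the SGD update.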
Coordinate Descent Update (ElasticNet):
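Repeat until the coefficients stop changing. For each feature j:
    partial_residual = y - X * w + X_j * w_j
    rho_j = X_j^T * partial_residual / n
    w_j = soft_threshold(rho_j, alpha * l1_ratio) / (1 + alpha * (1 - l1_ratio))
where soft_threshold(z, gamma) = sign(z) * max(|z| - gamma, 0). The 1 in the denominator assumes each column is standardized so that X_j^T * X_j / n = 1; for unscaled columns it is replaced by X_j^T * X_j / n. Here rho_j denotes the scaled inner product of feature j with the partial residual, not the L1 ratio.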
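The update above can be turned into a small runnable sketch. This is a plain-Python illustration, not the cuML solver: the function names are hypothetical, no intercept is fitted, and the denominator uses X_j^T X_j / n so that the code also works for columns that are not standardized (it equals 1 when they are).

```python
def soft_threshold(z, gamma):
    """sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def elastic_net_cd(X, y, alpha, l1_ratio, n_iter=100, tol=1e-8):
    """Cyclic coordinate descent for squared-error ElasticNet (illustrative sketch).

    X is a list of n rows, each a list of p floats; y is a list of n floats.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - X w while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column cannot influence the fit
            # Inner product of feature j with the partial residual, scaled by 1/n.
            rho_j = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho_j, alpha * l1_ratio) / (
                col_sq[j] + alpha * (1.0 - l1_ratio)
            )
            delta = new_wj - w[j]
            if delta != 0.0:
                # Keep the residual in sync instead of recomputing y - X w.
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return w

# Two orthogonal, standardized columns; y = 2 * col0 + 0.05 * col1.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.05, 1.95, -1.95, -2.05]
w_lasso = elastic_net_cd(X, y, alpha=0.1, l1_ratio=1.0)  # weak feature is zeroed out
w_ridge = elastic_net_cd(X, y, alpha=0.1, l1_ratio=0.0)  # both weights shrink, neither is zero
```

On this toy design the Lasso setting drives the weak second coefficient exactly to zero, while the Ridge setting only shrinks it, which matches the behavior described for the two penalties.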
SGD Update:
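For each mini-batch B:
    g = (1/|B|) * sum_{i in B} grad_L(y_i, w^T x_i) * x_i
    g += alpha * ((1 - l1_ratio) * w + l1_ratio * sign(w))
    w = w - eta * g
where grad_L is the derivative of the loss with respect to the linear predictor w^T x_i, sign(w) is applied elementwise as a subgradient of the L1 term (with sign(0) = 0), and eta is decayed according to a learning rate schedule.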
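A minimal sketch of this update for binary logistic regression follows. It is plain Python rather than the cuML MBSGD implementation: the function name, the default hyperparameters, and the inverse-scaling schedule eta = eta0 / (1 + 0.01 * step) are all assumptions made for illustration, and the intercept is handled by a constant column in X.

```python
import math
import random

def sigmoid(z):
    """Logistic sigmoid, branched to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def sgd_logistic(X, y, alpha=0.01, l1_ratio=0.5, eta0=0.5,
                 batch_size=2, epochs=200, seed=0):
    """Mini-batch SGD for ElasticNet-penalized logistic regression (illustrative sketch)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    w = [0.0] * p
    indices = list(range(n))
    step = 0
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a new order each epoch
        for start in range(0, n, batch_size):
            batch = indices[start:start + batch_size]
            g = [0.0] * p
            for i in batch:
                z = sum(wj * xj for wj, xj in zip(w, X[i]))
                err = sigmoid(z) - y[i]  # derivative of the logistic loss w.r.t. z
                for j in range(p):
                    g[j] += err * X[i][j] / len(batch)
            for j in range(p):
                sign = (w[j] > 0) - (w[j] < 0)  # subgradient of |w_j|, 0 at w_j == 0
                g[j] += alpha * ((1.0 - l1_ratio) * w[j] + l1_ratio * sign)
            step += 1
            eta = eta0 / (1.0 + 0.01 * step)  # assumed inverse-scaling schedule
            for j in range(p):
                w[j] -= eta * g[j]
    return w

# First column is a constant 1 acting as the intercept; labels follow the sign of column 2.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]
w = sgd_logistic(X, y)
probs = [sigmoid(sum(wj * xj for wj, xj in zip(w, row))) for row in X]
```

Note that the plain subgradient step rarely lands a weight on exactly zero, so this sketch shrinks coefficients under the L1 term without producing the exact sparsity that coordinate descent achieves.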
Related Pages
Implemented By
- Implementation:Rapidsai_Cuml_LogisticRegression
- Implementation:Rapidsai_Cuml_ElasticNet
- Implementation:Rapidsai_Cuml_Lasso
- Implementation:Rapidsai_Cuml_MBSGDClassifier
- Implementation:Rapidsai_Cuml_MBSGDRegressor
- Implementation:Rapidsai_Cuml_GLM_API
- Implementation:Rapidsai_Cuml_GLM_C_API
- Implementation:Rapidsai_Cuml_SGD_CD_Solver