Principle:Scikit learn Scikit learn Linear Regression
| Knowledge Sources | |
|---|---|
| Domains | Supervised Learning, Regression |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Linear regression models the relationship between a dependent variable and one or more independent variables by fitting a linear equation to observed data.
Description
Linear regression is the foundational supervised learning technique for predicting a continuous target as a linear combination of input features. Regularized variants address overfitting and multicollinearity by adding penalty terms to the loss function. Ridge regression applies an $L_2$ penalty, Lasso applies an $L_1$ penalty (producing sparse solutions), and ElasticNet combines both. These methods form the backbone of predictive modeling and are often the first approach tried before more complex models.
Usage
Use ordinary linear regression when the relationship between features and target is approximately linear and the number of features is moderate relative to the sample size. Use Ridge when features are correlated (multicollinearity) and you want to shrink coefficients without eliminating them. Use Lasso when you suspect many features are irrelevant and want automatic feature selection via sparsity. Use ElasticNet when you need a balance between Ridge and Lasso, particularly when features are correlated and some should be zeroed out. Use LARS (Least Angle Regression) when you want an efficient path algorithm for Lasso-type problems.
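The guidance above can be tried out with a minimal sketch. It assumes scikit-learn and NumPy are installed; the synthetic data (10 features, only 3 informative), the `alpha` values, and the names `models` and `nonzero_counts` are illustrative choices, not recommendations.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet, Lars

# Synthetic data (an assumption for illustration): 10 features, only 3 informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_coef = np.array([3.0, -2.0, 1.5, 0, 0, 0, 0, 0, 0, 0])
y = X @ true_coef + 0.1 * rng.normal(size=200)

# One estimator per method named in the Usage section.
# The alpha values are arbitrary; in practice they are tuned, e.g. by cross-validation.
models = {
    "ols": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
    "elasticnet": ElasticNet(alpha=0.1, l1_ratio=0.5),
    "lars": Lars(n_nonzero_coefs=3),
}

# Count how many coefficients each fitted model leaves non-zero.
nonzero_counts = {}
for name, model in models.items():
    model.fit(X, y)
    nonzero_counts[name] = int(np.sum(model.coef_ != 0))

print(nonzero_counts)
```

On data like this, OLS and Ridge keep every coefficient non-zero, while Lasso and the capped LARS fit keep only a subset, which is the behaviour the Usage guidance relies on.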
Theoretical Basis
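Ordinary Least Squares (OLS) minimizes the residual sum of squares:
$$\min_{w} \|Xw - y\|_2^2$$
where $X$ is the feature matrix, $y$ is the target vector, and $w$ is the coefficient vector. When $X^T X$ is invertible, the closed-form solution is $\hat{w} = (X^T X)^{-1} X^T y$.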
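As a rough check of the OLS closed-form solution, the sketch below solves the normal equations $(X^T X) w = X^T y$ with NumPy and compares the result with a generic least-squares solver. The data, the absence of an intercept term, and the names `w_normal` and `w_lstsq` are assumptions made for illustration.

```python
import numpy as np

# Synthetic data (illustrative): 3 features, small noise, no intercept.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=100)

# Normal equations: solve (X^T X) w = X^T y instead of forming the inverse explicitly.
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Generic least-squares solver, usually preferable when X^T X is ill-conditioned.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(w_normal)
print(w_lstsq)
```

Both routes minimize the same residual sum of squares, so they should agree up to floating-point error on well-conditioned data.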
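Ridge Regression adds an $L_2$ penalty on the coefficients $w$:
$$\min_{w} \|Xw - y\|_2^2 + \alpha \|w\|_2^2$$
The solution is $\hat{w} = (X^T X + \alpha I)^{-1} X^T y$. The regularization parameter $\alpha \geq 0$ controls the trade-off between fit and coefficient magnitude.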
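The Ridge closed form $(X^T X + \alpha I)^{-1} X^T y$ can be compared against scikit-learn's `Ridge` estimator. This is a minimal sketch under stated assumptions: the data are synthetic, `alpha = 2.0` is arbitrary, and `fit_intercept=False` is set so that the estimator solves the same intercept-free problem as the formula.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (illustrative): 4 features, no intercept.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
y = X @ np.array([1.0, 0.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
alpha = 2.0  # arbitrary regularization strength

# Closed form: solve (X^T X + alpha * I) w = X^T y.
w_closed = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# The same problem through scikit-learn, without an intercept term.
w_sklearn = Ridge(alpha=alpha, fit_intercept=False).fit(X, y).coef_

# Unpenalized OLS solution, for comparing coefficient magnitudes.
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

print(np.linalg.norm(w_closed), np.linalg.norm(w_ols))
```

The Ridge coefficient vector has a smaller Euclidean norm than the OLS one, which is the shrinkage that the regularization parameter controls.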
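Lasso Regression adds an $L_1$ penalty on the coefficients $w$:
$$\min_{w} \frac{1}{2 n_{\text{samples}}} \|Xw - y\|_2^2 + \alpha \|w\|_1$$
The $L_1$ penalty induces sparsity, setting some coefficients exactly to zero, effectively performing feature selection.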
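One way to see why an $L_1$ penalty produces exact zeros is the special case of an orthogonal design. Assuming the columns of $X$ satisfy $X^T X = n I$ and the objective is scaled by $1/(2n)$ as in scikit-learn's `Lasso`, the solution is the soft-thresholded OLS estimate. The sketch below builds such a design (an artificial setup chosen for illustration) and compares the two; `soft_threshold` is a helper defined here, not a library function.

```python
import numpy as np
from sklearn.linear_model import Lasso

def soft_threshold(z, alpha):
    """Shrink each entry toward zero by alpha; entries with |z| <= alpha become 0."""
    return np.sign(z) * np.maximum(np.abs(z) - alpha, 0.0)

# Build a design with orthogonal columns scaled so that X^T X = n * I (illustrative).
rng = np.random.default_rng(0)
n, p = 100, 5
Q, _ = np.linalg.qr(rng.normal(size=(n, p)))  # Q has orthonormal columns
X = np.sqrt(n) * Q
w_true = np.array([2.0, -1.0, 0.05, 0.0, 0.0])
y = X @ w_true + 0.01 * rng.normal(size=n)
alpha = 0.1

# With X^T X = n * I, the OLS estimate reduces to X^T y / n.
w_ols = X.T @ y / n

# Soft-thresholding the OLS estimate gives the Lasso solution in this special case.
w_soft = soft_threshold(w_ols, alpha)

# scikit-learn's coordinate-descent Lasso on the same intercept-free problem.
w_lasso = Lasso(alpha=alpha, fit_intercept=False).fit(X, y).coef_

print(w_soft)
print(w_lasso)
```

Coefficients whose OLS estimate is smaller in magnitude than `alpha` are set exactly to zero, while the larger ones are shrunk by `alpha`. For general, non-orthogonal designs no such closed form applies and an iterative solver is needed.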
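ElasticNet combines both penalties:
$$\min_{w} \frac{1}{2 n_{\text{samples}}} \|Xw - y\|_2^2 + \alpha \rho \|w\|_1 + \frac{\alpha (1 - \rho)}{2} \|w\|_2^2$$
where $\rho \in [0, 1]$ is the mixing ratio between $L_1$ and $L_2$: $\rho = 1$ gives the Lasso penalty and $\rho = 0$ a pure $L_2$ penalty.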
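The role of the mixing ratio can be sketched with scikit-learn's `ElasticNet`, where it is exposed as the `l1_ratio` parameter. The helper `elastic_net_objective` below is a hypothetical function written for this example; it evaluates the objective in the scaling scikit-learn documents, $\frac{1}{2n}\|Xw - y\|_2^2 + \alpha\rho\|w\|_1 + \frac{\alpha(1-\rho)}{2}\|w\|_2^2$. The data and parameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

def elastic_net_objective(X, y, w, alpha, l1_ratio):
    """Evaluate the ElasticNet objective (no intercept) at coefficients w."""
    n = X.shape[0]
    residual = y - X @ w
    data_fit = (residual @ residual) / (2 * n)
    l1_term = alpha * l1_ratio * np.abs(w).sum()
    l2_term = 0.5 * alpha * (1 - l1_ratio) * (w @ w)
    return data_fit + l1_term + l2_term

# Synthetic data (illustrative) with two correlated columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
X[:, 1] = X[:, 0] + 0.3 * rng.normal(size=200)
y = X @ np.array([1.0, 1.0, 0.0, 0.0, -2.0, 0.0]) + 0.1 * rng.normal(size=200)

alpha, l1_ratio = 0.1, 0.5  # arbitrary values for the sketch
enet = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, fit_intercept=False).fit(X, y)
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# The fitted coefficients should score lower on the penalized objective than OLS does.
obj_enet = elastic_net_objective(X, y, enet.coef_, alpha, l1_ratio)
obj_ols = elastic_net_objective(X, y, w_ols, alpha, l1_ratio)

# With l1_ratio=1.0 the penalty reduces to the Lasso penalty.
lasso_like = ElasticNet(alpha=alpha, l1_ratio=1.0, fit_intercept=False).fit(X, y)
lasso = Lasso(alpha=alpha, fit_intercept=False).fit(X, y)

print(obj_enet, obj_ols)
```

Setting `l1_ratio=1.0` reproduces the Lasso fit, which is a quick sanity check on how the two penalties are mixed.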
LARS (Least Angle Regression) is an efficient algorithm that, with a small modification, computes the full regularization path for Lasso. It proceeds by identifying the feature most correlated with the current residual, then moving that feature's coefficient toward its least-squares value until another feature becomes equally correlated with the residual, at which point both coefficients are adjusted simultaneously along a direction equiangular to the two features.
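The path computation described above can be explored with scikit-learn's `lars_path` function. This is a minimal sketch on synthetic data, assuming `method="lasso"` to request the Lasso variant of the path; the data and variable names are illustrative, and the function is applied to `X` and `y` as given, without an intercept.

```python
import numpy as np
from sklearn.linear_model import lars_path

# Synthetic data (illustrative): features 0, 2 and 4 are informative, in that order of strength.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([4.0, 0.0, -2.0, 0.0, 1.0]) + 0.1 * rng.normal(size=200)

# alphas: penalty values at each breakpoint of the path (decreasing).
# active: indices of the active features at the end of the path.
# coefs:  coefficients at each breakpoint, shape (n_features, n_breakpoints).
alphas, active, coefs = lars_path(X, y, method="lasso")

# Least-squares fit without intercept, which the path reaches as the penalty goes to zero.
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

print(list(active))
print(coefs[:, -1])
```

Each column of `coefs` corresponds to one breakpoint where a feature enters (or, in the Lasso variant, leaves) the active set, so the whole path is obtained from a small number of steps rather than one fit per penalty value.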