Principle:Scikit learn Scikit learn Linear Regression

Knowledge Sources	Scikit_learn Scikit-learn Docs
Domains	Supervised Learning, Regression
Last Updated	2026-02-08 15:00 GMT

Overview

Linear regression models the relationship between a dependent variable and one or more independent variables by fitting a linear equation to observed data.

Description

Linear regression is the foundational supervised learning technique for predicting a continuous target as a linear combination of input features. Regularized variants address overfitting and multicollinearity by adding penalty terms to the loss function: Ridge regression applies an $ℓ_{2}$ penalty, Lasso applies an $ℓ_{1}$ penalty (producing sparse solutions), and ElasticNet combines both. Because these models are fast to fit and their coefficients are directly interpretable, they are often the first approach tried before more complex models.

Usage

Use ordinary linear regression when the relationship between features and target is approximately linear and the number of features is moderate relative to the sample size. Use Ridge when features are correlated (multicollinearity) and you want to shrink coefficients without eliminating them. Use Lasso when you suspect many features are irrelevant and want automatic feature selection via sparsity. Use ElasticNet when you need a balance between Ridge and Lasso, particularly when features are correlated and some should be zeroed out. Use LARS (Least Angle Regression) when you want an efficient path algorithm for Lasso-type problems.
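As a rough illustration of the guidance above, the following sketch fits each estimator on the same synthetic data and compares cross-validated $R^{2}$ scores. The data, the two nearly collinear features, and the penalty strengths are assumptions chosen for the example; they are not recommended defaults.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LassoLars, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

# Synthetic data (assumed for illustration): 8 features, only two of them informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 8))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=150)  # make features 0 and 1 nearly collinear
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + 0.5 * rng.normal(size=150)

# One estimator per use case named in the text; alpha values are arbitrary.
models = {
    "ols": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.05),
    "elasticnet": ElasticNet(alpha=0.05, l1_ratio=0.5),
    "lasso_lars": LassoLars(alpha=0.05),
}

# Mean 5-fold cross-validated R^2 for each model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in models.items()
}

for name, score in scores.items():
    print(f"{name:>11}: R^2 = {score:.3f}")
```

In practice the penalty strength would be tuned, for example by cross-validation, rather than fixed by hand as it is here.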

Theoretical Basis

Ordinary Least Squares (OLS) minimizes the residual sum of squares:

$\hat{β} = \arg \min_{β} ‖ y - X β ‖_{2}^{2}$

When $X^{T} X$ is invertible (the columns of $X$ are linearly independent), the closed-form solution is $\hat{β} = (X^{T} X)^{- 1} X^{T} y$.
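The normal-equations formula can be checked numerically. This minimal sketch uses assumed synthetic data and solves the linear system with NumPy, then compares the result with scikit-learn's LinearRegression. The intercept is disabled so that the estimator matches the formula, which has no intercept term.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_beta = np.array([1.5, -2.0, 0.5])
y = X @ true_beta + 0.1 * rng.normal(size=100)

# Solve (X^T X) beta = X^T y directly instead of forming the inverse explicitly.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# fit_intercept=False so the model is exactly y ~ X beta, as in the formula.
ols = LinearRegression(fit_intercept=False).fit(X, y)

print(beta_normal)
print(ols.coef_)
```

Solving the system is generally preferable to computing $(X^{T} X)^{- 1}$ explicitly, since it is cheaper and numerically more stable.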

Ridge Regression adds an $ℓ_{2}$ penalty:

${\hat{β}}_{ridge} = \arg \min_{β} ‖ y - X β ‖_{2}^{2} + α ‖ β ‖_{2}^{2}$

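The solution is ${\hat{β}}_{ridge} = (X^{T} X + α I)^{- 1} X^{T} y$. For $α > 0$ the matrix $X^{T} X + α I$ is always invertible, so the solution exists even when features are collinear. The regularization parameter $α ≥ 0$ controls the trade-off between fit and coefficient magnitude: larger values shrink the coefficients more strongly toward zero, and $α = 0$ recovers OLS.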
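The ridge closed form can be verified the same way. The sketch below uses assumed synthetic data and an arbitrary $α$. It compares the formula with scikit-learn's Ridge (intercept disabled to match the formula) and shows that a larger $α$ produces a smaller coefficient norm.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = X @ np.array([2.0, 0.0, -1.0, 0.5]) + 0.1 * rng.normal(size=50)
alpha = 1.0

# Closed form: solve (X^T X + alpha I) beta = X^T y.
beta_closed = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# scikit-learn estimator without an intercept, matching the formula.
ridge = Ridge(alpha=alpha, fit_intercept=False).fit(X, y)

# A much stronger penalty shrinks the coefficient vector further.
ridge_strong = Ridge(alpha=100.0, fit_intercept=False).fit(X, y)
norm_weak = np.linalg.norm(ridge.coef_)
norm_strong = np.linalg.norm(ridge_strong.coef_)

print(beta_closed, ridge.coef_)
print(norm_weak, norm_strong)
```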

Lasso Regression adds an $ℓ_{1}$ penalty:

${\hat{β}}_{lasso} = \arg \min_{β} \frac{1}{2 n} ‖ y - X β ‖_{2}^{2} + α ‖ β ‖_{1}$

where $n$ is the number of samples. The $ℓ_{1}$ penalty induces sparsity: for sufficiently large $α$ it sets some coefficients exactly to zero, effectively performing feature selection.
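The sparsity effect is easy to observe on data where only a few features matter. In this sketch the data, the choice of three informative features, and the value of $α$ are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (assumed): 10 features, only the first 3 influence y.
rng = np.random.default_rng(0)
n_samples, n_features = 200, 10
X = rng.normal(size=(n_samples, n_features))
true_beta = np.zeros(n_features)
true_beta[:3] = [3.0, -2.0, 1.5]
y = X @ true_beta + 0.1 * rng.normal(size=n_samples)

lasso = Lasso(alpha=0.1).fit(X, y)

# Count coefficients that the l1 penalty set exactly to zero.
n_zero = int(np.sum(lasso.coef_ == 0.0))
selected = np.flatnonzero(lasso.coef_)

print("zeroed coefficients:", n_zero)
print("selected feature indices:", selected)
```

The surviving coefficients are also shrunk toward zero, so Lasso estimates are biased relative to an unpenalized fit on the selected features.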

ElasticNet combines both penalties:

${\hat{β}}_{enet} = \arg \min_{β} \frac{1}{2 n} ‖ y - X β ‖_{2}^{2} + α ρ ‖ β ‖_{1} + \frac{α (1 - ρ)}{2} ‖ β ‖_{2}^{2}$

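where $ρ \in [0, 1]$ is the mixing ratio between $ℓ_{1}$ and $ℓ_{2}$ (the l1_ratio parameter in scikit-learn). Setting $ρ = 1$ recovers the Lasso objective, while $ρ = 0$ leaves only the $ℓ_{2}$ penalty.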
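To make the symbols concrete, the sketch below writes the ElasticNet objective as a plain function and evaluates it at a fitted solution. The helper enet_objective is hypothetical and written only for this example, and the data and parameter values are assumptions. In scikit-learn, $α$ is the alpha argument and $ρ$ is l1_ratio.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso


def enet_objective(beta, X, y, alpha, rho):
    """Hypothetical helper: the ElasticNet objective exactly as written in the text."""
    n = X.shape[0]
    residual = y - X @ beta
    data_term = (residual @ residual) / (2 * n)
    l1_term = alpha * rho * np.abs(beta).sum()
    l2_term = 0.5 * alpha * (1 - rho) * (beta @ beta)
    return data_term + l1_term + l2_term


# Synthetic data (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = X @ np.array([2.0, -1.0, 0.0, 0.0, 0.5, 0.0]) + 0.1 * rng.normal(size=200)

alpha, rho = 0.1, 0.5
enet = ElasticNet(alpha=alpha, l1_ratio=rho, fit_intercept=False).fit(X, y)

# The fitted coefficients should give a lower objective than the all-zero vector.
obj_fit = enet_objective(enet.coef_, X, y, alpha, rho)
obj_zero = enet_objective(np.zeros(X.shape[1]), X, y, alpha, rho)

# With rho = 1 only the l1 term remains, so the fit should coincide with Lasso.
enet_l1 = ElasticNet(alpha=alpha, l1_ratio=1.0, fit_intercept=False).fit(X, y)
lasso = Lasso(alpha=alpha, fit_intercept=False).fit(X, y)

print(obj_fit, obj_zero)
print(enet_l1.coef_, lasso.coef_)
```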

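LARS (Least Angle Regression) is an efficient algorithm that, with a small modification (dropping a feature from the active set when its coefficient crosses zero), computes the full regularization path for Lasso. It starts with all coefficients at zero and identifies the feature most correlated with the current residual. It then moves that feature's coefficient toward its least-squares value until another feature becomes equally correlated with the residual, at which point both coefficients are adjusted together along the equiangular direction. The process repeats as further features join the active set.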
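The first step of this procedure can be observed with scikit-learn's lars_path function. In this sketch the data are assumed and the columns of X are not normalized, so "most correlated" is measured by the absolute inner product with $y$ (the initial residual, since all coefficients start at zero).

```python
import numpy as np
from sklearn.linear_model import lars_path

# Synthetic data (assumed for illustration).
rng = np.random.default_rng(0)
n_samples, n_features = 100, 5
X = rng.normal(size=(n_samples, n_features))
y = X @ np.array([3.0, 0.0, -2.0, 0.0, 1.0]) + 0.1 * rng.normal(size=n_samples)

# Feature with the largest absolute inner product with the initial residual y.
first_by_correlation = int(np.argmax(np.abs(X.T @ y)))

# method="lasso" requests the Lasso variant of the path.
alphas, active, coefs = lars_path(X, y, method="lasso")

# coefs has one column per knot of the path; column 0 is the all-zero start,
# and column 1 has a single nonzero entry: the first feature to enter.
first_entered = int(np.flatnonzero(coefs[:, 1])[0])

print("path knots:", len(alphas))
print("first feature by correlation:", first_by_correlation)
print("first feature entered by LARS:", first_entered)
```

Each later column of coefs shows the coefficients at the point where the active set changes, so the full path is available from a single call.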
