Implementation:Scikit learn Scikit learn TheilSenRegressor

Knowledge Sources	Scikit_learn Scikit-learn Docs
Domains	Machine Learning, Robust Regression
Last Updated	2026-02-08 15:00 GMT

Overview

Concrete tool, provided by scikit-learn, for robust multivariate regression using the Theil-Sen estimator, which generalizes the median of pairwise slopes to multiple features.

Description

TheilSenRegressor implements the Theil-Sen estimator, a robust multivariate regression model. The algorithm computes least-squares solutions on subsets of n_subsamples samples, then takes the spatial median (L1 median) of all solutions as the final estimate. This gives a breakdown point of up to about 29.3% for large samples, meaning the fit can tolerate a substantial fraction of outliers. Because the number of possible subsets grows combinatorially, the max_subpopulation parameter caps how many are considered; when the cap applies, the subsets are drawn at random.
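To make the idea concrete, here is a minimal sketch of the classical single-feature case, where each subset is a pair of points and the spatial median reduces to an ordinary median of pairwise slopes. The helper name theil_sen_1d and the toy data are illustrative assumptions; this is not scikit-learn's implementation, which handles several features and subsamples large problems.

```python
from itertools import combinations
from statistics import median


def theil_sen_1d(x, y):
    """Classical single-feature Theil-Sen fit (illustrative helper, not sklearn code)."""
    # One slope per pair of points; pairs with identical x are skipped.
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    slope = median(slopes)
    # Intercept taken as the median residual once the slope is fixed.
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept


# Points on the line y = 2x + 1, with two gross outliers at the end.
x = list(range(10))
y = [2.0 * xi + 1.0 for xi in x]
y[8] = 100.0
y[9] = -50.0

slope, intercept = theil_sen_1d(x, y)
print(slope, intercept)
```

Because most pairs involve only clean points, the median slope and intercept stay on the underlying line even though 2 of the 10 targets are corrupted.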

Usage

Use TheilSenRegressor when you need a regression model that is highly robust to outliers, particularly when up to roughly 29% of the data points may be outliers. It tolerates a larger fraction of outliers than HuberRegressor, at a higher computational cost. It is commonly used in scientific data analysis, where measurement errors or anomalous readings are expected.
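The comparison above can be explored with a small experiment. The sketch below fits ordinary least squares, HuberRegressor, and TheilSenRegressor to synthetic single-feature data in which 20% of the targets are corrupted at the high end of X. The data, the contamination pattern, and the variable names are assumptions chosen for illustration; results on real data will differ.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression, TheilSenRegressor

rng = np.random.RandomState(0)

# Synthetic single-feature data with a known slope of 3 and intercept of 2.
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = 3.0 * X.ravel() + 2.0 + rng.normal(scale=0.2, size=100)

# Corrupt 20% of the targets, all at the largest X values (high leverage).
y[-20:] = -20.0

true_slope = 3.0
slopes = {}
for name, model in [
    ("OLS", LinearRegression()),
    ("Huber", HuberRegressor()),
    ("Theil-Sen", TheilSenRegressor(random_state=0)),
]:
    model.fit(X, y)
    slopes[name] = float(model.coef_[0])
    print(f"{name:>9}: slope = {slopes[name]:.3f} (true slope = {true_slope})")
```

Comparing the printed slopes against the true value of 3 gives a rough sense of how each estimator reacts to this kind of contamination.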

Code Reference

Source Location

Repository: scikit-learn
File: sklearn/linear_model/_theil_sen.py

Signature

class TheilSenRegressor(RegressorMixin, LinearModel):
    def __init__(
        self,
        *,
        fit_intercept=True,
        max_subpopulation=1e4,
        n_subsamples=None,
        max_iter=300,
        tol=1e-3,
        random_state=None,
        n_jobs=None,
        verbose=False,
    ):

Import

from sklearn.linear_model import TheilSenRegressor

I/O Contract

Inputs

Name	Type	Required	Description
fit_intercept	bool	No	Whether to calculate the intercept (default=True)
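max_subpopulation	int	No	Cap on the number of subsets evaluated; if 'n choose k' (n = n_samples, k = n_subsamples) exceeds it, a random subpopulation of this size is used instead (default=1e4)
n_subsamples	int	No	Number of samples per subset, from n_features (plus 1 if fit_intercept=True) up to n_samples; None selects the minimum, which gives maximal robustness (default=None)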
max_iter	int	No	Maximum iterations for spatial median calculation (default=300)
tol	float	No	Tolerance for spatial median convergence (default=1e-3)
random_state	int, RandomState instance or None	No	Seed or generator for the random subset selection, for reproducibility (default=None)
n_jobs	int	No	Number of CPUs for parallel computation (default=None)
verbose	bool	No	Verbose mode during fitting (default=False)
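As a rough illustration of how n_subsamples and max_subpopulation interact, the sketch below fits the same data with three subset sizes and records the fitted breakdown_ and n_subpopulation_ attributes. The dataset and the small max_subpopulation value are assumptions chosen only to keep the run short.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import TheilSenRegressor

X, y = make_regression(n_samples=100, n_features=2, noise=5.0, random_state=0)

breakdowns = []
subpopulations = []
for n_subsamples in (None, 10, 30):
    model = TheilSenRegressor(
        n_subsamples=n_subsamples,
        max_subpopulation=500,  # small cap, assumed here to keep the sketch fast
        random_state=0,
    )
    model.fit(X, y)
    breakdowns.append(model.breakdown_)
    subpopulations.append(model.n_subpopulation_)
    print(n_subsamples, model.n_subpopulation_, round(model.breakdown_, 3))
```

Larger subsets trade robustness for efficiency, so the reported breakdown point shrinks as n_subsamples grows, while the number of subsets evaluated stays at the cap.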

Outputs

Name	Type	Description
coef_	ndarray of shape (n_features,)	Estimated coefficients of the regression model
intercept_	float	Estimated intercept of the regression model
breakdown_	float	Approximate breakdown point of the estimator
n_iter_	int	Number of iterations for spatial median computation
n_subpopulation_	int	Number of combinations considered for random subsampling

Usage Examples

Basic Usage

from sklearn.linear_model import TheilSenRegressor
from sklearn.datasets import make_regression
import numpy as np

X, y = make_regression(n_samples=100, n_features=5, noise=10, random_state=42)
# Replace the first 10 targets (10% of the data) with gross outliers
y[:10] = np.random.RandomState(42).uniform(-500, 500, size=10)

model = TheilSenRegressor(random_state=42)
model.fit(X, y)
print("Breakdown point:", model.breakdown_)
print("Coefficients:", model.coef_)

Related Pages

Principle:Scikit_learn_Scikit_learn_Robust_Regression
