Implementation:Scikit learn Scikit learn TheilSenRegressor

Revision as of 16:37, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Scikit_learn_Scikit_learn_TheilSenRegressor.md)


Knowledge Sources
Domains Machine Learning, Robust Regression
Last Updated 2026-02-08 15:00 GMT

Overview

Concrete tool provided by scikit-learn for robust multivariate linear regression with the Theil-Sen estimator, which generalizes the median of pairwise slopes to several features.

Description

TheilSenRegressor implements the Theil-Sen estimator, a robust multivariate regression model. The algorithm computes a least-squares solution on each subset of n_subsamples samples, then takes the spatial median (L1 median) of all these solutions as the final estimate. Its breakdown point, the fraction of arbitrarily corrupted samples it can tolerate, is about 29.3% for simple linear regression (a single feature) with many samples, and it decreases as the number of features grows. Because there are 'n_samples choose n_subsamples' possible subsets, the computational cost is bounded by the max_subpopulation parameter: when the number of subsets exceeds it, only a random subpopulation of that size is evaluated.
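The procedure described above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not scikit-learn's code: `theil_sen_sketch` and `spatial_median` are hypothetical helpers, the spatial median uses plain Weiszfeld iterations (scikit-learn uses a modified Weiszfeld step that also handles an iterate landing exactly on a data point), and random subsets are drawn with NumPy's `default_rng` rather than scikit-learn's own sampling.

```python
import itertools
import math

import numpy as np


def spatial_median(points, max_iter=300, tol=1e-3):
    """L1 (spatial) median of the rows of `points` via plain Weiszfeld iterations."""
    median = points.mean(axis=0)  # start from the centroid
    for _ in range(max_iter):
        dist = np.linalg.norm(points - median, axis=1)
        dist = np.maximum(dist, 1e-12)  # crude guard: avoid dividing by zero at a data point
        inv = 1.0 / dist
        # Weighted mean with weights 1 / distance: one Weiszfeld step.
        new_median = (inv[:, None] * points).sum(axis=0) / inv.sum()
        if np.linalg.norm(new_median - median) < tol:
            return new_median
        median = new_median
    return median


def theil_sen_sketch(X, y, n_subsamples=None, max_subpopulation=10_000, seed=0):
    """Hypothetical helper: subset least squares + spatial median. Returns (intercept, coef)."""
    n_samples, n_features = X.shape
    A = np.hstack([np.ones((n_samples, 1)), X])  # leading column of ones models the intercept
    k = n_features + 1 if n_subsamples is None else n_subsamples  # minimal subset by default
    if math.comb(n_samples, k) <= max_subpopulation:
        # Few enough subsets: enumerate all of them.
        subsets = [list(s) for s in itertools.combinations(range(n_samples), k)]
    else:
        # Too many subsets: draw max_subpopulation random ones.
        rng = np.random.default_rng(seed)
        subsets = [rng.choice(n_samples, size=k, replace=False) for _ in range(max_subpopulation)]
    # One least-squares solution [intercept, coef_1, ..., coef_p] per subset.
    solutions = np.array([np.linalg.lstsq(A[s], y[s], rcond=None)[0] for s in subsets])
    w = spatial_median(solutions)
    return w[0], w[1:]


rng = np.random.default_rng(42)
x = rng.uniform(-5, 5, size=60)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=60)
y[:6] += 40.0  # corrupt 10% of the targets
intercept, coef = theil_sen_sketch(x.reshape(-1, 1), y)
print(f"intercept={intercept:.2f}, slope={coef[0]:.2f}")
```

With one feature and an intercept the subsets are pairs, so this reduces to a two-dimensional spatial median over all pairwise lines, which is where the "median of pairwise slopes" intuition comes from.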

Usage

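Use TheilSenRegressor when you need a linear model that is robust to outliers and the number of features is small. With a single feature it tolerates up to about 29% corrupted samples, but this tolerance shrinks quickly as features are added. According to the scikit-learn user guide, Theil-Sen copes well with medium-size outliers in the X direction, although this advantage disappears in high-dimensional settings, and it is computationally more expensive than HuberRegressor unless n_samples is much larger than n_features. It is commonly used in scientific data analysis where measurement errors or anomalous readings are expected.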
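The 29% figure applies to a single feature. One way to see how the breakdown point shrinks with dimension is to read the fitted `breakdown_` attribute for a growing number of features. The comparison value `1 - 0.5 ** (1 / k)`, with k = n_features + 1 the default subset size when an intercept is fitted, is assumed here to be the large-sample limit; it reproduces the 29.3% figure for k = 2. The synthetic data is arbitrary.

```python
import numpy as np
from sklearn.linear_model import TheilSenRegressor

rng = np.random.RandomState(0)
n_samples = 100
breakdowns = {}
for n_features in (1, 2, 5):
    X = rng.normal(size=(n_samples, n_features))
    y = X.sum(axis=1) + rng.normal(scale=0.1, size=n_samples)
    model = TheilSenRegressor(random_state=0).fit(X, y)
    breakdowns[n_features] = model.breakdown_
    # Default subset size with fit_intercept=True is n_features + 1 (assumed limit below).
    k = n_features + 1
    print(f"{n_features} feature(s): breakdown_={model.breakdown_:.3f}, "
          f"large-sample limit={1 - 0.5 ** (1 / k):.3f}")
```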

Code Reference

Source Location

Signature

class TheilSenRegressor(RegressorMixin, LinearModel):
    def __init__(
        self,
        *,
        fit_intercept=True,
        max_subpopulation=1e4,
        n_subsamples=None,
        max_iter=300,
        tol=1e-3,
        random_state=None,
        n_jobs=None,
        verbose=False,
    ):

Import

from sklearn.linear_model import TheilSenRegressor

I/O Contract

Inputs

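| Name | Type | Required | Description |
|---|---|---|---|
| fit_intercept | bool | No | Whether to calculate the intercept (default=True) |
| max_subpopulation | int | No | If the number of subsets, 'n_samples choose n_subsamples', exceeds this value, only a random subpopulation of this size is used (default=1e4) |
| n_subsamples | int | No | Number of samples per subset; at least n_features (plus 1 if fit_intercept=True) and at most n_samples. Smaller values give a higher breakdown point but lower efficiency; None selects the minimum, i.e. maximal robustness, and n_samples makes the estimator identical to least squares (default=None) |
| max_iter | int | No | Maximum number of iterations for the spatial median calculation (default=300) |
| tol | float | No | Tolerance for spatial median convergence (default=1e-3) |
| random_state | int, RandomState instance or None | No | Seed or generator for the random subset selection, for reproducibility (default=None) |
| n_jobs | int | No | Number of CPUs for parallel computation; None means 1 unless in a joblib parallel context, -1 means all processors (default=None) |
| verbose | bool | No | Verbose mode during fitting (default=False) |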
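The n_subsamples trade-off can be checked directly: larger subsets lower `breakdown_`, and with n_subsamples equal to n_samples there is a single subset (the full data set), so the fit should coincide with ordinary least squares. A minimal sketch on arbitrary synthetic data, using LinearRegression as the least-squares reference:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, TheilSenRegressor

rng = np.random.RandomState(0)
n_samples = 30
X = rng.uniform(-5, 5, size=(n_samples, 1))
y = 3.0 * X.ravel() + 2.0 + rng.normal(scale=0.5, size=n_samples)

breakdowns = []
for n_subsamples in (2, 5, 15, n_samples):
    model = TheilSenRegressor(n_subsamples=n_subsamples, random_state=0).fit(X, y)
    breakdowns.append(model.breakdown_)
    print(f"n_subsamples={n_subsamples:2d}: breakdown_={model.breakdown_:.3f}, "
          f"n_subpopulation_={model.n_subpopulation_}")

# `model` now holds the n_subsamples == n_samples fit: one subset, i.e. plain least squares.
ols = LinearRegression().fit(X, y)
print("matches OLS:", np.allclose(model.coef_, ols.coef_)
      and np.isclose(model.intercept_, ols.intercept_))
```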

Outputs

Fitted attributes, set by fit:

| Name | Type | Description |
|---|---|---|
| coef_ | ndarray of shape (n_features,) | Estimated coefficients of the regression model |
| intercept_ | float | Estimated intercept of the regression model |
| breakdown_ | float | Approximate breakdown point of the estimator |
| n_iter_ | int | Number of iterations needed for the spatial median computation |
| n_subpopulation_ | int | Number of subsets actually evaluated, i.e. the smaller of 'n_samples choose n_subsamples' and max_subpopulation |
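A small sketch of how max_subpopulation caps the number of evaluated subsets, as reported by `n_subpopulation_`. With one feature and an intercept the subsets are pairs, so 50 samples give '50 choose 2' = 1225 candidates. The data is arbitrary; the values in the comments follow from taking the smaller of 'n choose k' and max_subpopulation.

```python
import math

import numpy as np
from sklearn.linear_model import TheilSenRegressor

rng = np.random.RandomState(0)
X = rng.normal(size=(50, 1))
y = 2.0 * X.ravel() + rng.normal(scale=0.1, size=50)

# One feature + intercept -> subsets of k = 2 samples.
n_candidates = math.comb(50, 2)
full = TheilSenRegressor(random_state=0).fit(X, y)  # default cap 1e4 > 1225: every pair is used
capped = TheilSenRegressor(max_subpopulation=500, random_state=0).fit(X, y)  # 500 random pairs

print(n_candidates)             # 1225
print(full.n_subpopulation_)    # 1225
print(capped.n_subpopulation_)  # 500
# breakdown_ depends only on n_samples and n_subsamples, not on the cap.
print(full.breakdown_ == capped.breakdown_)  # True
```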

Usage Examples

Basic Usage

from sklearn.linear_model import TheilSenRegressor
from sklearn.datasets import make_regression
import numpy as np

X, y = make_regression(n_samples=100, n_features=5, noise=10, random_state=42)
# Corrupt 10% of the targets: replace the first 10 with arbitrary values in [-500, 500]
y[:10] = np.random.RandomState(42).uniform(-500, 500, size=10)

model = TheilSenRegressor(random_state=42)
model.fit(X, y)
print("Breakdown point:", model.breakdown_)
print("Coefficients:", model.coef_)
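To see the robustness in a setting where least squares clearly fails, the sketch below corrupts the 10% of samples with the largest x, which gives those outliers high leverage, and compares TheilSenRegressor with LinearRegression. The data, the true line y = 3x + 2 and the shift of -50 are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, TheilSenRegressor

rng = np.random.RandomState(0)
x = np.linspace(0, 10, 100)
y = 3.0 * x + 2.0 + rng.normal(scale=0.5, size=x.size)
y[-10:] -= 50.0  # shift the 10 points with the largest x (high-leverage outliers)
X = x.reshape(-1, 1)

ols = LinearRegression().fit(X, y)
ts = TheilSenRegressor(random_state=0).fit(X, y)
print("true slope:      3.00")
print(f"OLS slope:       {ols.coef_[0]:.2f}")
print(f"Theil-Sen slope: {ts.coef_[0]:.2f}")
```

Because the corrupted points sit at one end of the x range, least squares tilts its line toward them, while the spatial median of the pairwise fits stays close to the bulk of the data.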
