Implementation:Scikit learn Scikit learn RANSACRegressor
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning, Robust Regression |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Concrete tool for robust parameter estimation using the RANSAC (RANdom SAmple Consensus) algorithm provided by scikit-learn.
Description
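RANSACRegressor implements the RANSAC algorithm, which iteratively estimates model parameters from a subset of inliers of the complete data set. At each iteration, min_samples samples are drawn at random and the base estimator (default: LinearRegression) is fit on this subset. The loss of every sample under the fitted model, not only of the samples outside the subset, is then compared against residual_threshold; samples within the threshold form the consensus (inlier) set. Subsets or models rejected by the optional is_data_valid and is_model_valid checks are skipped. The model with the largest consensus set is kept, with ties broken by the estimator's score on the inliers. Once a stopping criterion is met, the base estimator is refit on all inliers of that best set. RANSAC is a meta-estimator that wraps any regression estimator implementing fit, score, and predict.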
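The loop just described can be sketched in plain NumPy for the simplest case: fitting a line y = a * x + b with two points per subset. This is a simplified illustration, not scikit-learn's code, and several parts are assumptions:

- The helpers `required_trials` and `ransac_line` are hypothetical names.
- `np.polyfit` stands in for the wrapped estimator.
- Ties between equally large consensus sets are not broken by score.
- The trial budget is shrunk with the standard RANSAC bound N >= log(1 - p) / log(1 - w**m). Here w is the inlier fraction, m is the subset size, and p is the confidence that `stop_probability` sets.

```python
import numpy as np


def required_trials(inlier_ratio, min_samples, probability=0.99):
    """Trials needed so that, with the given probability, at least one random
    subset of size min_samples is outlier-free (hypothetical helper):
    N >= log(1 - probability) / log(1 - inlier_ratio ** min_samples)."""
    p_clean = inlier_ratio ** min_samples  # chance that one subset is all inliers
    if p_clean >= 1.0:
        return 1
    if p_clean <= 0.0:
        return np.inf
    return int(np.ceil(np.log(1.0 - probability) / np.log(1.0 - p_clean)))


def ransac_line(x, y, residual_threshold, max_trials=100,
                stop_probability=0.99, seed=0):
    """Hypothetical, simplified RANSAC for a line y = a * x + b."""
    rng = np.random.default_rng(seed)
    n, min_samples = len(x), 2  # a line is determined by two points
    best_mask, best_count = None, 0
    trials, budget = 0, max_trials
    while trials < budget:
        trials += 1
        idx = rng.choice(n, size=min_samples, replace=False)
        if x[idx[0]] == x[idx[1]]:
            continue  # degenerate subset: a vertical line cannot be fit
        a, b = np.polyfit(x[idx], y[idx], deg=1)
        # evaluate the candidate on ALL samples, not only the random subset
        mask = np.abs(y - (a * x + b)) <= residual_threshold
        count = int(mask.sum())
        if count > best_count:  # keep the largest consensus set
            best_mask, best_count = mask, count
            # a better inlier estimate shrinks the number of trials needed
            budget = min(max_trials,
                         required_trials(count / n, min_samples, stop_probability))
    if best_mask is None:
        raise ValueError("no consensus set found")
    # final model: refit on every inlier of the best consensus set
    a, b = np.polyfit(x[best_mask], y[best_mask], deg=1)
    return a, b, best_mask, trials


rng = np.random.default_rng(42)
x = np.linspace(0.0, 10.0, 100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)
y[::5] += rng.uniform(20.0, 40.0, size=20)  # every fifth point becomes a gross outlier

a, b, mask, trials = ransac_line(x, y, residual_threshold=1.0)
print(f"slope={a:.2f} intercept={b:.2f} inliers={mask.sum()} trials={trials}")
```

Because every candidate line is scored on all samples, the shifted points never enter the final refit. Once a good line has been found, the early-stopping bound usually ends the loop well before `max_trials`.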
Usage
Use RANSACRegressor when the data contain gross outliers that would bias ordinary least squares and other non-robust regression methods. It is particularly effective when a substantial fraction of the data are inliers that follow the model, while the outliers can take arbitrary values. Common applications include computer vision (line and plane fitting), sensor data processing, and any regression task with gross outliers.
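The line-fitting use case can be illustrated with synthetic data in which the right-most 20 points are shifted far above an otherwise clean line. Ordinary least squares is pulled toward the outlier cluster, while RANSAC recovers the underlying slope. The data, threshold, and seeds are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)
y[-20:] += 50.0  # the 20 right-most points form a cluster of gross outliers
X = x.reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix

ols = LinearRegression().fit(X, y)
ransac = RANSACRegressor(residual_threshold=1.0, random_state=0).fit(X, y)

ols_slope = ols.coef_[0]
ransac_slope = ransac.estimator_.coef_[0]  # slope of the model refit on inliers
print(f"OLS slope:    {ols_slope:.2f}")
print(f"RANSAC slope: {ransac_slope:.2f}")
```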
Code Reference
Source Location
- Repository: scikit-learn
- File: sklearn/linear_model/_ransac.py
Signature
```python
class RANSACRegressor(
    MetaEstimatorMixin,
    RegressorMixin,
    MultiOutputMixin,
    BaseEstimator,
):
    def __init__(
        self,
        estimator=None,
        *,
        min_samples=None,
        residual_threshold=None,
        is_data_valid=None,
        is_model_valid=None,
        max_trials=100,
        max_skips=np.inf,
        stop_n_inliers=np.inf,
        stop_score=np.inf,
        stop_probability=0.99,
        loss="absolute_error",
        random_state=None,
    ):
```
Import
```python
from sklearn.linear_model import RANSACRegressor
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
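| estimator | object | No | Base estimator implementing fit(X, y), score(X, y) and predict(X); if None, LinearRegression is used (default=None) |
| min_samples | int or float | No | Number of samples drawn at random per trial: an absolute count if >= 1, or ceil(min_samples * n_samples) if < 1. If None, X.shape[1] + 1 is used, which assumes a LinearRegression estimator; set it explicitly for other estimators (default=None) |
| residual_threshold | float | No | Maximum per-sample loss for a sample to be classified as an inlier (a squared residual when loss='squared_error'); if None, the median absolute deviation (MAD) of y is used (default=None) |
| is_data_valid | callable | No | Called as is_data_valid(X_subset, y_subset) before fitting; if it returns False, the subset is skipped (default=None) |
| is_model_valid | callable | No | Called as is_model_valid(model, X_subset, y_subset) after fitting; if it returns False, the subset is skipped (default=None) |
| max_trials | int | No | Maximum number of random sample iterations (default=100) |
| max_skips | int | No | Maximum number of iterations that may be skipped because no inliers were found or because is_data_valid or is_model_valid rejected the subset (default=inf) |
| stop_n_inliers | int | No | Stop iterating once at least this many inliers are found (default=inf) |
| stop_score | float | No | Stop iterating once the score is greater than or equal to this value (default=inf) |
| stop_probability | float in [0, 1] | No | Stop once, with this probability, at least one outlier-free subset has been sampled; the required number of trials N >= log(1 - stop_probability) / log(1 - w**min_samples) is re-estimated from the current inlier fraction w (default=0.99) |
| loss | str or callable | No | 'absolute_error', 'squared_error', or a callable loss(y_true, y_pred) returning one value per sample; samples whose loss exceeds residual_threshold are outliers (default='absolute_error') |
| random_state | int, RandomState instance or None | No | Controls the random subset selection; pass an int for reproducible results (default=None) |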
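The callable parameters can be combined as in the sketch below, which uses synthetic data. The functions `has_spread`, `slope_is_positive`, and `absolute_loss` are hypothetical examples:

- `has_spread` rejects random subsets whose x values nearly coincide.
- `slope_is_positive` encodes an assumed piece of prior knowledge about the data.
- `absolute_loss` simply reproduces `loss='absolute_error'` to show the expected callable shape.

```python
import numpy as np
from sklearn.linear_model import RANSACRegressor


def has_spread(X_subset, y_subset):
    # hypothetical is_data_valid(X, y): reject subsets whose x values nearly coincide
    col = X_subset[:, 0]
    return col.max() - col.min() > 1e-3


def slope_is_positive(model, X_subset, y_subset):
    # hypothetical is_model_valid(model, X, y): assumes the true slope is known to be > 0
    return model.coef_[0] > 0


def absolute_loss(y_true, y_pred):
    # custom loss: one value per sample, compared against residual_threshold
    return np.abs(y_true - y_pred)


rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, size=150)
y = 3.0 * x - 2.0 + rng.normal(scale=0.2, size=x.size)
y[:30] = rng.uniform(-50.0, 50.0, size=30)  # 30 arbitrary outliers
X = x.reshape(-1, 1)

model = RANSACRegressor(
    residual_threshold=1.0,
    is_data_valid=has_spread,
    is_model_valid=slope_is_positive,
    loss=absolute_loss,
    random_state=0,
).fit(X, y)

print("slope:", model.estimator_.coef_[0])
print("skipped (data, model):", model.n_skips_invalid_data_, model.n_skips_invalid_model_)
```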
Outputs
| Name | Type | Description |
|---|---|---|
| estimator_ | object | Final model: a clone of estimator fitted on all inliers of the best consensus set |
| n_trials_ | int | Number of random selection trials performed until a stopping criterion was met; always <= max_trials |
| inlier_mask_ | ndarray of shape (n_samples,) | Boolean mask of inlier samples |
| n_skips_no_inliers_ | int | Number of iterations skipped due to finding no inliers |
| n_skips_invalid_data_ | int | Number of iterations skipped due to invalid data |
| n_skips_invalid_model_ | int | Number of iterations skipped due to invalid model |
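The fitted attributes can be inspected after `fit`, as in this sketch on assumed synthetic data with ten outliers shifted by +30. Outlier indices come from negating `inlier_mask_`, and the three skip counters can be summed to see how many trials were discarded.

```python
import numpy as np
from sklearn.linear_model import RANSACRegressor

rng = np.random.default_rng(7)
X = rng.uniform(-5.0, 5.0, size=(100, 2))
y = X @ np.array([1.5, -0.5]) + 4.0 + rng.normal(scale=0.1, size=100)
y[:10] += 30.0  # ten gross outliers

model = RANSACRegressor(residual_threshold=1.0, random_state=0).fit(X, y)

outlier_idx = np.flatnonzero(~model.inlier_mask_)  # indices flagged as outliers
skips = (model.n_skips_no_inliers_
         + model.n_skips_invalid_data_
         + model.n_skips_invalid_model_)

print("outliers:", outlier_idx)
print("trials:", model.n_trials_, "skipped:", skips)
print("coef:", model.estimator_.coef_, "intercept:", model.estimator_.intercept_)
```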
Usage Examples
Basic Usage
```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RANSACRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=42)
# Replace the first 20 targets with arbitrary outliers
y[:20] = np.random.RandomState(42).uniform(-500, 500, size=20)

model = RANSACRegressor(random_state=42)
model.fit(X, y)
print("Inliers:", model.inlier_mask_.sum(), "out of", len(y))
print("Trials:", model.n_trials_)
```