Principle:Scikit learn Scikit learn Isotonic Regression

Knowledge Sources	Scikit_learn Scikit-learn Docs
Domains	Supervised Learning, Non-Parametric Methods
Last Updated	2026-02-08 15:00 GMT

Overview

Isotonic regression fits a non-decreasing (or non-increasing) piecewise constant function to data, providing a non-parametric regression method with a monotonicity constraint.

Description

Isotonic regression finds the best-fitting monotonic function to observed data without assuming a specific parametric form. It solves the problem of fitting a regression function when the only assumption is that the relationship between the predictor and the response is monotonic. The resulting fit is a step function that minimizes the sum of squared errors subject to the ordering constraint. Isotonic regression is widely used in probability calibration (transforming classifier outputs into calibrated probabilities), dose-response modeling, and any setting where domain knowledge dictates a monotonic relationship.

Usage

Use isotonic regression when the relationship between the predictor and the response is known or expected to be monotonic but the exact functional form is unknown. It is commonly used as a calibration method for classifiers, where predicted scores should have a monotonic relationship with true probabilities. It is also useful in medical dose-response studies, pricing models, and quality control. Note that isotonic regression is limited to univariate problems (one predictor variable) and can overfit when the dataset is small, as it has high flexibility with no smoothness constraint.

Theoretical Basis

Problem Formulation: Given data pairs $(x_{1}, y_{1}), \dots, (x_{n}, y_{n})$ with $x_{1} \leq x_{2} \leq \dots \leq x_{n}$, isotonic regression solves:

$\hat{y} = \arg \min_{z} \sum_{i = 1}^{n} w_{i} (y_{i} - z_{i})^{2} \quad \text{s.t.} \quad z_{1} \leq z_{2} \leq \dots \leq z_{n}$

where $w_{i}$ are optional sample weights ($w_{i} = 1$ for all $i$ when none are given).

Pool Adjacent Violators (PAV) Algorithm:

Start with $z_{i} = y_{i}$, treating each point as its own block.
Scan from left to right. If $z_{i} > z_{i + 1}$ (a violation of monotonicity):
1. Merge the two adjacent blocks, replacing every value in the merged block with the block's weighted average.
2. Check backward for further violations and merge as needed.
Continue until no violations remain.

The PAV algorithm has time complexity $O (n)$ once the data are sorted by $x$ (sorting unsorted input adds $O (n \log n)$) and produces the exact global optimum.
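The steps above can be sketched in a few lines of pure Python. This is a minimal illustration, not scikit-learn's implementation: the function name `pav` and the block representation (weighted mean, total weight, point count) are assumptions made for this example, and the input is assumed to be already sorted by $x$.

```python
def pav(y, w=None):
    """Weighted pool-adjacent-violators for a non-decreasing fit.

    Hypothetical helper for illustration; assumes y is ordered by x.
    """
    n = len(y)
    if w is None:
        w = [1.0] * n  # unit weights when none are given

    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge backward while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])

    # Expand the blocks back to one fitted value per input point.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


print(pav([1, 3, 2, 4]))  # the violating pair (3, 2) is pooled to 2.5
```

Each point is appended once and each merge removes one block, so the total work is linear in the number of points.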

Properties:

The solution is a piecewise constant (step) function.
It is the projection of the data onto the cone of monotone sequences in the weighted least-squares sense.
The number of steps (plateaus) in the solution is at most $n$ and depends on the data.

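For prediction at new points: scikit-learn's IsotonicRegression linearly interpolates between the fitted values at the training points, so predictions between two distinct plateaus lie on a line segment rather than on a step. Behavior outside the training range is controlled by the out_of_bounds parameter: "nan" (the default) returns NaN, "clip" returns the nearest boundary value, and "raise" raises an error.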
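A small sketch of prediction at new points with scikit-learn's `IsotonicRegression` follows. The toy data are made up for illustration, and `out_of_bounds="clip"` is passed explicitly so that queries outside the training range take the boundary values.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Toy data (illustrative only): one monotonicity violation at x = 2, 3.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 2.0, 4.0])

iso = IsotonicRegression(out_of_bounds="clip")
iso.fit(x, y)

fitted = iso.predict(x)                       # values at the training points
between = iso.predict(np.array([1.5]))        # interpolated between x=1 and x=2
outside = iso.predict(np.array([0.0, 10.0]))  # clipped to the boundary values

print(fitted, between, outside)
```

With these inputs the fitted values at the training points are 1, 2.5, 2.5, 4, so the query at 1.5 falls halfway between 1 and 2.5.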

Monotonicity direction: The constraint can be non-decreasing ( $z_{i} \leq z_{i + 1}$ ) or non-increasing ( $z_{i} \geq z_{i + 1}$ ), depending on the known direction of the relationship.
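As a hedged illustration of the direction setting, scikit-learn's `IsotonicRegression` takes an `increasing` argument; `increasing=False` requests a non-increasing fit. The data below are invented for the example.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Toy data (illustrative only): mostly decreasing, with one violation.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([4.0, 2.0, 3.0, 1.0])

dec = IsotonicRegression(increasing=False)
fitted_dec = dec.fit(x, y).predict(x)

print(fitted_dec)  # the pair (2, 3) is pooled so the fit never increases
```

When the direction is not known in advance, `increasing="auto"` lets the estimator choose it from the data.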

Application to calibration: When used for probability calibration, the predicted scores $f (x)$ serve as the predictor and the true binary labels $y$ serve as the response. The fitted isotonic function maps raw scores to calibrated probabilities while preserving the ranking up to ties: scores that fall in the same plateau are mapped to the same probability.
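The calibration setup can be sketched as follows. The scores and labels are made-up values standing in for a classifier's held-out outputs, and the bounds `y_min=0.0`, `y_max=1.0` and `out_of_bounds="clip"` are choices made for this example rather than requirements.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Hypothetical held-out classifier scores and the matching binary labels.
scores = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8])
labels = np.array([0, 0, 1, 0, 1, 1, 0, 1])

# Fit a non-decreasing map from raw score to probability in [0, 1].
calibrator = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
calibrator.fit(scores, labels)

calibrated = calibrator.predict(scores)
print(calibrated)
```

In practice the calibrator should be fitted on data not used to train the classifier; scikit-learn's `CalibratedClassifierCV` with `method="isotonic"` wraps this procedure with cross-validation.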

Related Pages

Implementation:Scikit_learn_Scikit_learn_IsotonicRegression
