Principle:Scikit learn Scikit learn Isotonic Regression
| Knowledge Sources | |
|---|---|
| Domains | Supervised Learning, Non-Parametric Methods |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Isotonic regression fits a non-decreasing (or non-increasing) piecewise constant function to data, providing a non-parametric regression method with a monotonicity constraint.
Description
Isotonic regression finds the best-fitting monotonic function to observed data without assuming a specific parametric form. It solves the problem of fitting a regression function when the only assumption is that the relationship between the predictor and the response is monotonic. The resulting fit is a step function that minimizes the sum of squared errors subject to the ordering constraint. Isotonic regression is widely used in probability calibration (transforming classifier outputs into calibrated probabilities), dose-response modeling, and any setting where domain knowledge dictates a monotonic relationship.
Usage
Use isotonic regression when the relationship between the predictor and the response is known or expected to be monotonic but the exact functional form is unknown. It is commonly used as a calibration method for classifiers, where predicted scores should have a monotonic relationship with true probabilities. It is also useful in medical dose-response studies, pricing models, and quality control. Note that isotonic regression is limited to univariate problems (one predictor variable) and can overfit when the dataset is small, as it has high flexibility with no smoothness constraint.
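A minimal sketch of the typical workflow, assuming scikit-learn's `IsotonicRegression` estimator and a synthetic noisy increasing trend (the data, seed, and variable names are illustrative assumptions, not part of the original text):

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Synthetic data (assumed for illustration): a noisy increasing trend.
rng = np.random.default_rng(0)
x = np.arange(50, dtype=float)
y = np.log1p(x) + rng.normal(scale=0.3, size=x.shape)

# Fit a non-decreasing function to the single predictor x.
iso = IsotonicRegression(increasing=True, out_of_bounds="clip")
y_fit = iso.fit_transform(x, y)

# The fitted values never decrease as x increases (small tolerance for rounding).
is_monotone = bool(np.all(np.diff(y_fit) >= -1e-12))
```

With only 50 points and no smoothness constraint, the fit follows the noise wherever the noise happens to be monotone, which is the overfitting risk noted above.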
Theoretical Basis
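Problem Formulation: Given data pairs $(x_i, y_i)$, $i = 1, \dots, n$, with $x_1 \le x_2 \le \dots \le x_n$, isotonic regression solves:
$$\min_{\hat{y}_1, \dots, \hat{y}_n} \sum_{i=1}^{n} w_i (y_i - \hat{y}_i)^2 \quad \text{subject to} \quad \hat{y}_1 \le \hat{y}_2 \le \dots \le \hat{y}_n$$
where $w_i > 0$ are optional sample weights.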
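As a small worked instance of this formulation, consider three observations already ordered by predictor, with unit weights and made-up responses $y = (1, 3, 2)$. Writing $\hat{y}_i$ for the fitted value at the $i$-th point, the non-decreasing least-squares fit can be checked by hand:

```latex
\begin{aligned}
&\text{Data: } y = (1, 3, 2), \quad w = (1, 1, 1) \\
&\text{The unconstrained fit } \hat{y} = y \text{ violates } \hat{y}_2 \le \hat{y}_3 \text{, since } 3 > 2 \\
&\text{Averaging the two violating values: } \hat{y}_2 = \hat{y}_3 = \tfrac{3 + 2}{2} = 2.5 \\
&\text{Solution: } \hat{y} = (1,\ 2.5,\ 2.5) \\
&\text{Objective: } (1 - 1)^2 + (3 - 2.5)^2 + (2 - 2.5)^2 = 0.5
\end{aligned}
```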
Pool Adjacent Violators (PAV) Algorithm:
- Start with $\hat{y}_i = y_i$ for all $i$.
- Scan from left to right. If $\hat{y}_i > \hat{y}_{i+1}$ (violation of monotonicity):
  - Merge the two adjacent blocks by replacing both values with their weighted average.
  - Check backward for further violations and merge as needed.
- Continue until no violations remain.
The PAV algorithm has $O(n)$ time complexity once the data are sorted by predictor, and it produces the exact global optimum.
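The steps above can be sketched in pure Python as follows. This is an illustrative implementation under stated assumptions (inputs already sorted by predictor, strictly positive weights, a hypothetical function name `pav`), not scikit-learn's internal code:

```python
def pav(y, w=None):
    """Weighted least-squares non-decreasing fit via Pool Adjacent Violators.

    Assumes y is already ordered by increasing predictor value and that
    all weights are strictly positive.
    """
    n = len(y)
    if w is None:
        w = [1.0] * n
    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge backward while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    # Expand block means back to one fitted value per input point.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


result = pav([1, 3, 2, 4, 3, 5])
# result is [1.0, 2.5, 2.5, 3.5, 3.5, 5.0]
```

Keeping the blocks on a stack means each point is pushed once and each merge removes a block, so the total work after sorting is linear in the number of points.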
Properties:
- The solution is a piecewise constant (step) function.
- It is the projection of the data onto the cone of monotone sequences in the weighted least-squares sense.
- The number of steps (plateaus) in the solution is at most $n$ and depends on the data.
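For prediction at new points: The fitted values at the training points define the function. Between training points, scikit-learn's `IsotonicRegression` interpolates linearly, so the resulting predictor is piecewise linear and still monotonic. Outside the training range, the behavior follows the `out_of_bounds` parameter: `"nan"` (the default) returns NaN, `"clip"` returns the nearest boundary value, and `"raise"` raises an error.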
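A short sketch of prediction at unseen points, assuming scikit-learn's `IsotonicRegression` and a made-up four-point training set. It compares two settings of the `out_of_bounds` parameter for inputs inside and outside the training range:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Toy training data (assumed for illustration); the pair (3.0, 2.0) violates
# monotonicity, so the fitted values at x = 2 and x = 3 are pooled to 2.5.
x_train = np.array([1.0, 2.0, 3.0, 4.0])
y_train = np.array([1.0, 3.0, 2.0, 4.0])

clip_model = IsotonicRegression(out_of_bounds="clip").fit(x_train, y_train)
nan_model = IsotonicRegression(out_of_bounds="nan").fit(x_train, y_train)

# Two points outside the training range [1, 4] and two points inside it.
x_new = np.array([0.0, 1.5, 3.5, 5.0])
pred_clip = clip_model.predict(x_new)  # boundary values outside the range
pred_nan = nan_model.predict(x_new)    # NaN outside the range
```

Inside the range both models agree, because they interpolate linearly between the same fitted values; they differ only in how they treat `0.0` and `5.0`.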
Monotonicity direction: The constraint can be non-decreasing ($\hat{y}_i \le \hat{y}_{i+1}$) or non-increasing ($\hat{y}_i \ge \hat{y}_{i+1}$), depending on the known direction of the relationship.
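In scikit-learn the direction is selected through the `increasing` parameter of `IsotonicRegression`. The sketch below, on made-up data with a single violation, fits a non-increasing function; the parameter also accepts `"auto"`, which lets the estimator choose the direction from the data.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Toy data (assumed for illustration): mostly decreasing, with one violation
# between x = 2 and x = 3.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([5.0, 3.0, 4.0, 2.0, 1.0])

# increasing=False enforces a non-increasing fit.
dec = IsotonicRegression(increasing=False).fit(x, y)
y_dec = dec.predict(x)
```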
Application to calibration: When used for probability calibration, the predicted scores serve as the predictor and the true binary labels serve as the response. The fitted isotonic function maps raw scores to calibrated probabilities while preserving the ranking up to ties: scores pooled into the same plateau receive the same probability.
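A minimal sketch of this calibration setup, assuming a synthetic dataset, a logistic regression classifier, and a held-out calibration split (all illustrative choices, not prescribed by the text above). Scores from the classifier are the predictor and the binary labels are the response:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (assumed for illustration).
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_cal, y_train, y_cal = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# Train the classifier on one split and score the held-out calibration split.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores_cal = clf.decision_function(X_cal)

# Map raw scores to probabilities with a non-decreasing function in [0, 1].
calibrator = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
calibrator.fit(scores_cal, y_cal)
calibrated = calibrator.predict(scores_cal)
```

Fitting the calibrator on data the classifier was not trained on helps avoid optimistic probabilities. scikit-learn also offers `CalibratedClassifierCV(method="isotonic")`, which wraps a similar procedure with cross-validation.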