Principle:Scikit learn Scikit learn Probability Calibration
| Knowledge Sources | |
|---|---|
| Domains | Model Evaluation, Probabilistic Classification |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Probability calibration adjusts the output of a classifier so that the predicted probabilities accurately reflect the true likelihood of the predicted outcomes.
Description
Many classifiers produce scores or probabilities that do not accurately represent the true probability of class membership. For example, a classifier may predict a probability of 0.8 for the positive class, yet among all samples assigned 0.8, only 60% may actually be positive. Probability calibration post-processes these raw outputs to produce well-calibrated probabilities, whose predicted confidence matches the empirical frequency of the outcome. This is critical for decision-making systems where the magnitude of the predicted probability directly influences actions (e.g., medical diagnosis, risk assessment, threshold-based decisions). Calibration is a post-processing step applied after a classifier has been trained, and its quality is assessed as part of model evaluation.
Usage
Use probability calibration when the downstream application requires reliable probability estimates rather than just correct rankings. Common scenarios include combining predictions from multiple models (ensemble methods), making threshold-dependent decisions, and communicating uncertainty to users. Use Platt scaling (the sigmoid method) when the calibration curve is sigmoid-shaped, which is typical for maximum-margin classifiers such as SVMs. Use isotonic regression when the calibration curve has a more complex, non-parametric shape, but be aware that it needs more calibration data to avoid overfitting. Always fit the calibration mapping on data the base classifier was not trained on, for example with cross-validated calibration or a separate held-out set; a mapping fitted on the classifier's own training data inherits the classifier's overconfidence on that data.
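A minimal sketch of both calibration methods, assuming scikit-learn's `CalibratedClassifierCV` (from `sklearn.calibration`) wrapped around a `LinearSVC`, which has no `predict_proba` of its own. The synthetic dataset, the base model and the fold count are illustrative choices only, not recommendations.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic binary problem (illustrative data only).
X, y = make_classification(
    n_samples=5000, n_features=20, n_informative=5, n_redundant=2,
    n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

results = {}
for method in ("sigmoid", "isotonic"):
    # cv=5: each calibrator is fitted on a fold its base LinearSVC did not see,
    # mapping decision_function scores to probabilities.
    clf = CalibratedClassifierCV(LinearSVC(), method=method, cv=5)
    clf.fit(X_train, y_train)
    prob_pos = clf.predict_proba(X_test)[:, 1]  # positive-class probability
    results[method] = brier_score_loss(y_test, prob_pos)
    print(f"{method:>8}: Brier score = {results[method]:.4f}")
```

Comparing the Brier score of the two methods on held-out data is one simple way to choose between them; with few calibration samples the sigmoid method is usually the safer default.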
Theoretical Basis
Calibration Definition: A classifier is perfectly calibrated if:
$$P\big(Y = 1 \mid \hat{p}(X) = p\big) = p \quad \text{for all } p \in [0, 1]$$
That is, among all instances where the predicted probability is $p$, the fraction of positive outcomes is exactly $p$.
Calibration Curve (Reliability Diagram): The predicted probabilities are binned, and within each bin, the mean predicted probability is plotted against the observed fraction of positives. A perfectly calibrated classifier produces points along the diagonal.
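The binning behind a reliability diagram can be sketched in plain Python. This sketch assumes equal-width bins on [0, 1] and skips empty bins; the function name `reliability_curve` is hypothetical. scikit-learn's `sklearn.calibration.calibration_curve` computes the same kind of curve (returning the fraction of positives first, then the mean predicted probability), though its handling of bin edges may differ slightly. The toy data reproduces the 0.8 vs. 60% example from the Description.

```python
def reliability_curve(y_true, y_prob, n_bins=5):
    """Return (mean_predicted, fraction_positive) for each non-empty bin.

    Bins are equal-width on [0, 1]; empty bins are skipped.
    """
    sums = [0.0] * n_bins
    positives = [0] * n_bins
    counts = [0] * n_bins
    for y, p in zip(y_true, y_prob):
        # Map p to a bin index; p == 1.0 falls into the last bin.
        b = min(int(p * n_bins), n_bins - 1)
        sums[b] += p
        positives[b] += y
        counts[b] += 1
    mean_pred = [sums[b] / counts[b] for b in range(n_bins) if counts[b]]
    frac_pos = [positives[b] / counts[b] for b in range(n_bins) if counts[b]]
    return mean_pred, frac_pos


# 10 samples predicted at 0.8 (only 6 positive), 10 predicted at 0.1 (1 positive).
y_prob = [0.8] * 10 + [0.1] * 10
y_true = [1] * 6 + [0] * 4 + [1] + [0] * 9
mean_pred, frac_pos = reliability_curve(y_true, y_prob)
print(mean_pred, frac_pos)
```

The low bin lies on the diagonal (about 0.1 predicted, 0.1 observed), while the high bin falls below it (about 0.8 predicted, 0.6 observed), which is the signature of overconfidence.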
Platt Scaling (Sigmoid Calibration): Fits a logistic regression model to the classifier's raw outputs:
$$P(y = 1 \mid f) = \frac{1}{1 + \exp(A f + B)}$$
where $f$ is the raw decision function output and the parameters $A$ and $B$ are estimated by minimizing the cross-entropy loss on a held-out calibration set.
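A minimal sketch of fitting Platt's sigmoid $P(y=1 \mid f) = 1/(1+\exp(A f + B))$ to held-out scores. Plain gradient descent on the mean cross-entropy is swapped in here for simplicity; it is not the optimizer Platt or scikit-learn use, and the learning rate, iteration count and toy scores are arbitrary assumptions. Platt's original formulation also replaces the 0/1 targets with slightly smoothed values, which this sketch omits. The helper names `platt_prob` and `fit_platt` are hypothetical.

```python
import math


def platt_prob(a, b, f):
    """P(y=1 | f) = 1 / (1 + exp(a*f + b)), computed without overflow."""
    t = a * f + b
    if t >= 0:
        e = math.exp(-t)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(t))


def fit_platt(scores, labels, lr=0.1, n_iter=5000):
    """Estimate (A, B) by gradient descent on the mean cross-entropy loss."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for f, y in zip(scores, labels):
            p = platt_prob(a, b, f)
            # The exponent enters the sigmoid with a minus sign, so
            # d(loss)/dA = (y - p) * f and d(loss)/dB = (y - p).
            grad_a += (y - p) * f
            grad_b += y - p
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Held-out decision scores and labels (toy data; the classes overlap).
scores = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
labels = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]
A, B = fit_platt(scores, labels)
print(f"A = {A:.3f}, B = {B:.3f}")
print([round(platt_prob(A, B, f), 3) for f in (-2.0, 0.0, 2.0)])
```

With this sign convention $A$ comes out negative whenever higher scores indicate the positive class. The classes must overlap: on perfectly separable scores the likelihood has no finite maximizer and $A$ keeps growing in magnitude.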
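Isotonic Regression Calibration: Fits a non-decreasing step function to the relationship between the classifier's outputs (predicted probabilities or raw scores) and the observed outcomes:
$$P(y = 1 \mid f) = m(f)$$
where $m$ is a monotonically non-decreasing function estimated by isotonic regression, i.e. by minimizing $\sum_i \big(y_i - m(f_i)\big)^2$ subject to the monotonicity constraint. This is more flexible than Platt scaling but requires more data.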
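Isotonic regression is classically solved with the Pool Adjacent Violators (PAV) algorithm, sketched below in plain Python. The function name `isotonic_fit` is hypothetical; in scikit-learn the corresponding class is `sklearn.isotonic.IsotonicRegression`. This sketch assumes distinct scores (ties are not pooled) and returns fitted values only for the training points, with no interpolation for unseen scores.

```python
def isotonic_fit(scores, labels):
    """Pool Adjacent Violators: least-squares non-decreasing fit of labels on scores.

    Returns the fitted value for each input, in the original input order.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    # Each block is [sum of labels, sample count]; its fitted value is the mean.
    blocks = []
    for i in order:
        blocks.append([float(labels[i]), 1])
        # Merge while the previous block's mean exceeds the last block's mean
        # (compared by cross-multiplication to avoid division).
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = [0.0] * len(scores)
    pos = 0
    for s, c in blocks:
        for i in order[pos:pos + c]:
            fitted[i] = s / c
        pos += c
    return fitted


scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
labels = [0, 1, 0, 0, 1, 0, 1, 1]
fitted = isotonic_fit(scores, labels)
print([round(v, 3) for v in fitted])  # → [0.0, 0.333, 0.333, 0.333, 0.5, 0.5, 1.0, 1.0]
```

Each violation of monotonicity is resolved by pooling neighbours into their mean, which is why small calibration sets produce coarse, jumpy step functions.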
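Brier Score: the mean squared difference between predicted probabilities and binary outcomes (lower is better):
$$\mathrm{BS} = \frac{1}{N} \sum_{i=1}^{N} \big(\hat{p}_i - y_i\big)^2$$
where $\hat{p}_i$ is the predicted probability of the positive class and $y_i \in \{0, 1\}$ is the true label. The Brier score can be decomposed into calibration (reliability), resolution, and uncertainty components, $\mathrm{BS} = \mathrm{REL} - \mathrm{RES} + \mathrm{UNC}$, so it reflects a model's discrimination as well as its calibration.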
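A minimal sketch of the Brier score and its reliability/resolution/uncertainty (Murphy) decomposition, assuming samples are grouped by identical forecast value so that the identity BS = REL − RES + UNC holds exactly. The function names are hypothetical; scikit-learn provides the score itself as `sklearn.metrics.brier_score_loss`.

```python
from collections import defaultdict


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def murphy_decomposition(y_true, y_prob):
    """Return (reliability, resolution, uncertainty), grouping by identical forecast value."""
    n = len(y_true)
    base_rate = sum(y_true) / n
    groups = defaultdict(list)
    for y, p in zip(y_true, y_prob):
        groups[p].append(y)
    rel = res = 0.0
    for p, ys in groups.items():
        n_k = len(ys)
        o_k = sum(ys) / n_k  # observed fraction of positives for this forecast
        rel += n_k * (p - o_k) ** 2          # calibration error (lower is better)
        res += n_k * (o_k - base_rate) ** 2  # separation from base rate (higher is better)
    return rel / n, res / n, base_rate * (1 - base_rate)


y_prob = [0.8] * 5 + [0.2] * 5
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
bs = brier_score(y_true, y_prob)
rel, res, unc = murphy_decomposition(y_true, y_prob)
print(f"Brier={bs:.3f}  reliability={rel:.3f}  resolution={res:.3f}  uncertainty={unc:.3f}")
```

For these toy values the components are 0.02, 0.04 and 0.24, and 0.02 − 0.04 + 0.24 recovers the Brier score of 0.22. Only the reliability term measures calibration; a perfectly calibrated but uninformative model still pays the uncertainty term.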
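Expected Calibration Error (ECE):
$$\mathrm{ECE} = \sum_{m=1}^{M} \frac{|B_m|}{n} \, \big|\mathrm{acc}(B_m) - \mathrm{conf}(B_m)\big|$$
where $B_m$ is the set of predictions in bin $m$ (of $M$ bins), $n$ is the total number of predictions, $\mathrm{acc}(B_m)$ is the accuracy, and $\mathrm{conf}(B_m)$ is the mean predicted probability in that bin.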
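A minimal sketch of ECE for a binary classifier, assuming equal-width bins over the positive-class probability. In this setting the bin "accuracy" is taken to be the observed fraction of positives; a common multiclass variant instead bins the maximum predicted probability and uses top-1 accuracy. The function name and the default of 10 bins are illustrative assumptions.

```python
def expected_calibration_error(y_true, y_prob, n_bins=10):
    """ECE over equal-width bins of the positive-class probability.

    For each bin B_m: acc = fraction of positives, conf = mean predicted probability.
    """
    n = len(y_true)
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        bins[min(int(p * n_bins), n_bins - 1)].append((y, p))
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        acc = sum(y for y, _ in members) / len(members)
        conf = sum(p for _, p in members) / len(members)
        ece += len(members) / n * abs(acc - conf)  # weight |acc - conf| by bin size
    return ece


y_prob = [0.8] * 10 + [0.1] * 10
y_true = [1] * 6 + [0] * 4 + [1] + [0] * 9
ece = expected_calibration_error(y_true, y_prob)
print(round(ece, 3))
```

Here half the samples sit in a bin that is off by 0.2 and the other half in a well-calibrated bin, giving an ECE of about 0.1. The value depends on the number and placement of bins, so ECE figures are only comparable under the same binning.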
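Cross-validated calibration splits the training data into k folds. For each fold, the base classifier is trained on the remaining folds and the calibrator is fitted on the held-out fold; at prediction time, the calibrated probabilities of the k (classifier, calibrator) pairs are averaged. Because each calibrator only sees data its classifier was not trained on, this avoids information leakage from the classifier's training data into the calibration mapping.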
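The procedure can be written out by hand as a sketch, assuming scikit-learn's `StratifiedKFold`, a `LinearSVC` base model, and a nearly unregularized one-feature `LogisticRegression` (large `C`) standing in for the sigmoid calibrator. This approximates, but does not reproduce the exact fitting details of, `CalibratedClassifierCV(method="sigmoid", ensemble=True)`. The data split and the name `X_new` for "unseen" inputs are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import LinearSVC

X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=5, n_redundant=2,
    n_clusters_per_class=1, random_state=0,
)
X_train, y_train, X_new = X[:1500], y[:1500], X[1500:]

pairs = []
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fit_idx, cal_idx in skf.split(X_train, y_train):
    # 1. Train the base classifier on the other k-1 folds.
    base = LinearSVC().fit(X_train[fit_idx], y_train[fit_idx])
    # 2. Fit a sigmoid on the held-out fold's decision scores only.
    #    sigmoid(w*f + b) equals Platt's 1 / (1 + exp(A*f + B)) with A = -w, B = -b.
    scores = base.decision_function(X_train[cal_idx]).reshape(-1, 1)
    calibrator = LogisticRegression(C=1e6).fit(scores, y_train[cal_idx])
    pairs.append((base, calibrator))

# 3. Average the calibrated positive-class probabilities over all k pairs.
prob_pos = np.mean(
    [cal.predict_proba(base.decision_function(X_new).reshape(-1, 1))[:, 1]
     for base, cal in pairs],
    axis=0,
)
print(prob_pos[:5].round(3))
```

Averaging over the k pairs also smooths out fold-to-fold variation in the fitted sigmoids, at the cost of keeping k base models at prediction time.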