Principle:Scikit learn Scikit learn Probability Calibration

Knowledge Sources	Scikit_learn Scikit-learn Docs
Domains	Model Evaluation, Probabilistic Classification
Last Updated	2026-02-08 15:00 GMT

Overview

Probability calibration adjusts the output of a classifier so that the predicted probabilities accurately reflect the true likelihood of the predicted outcomes.

Description

Many classifiers produce scores or probabilities that do not accurately represent the true probability of class membership. For example, a classifier may predict a probability of 0.8 for the positive class, but among all samples predicted at 0.8, only 60% may actually be positive. Probability calibration post-processes these raw outputs to produce well-calibrated probabilities, where predicted confidence matches empirical frequency. This matters for decision-making systems in which the magnitude of the predicted probability directly drives actions (e.g., medical diagnosis, risk assessment, threshold-based decisions). Calibration belongs to the model evaluation and post-processing stage and is applied after a classifier has been trained.

Usage

Use probability calibration when the downstream application requires reliable probability estimates rather than just correct rankings. Common scenarios include combining predictions from multiple models (ensemble methods), making threshold-dependent decisions, and communicating uncertainty to users. Use Platt scaling (the sigmoid method) when the calibration curve is sigmoid-shaped, which is typical for maximum-margin classifiers such as SVMs. Use isotonic regression when the calibration curve has a more complex, non-parametric (but monotonic) shape; be aware that it needs more data to avoid overfitting. Always fit the calibration mapping on data the classifier was not trained on, for example with cross-validated calibration: scores on the training data are usually overconfident, so a mapping fitted to them does not carry over to new data.
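A minimal sketch of the sigmoid-versus-isotonic choice using scikit-learn's `CalibratedClassifierCV`, assuming a synthetic binary problem from `make_classification` and a `LinearSVC` base model (both illustrative choices, not taken from this page). Each method is fitted with 5-fold cross-validated calibration and compared by Brier score on a held-out test split; the numbers depend on the data and scikit-learn version.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic binary problem (illustrative data only)
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

brier = {}
for method in ("sigmoid", "isotonic"):
    # cv=5: each calibrator is fitted on a fold the matching SVM was not trained on
    model = CalibratedClassifierCV(LinearSVC(), method=method, cv=5)
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]  # calibrated P(y = 1)
    brier[method] = brier_score_loss(y_test, proba)

print(brier)
```

With a few thousand calibration samples both methods are usually reasonable; the Usage guidance above suggests preferring `"sigmoid"` when the calibration set is small.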

Theoretical Basis

Calibration Definition: A classifier is perfectly calibrated if:

$P(Y = 1 \mid \hat{p}(x) = q) = q \quad \forall q \in [0, 1]$

That is, among all instances where the predicted probability is $q$, the fraction of positive outcomes is exactly $q$.

Calibration Curve (Reliability Diagram): The predicted probabilities are binned, and within each bin, the mean predicted probability is plotted against the observed fraction of positives. A perfectly calibrated classifier produces points along the diagonal.
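The binning step can be sketched in plain Python; scikit-learn offers the same computation as `sklearn.calibration.calibration_curve`. The helper `reliability_bins` is hypothetical, and the data are simulated under the assumption that each label is drawn with its true probability. A model that reports that probability lands near the diagonal, while a hypothetical overconfident model (probabilities pushed toward 0 and 1) reports more confidence than the observed frequencies support in its upper bins.

```python
import random

def reliability_bins(probs, labels, n_bins=10):
    """Per non-empty equal-width bin: (mean predicted prob, fraction of positives, count)."""
    sums, pos, counts = [0.0] * n_bins, [0] * n_bins, [0] * n_bins
    for p, y in zip(probs, labels):
        k = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        sums[k] += p
        pos[k] += y
        counts[k] += 1
    return [(sums[k] / counts[k], pos[k] / counts[k], counts[k])
            for k in range(n_bins) if counts[k] > 0]

rng = random.Random(0)
true_p = [rng.random() for _ in range(50_000)]
labels = [1 if rng.random() < p else 0 for p in true_p]

# A calibrated model reports the true probability itself
calibrated = reliability_bins(true_p, labels)

# Hypothetical overconfident model: sharpens probabilities toward 0 and 1
overconfident = [p ** 3 / (p ** 3 + (1 - p) ** 3) for p in true_p]
overconf_bins = reliability_bins(overconfident, labels)

for mean_pred, frac_pos, n in overconf_bins:
    print(f"predicted {mean_pred:.2f}  observed {frac_pos:.2f}  (n={n})")
```

Plotting `mean_pred` against `frac_pos` for each bin gives the reliability diagram.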

Platt Scaling (Sigmoid Calibration): Fits a logistic regression model to the classifier's raw outputs:

$P(y = 1 \mid f(x)) = \frac{1}{1 + \exp(A \cdot f(x) + B)}$

where $f(x)$ is the raw decision function output and the parameters $A$ and $B$ are estimated by minimizing the cross-entropy loss on a held-out calibration set.
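A minimal sketch of fitting $A$ and $B$ in plain Python. It makes two assumptions beyond the text above: it uses the smoothed targets from Platt's original paper ($(N_+ + 1)/(N_+ + 2)$ for positives, $1/(N_- + 2)$ for negatives) instead of hard 0/1 labels, and it minimizes the cross-entropy with plain Newton's method and no line search, a simplification of what production implementations do. The names `fit_platt` and `platt_predict` are hypothetical. When higher scores indicate the positive class, the fitted $A$ comes out negative.

```python
import math

def _sigmoid_neg(z):
    """Compute 1 / (1 + exp(z)) without overflow for large |z|."""
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))

def fit_platt(scores, labels, n_iter=100, tol=1e-10):
    """Fit A, B in P(y=1 | f) = 1 / (1 + exp(A*f + B)) with Newton's method."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Platt's smoothed targets replace the hard 0/1 labels
    t_pos = (n_pos + 1.0) / (n_pos + 2.0)
    t_neg = 1.0 / (n_neg + 2.0)
    targets = [t_pos if y == 1 else t_neg for y in labels]
    # Start from A = 0 and the B that reproduces the smoothed class prior
    a, b = 0.0, math.log((n_neg + 1.0) / (n_pos + 1.0))
    for _ in range(n_iter):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for f, t in zip(scores, targets):
            p = _sigmoid_neg(a * f + b)
            r = t - p          # d(cross-entropy)/dz for z = A*f + B
            w = p * (1.0 - p)  # d^2(cross-entropy)/dz^2
            g_a += r * f
            g_b += r
            h_aa += w * f * f
            h_ab += w * f
            h_bb += w
        det = h_aa * h_bb - h_ab * h_ab + 1e-12  # guards against division by zero
        step_a = (h_bb * g_a - h_ab * g_b) / det  # Newton step: H^{-1} g
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a -= step_a
        b -= step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    return a, b

def platt_predict(a, b, score):
    """Calibrated probability of the positive class for one raw score."""
    return _sigmoid_neg(a * score + b)

# Illustrative raw scores with overlapping classes
scores = [-3, -2, -1, -0.5, 0, 0.5, 1, 2, 3, -1.5, 1.5, 0.2]
labels = [0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1]
a, b = fit_platt(scores, labels)
```

The smoothed targets also keep $A$ finite when the calibration set is perfectly separable, where hard labels would drive it toward infinity.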

Isotonic Regression Calibration: Fits a non-decreasing step function that maps the classifier's raw outputs to the observed outcomes:

$\hat{p}_{\text{cal}}(x) = m(f(x))$

where $m$ is a monotonically non-decreasing function estimated by the isotonic regression algorithm. This is more flexible than Platt scaling but requires more data.
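A minimal sketch of fitting $m$ with the pool-adjacent-violators (PAV) algorithm, a standard way to solve isotonic regression. It sorts by raw score, pools tied scores, and merges neighbouring blocks whenever a block's mean label exceeds that of the block to its right. The helpers `pav_fit` and `pav_predict` are hypothetical. Prediction here uses a piecewise-constant rule; scikit-learn's `IsotonicRegression` instead interpolates linearly between fitted points.

```python
from bisect import bisect_left
from itertools import groupby

def pav_fit(scores, labels):
    """Return (upper score of each block, calibrated value of each block)."""
    blocks = []  # each block: [sum of labels, count, largest score in block]
    for s, group in groupby(sorted(zip(scores, labels)), key=lambda pair: pair[0]):
        ys = [y for _, y in group]  # tied scores start in one block
        blocks.append([float(sum(ys)), len(ys), s])
        # Merge while the previous block's mean is larger (a monotonicity violation)
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            y_sum, cnt, upper = blocks.pop()
            blocks[-1][0] += y_sum
            blocks[-1][1] += cnt
            blocks[-1][2] = upper
    return [blk[2] for blk in blocks], [blk[0] / blk[1] for blk in blocks]

def pav_predict(uppers, values, score):
    """Step-function lookup; scores beyond the fitted range are clipped to the ends."""
    idx = bisect_left(uppers, score)
    return values[min(idx, len(values) - 1)]

uppers, values = pav_fit([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], [0, 1, 0, 0, 1, 1])
print(uppers, values)  # the violating run 0.2..0.4 is pooled into one block
```

Each merge averages more labels into one value, which is why isotonic calibration needs more data than the two-parameter sigmoid: with few samples the steps simply memorize the noise.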

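Brier Score: the mean squared difference between the predicted probabilities and the observed binary outcomes. Lower is better; because it rewards both calibration and discrimination, it is not a pure measure of calibration:

$BS = \frac{1}{n} \sum_{i = 1}^{n} (\hat{p}_{i} - y_{i})^{2}$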

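The Brier score can be decomposed into reliability (calibration), resolution, and uncertainty components as $BS = \text{REL} - \text{RES} + \text{UNC}$ (the Murphy decomposition). Reliability measures the calibration error, resolution rewards forecasts whose observed frequencies differ from the overall base rate, and uncertainty depends only on that base rate.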
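A minimal sketch of the Brier score and its reliability/resolution/uncertainty decomposition in plain Python (scikit-learn provides the score itself as `sklearn.metrics.brier_score_loss`). The helper names are hypothetical. The sketch assumes forecasts are grouped by their distinct predicted values; under that grouping the identity $BS = \text{REL} - \text{RES} + \text{UNC}$ holds exactly, whereas grouping continuous forecasts into bins makes it approximate.

```python
from collections import defaultdict

def brier_score(probs, labels):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)

def murphy_decomposition(probs, labels):
    """Return (reliability, resolution, uncertainty), grouping by distinct forecast value."""
    n = len(probs)
    base_rate = sum(labels) / n
    groups = defaultdict(list)
    for p, y in zip(probs, labels):
        groups[p].append(y)
    rel = res = 0.0
    for p, ys in groups.items():
        obs = sum(ys) / len(ys)                   # observed frequency for this forecast
        rel += len(ys) * (p - obs) ** 2           # calibration error of the group
        res += len(ys) * (obs - base_rate) ** 2   # spread around the base rate
    return rel / n, res / n, base_rate * (1 - base_rate)

probs = [0.1, 0.1, 0.4, 0.4, 0.8, 0.8, 0.8]
labels = [0, 0, 1, 0, 1, 1, 0]
bs = brier_score(probs, labels)
rel, res, unc = murphy_decomposition(probs, labels)
print(bs, rel - res + unc)  # both equal 0.18 up to rounding
```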

Expected Calibration Error (ECE):

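$ECE = \sum_{m = 1}^{M} \frac{|B_{m}|}{n} \, |acc(B_{m}) - conf(B_{m})|$

where the predictions are grouped into $M$ bins by predicted probability, $B_{m}$ is the set of predictions in bin $m$, $acc(B_{m})$ is the accuracy in that bin (for a binary reliability diagram, the fraction of positives), and $conf(B_{m})$ is the mean predicted probability in that bin.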
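A minimal plain-Python sketch of ECE, assuming equal-width bins $[k/M, (k+1)/M)$ with a confidence of exactly 1.0 placed in the last bin. The function name is hypothetical. It takes each prediction's confidence and a 0/1 flag for whether that prediction was correct; for the binary reliability reading, pass the predicted positive-class probability and the 0/1 label instead.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |accuracy - mean confidence| over equal-width bins."""
    n = len(confidences)
    conf_sum, hit_sum, counts = [0.0] * n_bins, [0] * n_bins, [0] * n_bins
    for c, hit in zip(confidences, correct):
        k = min(int(c * n_bins), n_bins - 1)
        conf_sum[k] += c
        hit_sum[k] += hit
        counts[k] += 1
    ece = 0.0
    for k in range(n_bins):
        if counts[k]:  # empty bins contribute nothing
            acc = hit_sum[k] / counts[k]
            conf = conf_sum[k] / counts[k]
            ece += counts[k] / n * abs(acc - conf)
    return ece

conf = [0.95, 0.9, 0.85, 0.65, 0.55, 0.3]
hits = [1, 1, 0, 1, 0, 0]
print(expected_calibration_error(conf, hits, n_bins=5))  # 19/60, about 0.3167
```

The value depends on the binning scheme (number of bins, equal-width versus equal-count), so ECE values are only comparable when computed with the same bins.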

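Cross-validated calibration splits the data into k folds. For each fold, it trains a copy of the base classifier on the remaining folds and fits the calibrator on the held-out fold, so no calibrator sees scores from data its classifier was trained on. At prediction time, the probabilities from the k (classifier, calibrator) pairs are averaged, which avoids information leakage between training and calibration.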
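The procedure above can be sketched by hand for the binary case with scikit-learn building blocks (`StratifiedKFold`, `clone`, `IsotonicRegression`). It resembles what `CalibratedClassifierCV` does with `ensemble=True` and `method="isotonic"`, but it is a simplified illustration, not that class's implementation. The helpers `fit_cv_calibrated` and `predict_proba_cv` are hypothetical, and the base model is assumed to expose `decision_function`.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.isotonic import IsotonicRegression
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.svm import LinearSVC

def fit_cv_calibrated(estimator, X, y, n_splits=5):
    """Hypothetical helper: one (classifier, isotonic calibrator) pair per fold."""
    pairs = []
    for train_idx, cal_idx in StratifiedKFold(n_splits=n_splits).split(X, y):
        clf = clone(estimator).fit(X[train_idx], y[train_idx])
        # The calibrator only sees scores on the fold this classifier did not train on
        scores = clf.decision_function(X[cal_idx])
        cal = IsotonicRegression(out_of_bounds="clip", y_min=0.0, y_max=1.0)
        cal.fit(scores, y[cal_idx])
        pairs.append((clf, cal))
    return pairs

def predict_proba_cv(pairs, X):
    """Average the calibrated positive-class probability over all fold pairs."""
    p = np.mean([cal.predict(clf.decision_function(X)) for clf, cal in pairs], axis=0)
    return np.column_stack([1.0 - p, p])

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
pairs = fit_cv_calibrated(LinearSVC(), X_train, y_train)
proba = predict_proba_cv(pairs, X_test)
```

Averaging over folds uses all of the training data for both fitting and calibration, at the cost of storing and evaluating k models at prediction time.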

Related Pages

Implementation:Scikit_learn_Scikit_learn_CalibratedClassifierCV
