Principle: DistrictDataLabs Yellowbrick Precision Recall Analysis
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Classification, Model_Evaluation |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Precision-recall analysis is a technique for evaluating classifier quality by plotting the tradeoff between precision (the fraction of positive predictions that are correct) and recall (the fraction of actual positives that are detected) across all decision thresholds.
Description
The precision-recall (PR) curve is a diagnostic tool that visualizes how precision and recall change as the classifier's decision threshold varies. Unlike the ROC curve, which plots true positive rate against false positive rate, the PR curve focuses specifically on the positive class. This makes it particularly well-suited for imbalanced datasets where the positive class is rare and the ROC curve might give an overly optimistic view of classifier performance due to the large number of true negatives.
The PR curve plots recall on the horizontal axis and precision on the vertical axis. A perfect classifier occupies the top-right corner (precision = 1, recall = 1). The area under the PR curve, known as average precision (AP), provides a single scalar summary of the curve, computed as the weighted mean of precisions at each threshold, with the increase in recall from the previous threshold used as the weight. A higher average precision indicates better classifier quality.
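The curve and its AP summary can be computed directly with scikit-learn's `precision_recall_curve` and `average_precision_score`; a minimal sketch, where the labels and scores are made-up illustrative values:

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

# Hypothetical ground-truth labels and classifier scores, for illustration only.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.10, 0.40, 0.35, 0.80, 0.70, 0.20, 0.90, 0.60])

# One (precision, recall) pair per distinct score threshold; scikit-learn
# appends a final (precision=1, recall=0) point to anchor the curve.
precision, recall, thresholds = precision_recall_curve(y_true, y_score)

# Scalar summary: area under the PR curve (average precision).
ap = average_precision_score(y_true, y_score)
```

Plotting `recall` on the horizontal axis against `precision` on the vertical axis reproduces the PR curve described above.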
For multiclass problems, PR curves can be computed on a per-class basis using a one-vs-rest strategy, or as a micro-averaged curve that pools predictions across all classes. ISO F1 curves, which are contour lines representing constant F1 score values on the precision-recall plane, can be overlaid to provide additional context about the tradeoff between precision and recall at different operating points.
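The one-vs-rest and micro-averaged computations can be sketched with scikit-learn; the iris dataset and logistic-regression model here are arbitrary stand-ins, not part of the technique itself:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)                 # shape (n_samples, 3)
Y_te = label_binarize(y_te, classes=[0, 1, 2])   # one-vs-rest indicator matrix

# Per-class PR analysis: each class treated as positive against the rest.
ap_per_class = {
    c: average_precision_score(Y_te[:, c], scores[:, c]) for c in range(3)
}

# Micro-averaged curve: pool every (instance, class) decision before scoring.
p_micro, r_micro, _ = precision_recall_curve(Y_te.ravel(), scores.ravel())
ap_micro = average_precision_score(Y_te, scores, average="micro")
```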
Usage
Use precision-recall analysis when evaluating binary or multiclass classifiers, especially when the dataset is imbalanced and the cost of false positives differs from the cost of false negatives. It is more informative than ROC AUC when the positive class is rare. Use it to compare models, select thresholds, or communicate the precision-recall tradeoff to stakeholders.
Theoretical Basis
At a given decision threshold $t$, the classifier labels instances with scores above $t$ as positive. Precision and recall are then:

$$\text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN}$$

where $TP$, $FP$, and $FN$ are the counts of true positives, false positives, and false negatives at that threshold.
The average precision (AP) summarizes the PR curve as a weighted mean of precisions at each threshold:

$$AP = \sum_n (R_n - R_{n-1}) \, P_n$$

where $P_n$ and $R_n$ are the precision and recall at the $n$-th threshold, with thresholds ordered by decreasing score so that recall is non-decreasing in $n$.
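The weighted sum can be checked numerically against scikit-learn's `average_precision_score`; the labels and scores below are invented for illustration:

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

y_true = np.array([0, 1, 1, 0, 1])
y_score = np.array([0.2, 0.9, 0.6, 0.5, 0.4])

precision, recall, _ = precision_recall_curve(y_true, y_score)

# AP = sum_n (R_n - R_{n-1}) P_n; scikit-learn returns the curve with recall
# decreasing toward the final (1, 0) anchor point, hence the sign flip here.
ap_manual = -np.sum(np.diff(recall) * precision[:-1])
ap = average_precision_score(y_true, y_score)
```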
The F1 score at any operating point on the PR curve is the harmonic mean of precision and recall:

$$F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$
ISO F1 curves are defined by the relationship:

$$\frac{2PR}{P + R} = F_1 = \text{const.}$$

For a fixed $F_1$ value, this traces a curve on the precision-recall plane where all points achieve the same F1 score; solving for precision gives $P = F_1 R / (2R - F_1)$. These curves form hyperbolic arcs in the precision-recall plane.
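Because precision can be written as a function of recall along a contour, one iso-F1 curve is easy to trace numerically; a small sketch with an arbitrarily chosen $F_1 = 0.8$:

```python
import numpy as np

def iso_f1_precision(recall, f1):
    """Precision along a constant-F1 contour: P = f1*R / (2R - f1)."""
    recall = np.asarray(recall, dtype=float)
    return f1 * recall / (2.0 * recall - f1)

f1 = 0.8
# Precision stays within [0, 1] only for recall >= f1 / (2 - f1).
r = np.linspace(f1 / (2.0 - f1), 1.0, 50)
p = iso_f1_precision(r, f1)

# Every (recall, precision) pair on the contour recovers the same F1 score.
f1_check = 2.0 * p * r / (p + r)
```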
For the micro-averaged PR curve, predictions across all classes are pooled before computing precision and recall, giving equal weight to each instance rather than each class.
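The pooling can be made concrete: micro-averaged AP on a label-indicator matrix equals ordinary binary AP on the flattened arrays. The tiny indicator and score matrices below are invented for illustration:

```python
import numpy as np
from sklearn.metrics import average_precision_score

# One-hot true labels (4 instances, 3 classes) and predicted class scores.
Y = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]])
S = np.array([[0.8, 0.1, 0.1], [0.3, 0.5, 0.2],
              [0.2, 0.2, 0.6], [0.6, 0.3, 0.1]])

# Micro-averaging pools every (instance, class) pair into one binary problem,
# weighting each instance equally rather than each class.
ap_micro = average_precision_score(Y, S, average="micro")
ap_pooled = average_precision_score(Y.ravel(), S.ravel())
```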