Principle:Rapidsai Cuml Ranking Evaluation

Knowledge Sources	Fawcett 2006 - An introduction to ROC analysis; Davis and Goadrich 2006 - The Relationship Between Precision-Recall and ROC Curves; cuML Docs
Domains	Machine_Learning, Classification, Evaluation
Last Updated	2026-02-08 12:00 GMT

Overview

Ranking evaluation is the assessment of how well a classifier's continuous confidence scores rank positive instances above negative instances, measured through threshold-independent metrics such as ROC AUC and precision-recall curves.

Description

Many classifiers produce continuous-valued scores (probabilities, decision function values, or confidence estimates) rather than hard labels. Ranking evaluation metrics assess the quality of these scores without committing to a specific classification threshold. This is critical because the optimal threshold depends on the application's cost structure, and a model that ranks well can be operated at whichever threshold that cost structure calls for.

ROC Curve and AUC: The Receiver Operating Characteristic (ROC) curve plots the True Positive Rate (TPR, also called recall or sensitivity) against the False Positive Rate (FPR, equal to 1 - specificity) at every possible classification threshold. Each point on the curve corresponds to one threshold: lowering the threshold can only keep or increase both TPR and FPR, so the curve runs monotonically from (0, 0) to (1, 1). The Area Under the ROC Curve (AUC) summarizes the entire curve as a single scalar. An AUC of 1.0 indicates a perfect classifier that ranks all positives above all negatives; an AUC of 0.5 corresponds to performance no better than random ordering.

The ROC AUC has a probabilistic interpretation: it is the probability that a randomly chosen positive instance is scored higher than a randomly chosen negative instance, with ties counted as one half. This makes it a natural measure of ranking quality.
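The probabilistic interpretation can be checked directly by enumerating every positive-negative pair. The sketch below is a brute-force O(P·N) illustration in plain Python, not how cuML computes the metric; the function name `pairwise_auc` and the toy data are hypothetical.

```python
from itertools import product


def pairwise_auc(y_true, y_score):
    """ROC AUC as P(score of a random positive > score of a random negative).

    Brute-force sketch over all positive-negative pairs; ties count as one half.
    Assumes labels are 0/1.
    """
    pos = [s for s, y in zip(y_score, y_true) if y == 1]
    neg = [s for s, y in zip(y_score, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for sp, sn in product(pos, neg):
        if sp > sn:
            wins += 1.0   # positive correctly ranked above negative
        elif sp == sn:
            wins += 0.5   # tie: half credit
    return wins / (len(pos) * len(neg))


# Toy data with one tied positive-negative pair at 0.7.
y_true = [1, 0, 1, 0, 1, 0]
y_score = [0.9, 0.8, 0.7, 0.7, 0.4, 0.1]

print(pairwise_auc(y_true, y_score))  # 5.5 winning pairs out of 9, i.e. 11/18
```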

Precision-Recall Curve: The precision-recall (PR) curve plots precision (the fraction of predicted positives that are truly positive) against recall (the fraction of actual positives that are correctly identified) at varying thresholds. The PR curve is particularly informative for imbalanced datasets where the positive class is rare. In such settings the ROC curve can look optimistic: because the FPR divides by the large number of negatives, even a count of false positives that greatly exceeds the count of true positives still yields a small FPR. Precision, by contrast, compares false positives directly with true positives, and neither precision nor recall uses true negatives, so the PR curve focuses entirely on the positive class.

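The area under the PR curve, commonly summarized by Average Precision (AP, also called PR AUC), measures ranking quality from the perspective of the positive class. A random classifier achieves an expected AP approximately equal to the prevalence of the positive class, whereas its expected ROC AUC is 0.5 regardless of class balance; as a result, PR AUC tends to separate models more clearly than ROC AUC in imbalanced scenarios.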
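A small thought experiment makes the contrast concrete: copy every negative several times with the same score, so the score distributions are unchanged but the negative class becomes ten times larger. The plain-Python sketch below (hypothetical helper names and toy data, not cuML code) shows that ROC AUC stays the same while AP falls. The AP helper assumes no positive shares its score with another sample, so AP reduces to the mean precision at each positive's rank.

```python
def roc_auc(y_true, y_score):
    """Pairwise ROC AUC; tied positive-negative pairs count as one half."""
    pos = [s for s, y in zip(y_score, y_true) if y == 1]
    neg = [s for s, y in zip(y_score, y_true) if y == 0]
    wins = sum((sp > sn) + 0.5 * (sp == sn) for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """Mean precision at the rank of each positive.

    Assumes no positive shares its score with another sample, so every
    positive sits at its own threshold.
    """
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    tp, ap = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            ap += tp / rank  # precision at the threshold that admits this positive
    return ap / tp


y_true = [1, 0, 1, 0, 1, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.4, 0.1]

# Hypothetical imbalance: every negative copied 9 more times with the same score,
# moving the class ratio from 1:1 to 1:10 without changing either score distribution.
neg_scores = [s for s, y in zip(y_score, y_true) if y == 0]
y_true_imb = y_true + [0] * (9 * len(neg_scores))
y_score_imb = y_score + neg_scores * 9

print(roc_auc(y_true, y_score), roc_auc(y_true_imb, y_score_imb))  # both 2/3
print(average_precision(y_true, y_score))          # 34/45, about 0.756
print(average_precision(y_true_imb, y_score_imb))  # about 0.432
```

ROC AUC is unchanged because each positive still outranks the same fraction of negatives, whereas the precision at each positive's rank drops as ten times as many negatives sit above it.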

Usage

Ranking evaluation metrics are used when:

The classifier outputs continuous scores and the operating threshold has not yet been chosen.
Classifiers must be compared independently of threshold selection.
The application involves ranked retrieval (e.g., information retrieval, recommendation systems, anomaly detection).
The dataset is imbalanced: prefer precision-recall curves and PR AUC over ROC AUC, as PR metrics are more sensitive to performance differences on the minority class.
A single summary statistic is needed for model selection: ROC AUC provides a threshold-independent measure, while Average Precision emphasizes positive-class ranking.

Theoretical Basis

True Positive Rate and False Positive Rate at threshold t:

$\mathrm{TPR}(t) = \frac{\lvert \{ i : \hat{s}_i \geq t \land y_i = 1 \} \rvert}{\lvert \{ i : y_i = 1 \} \rvert}$

$\mathrm{FPR}(t) = \frac{\lvert \{ i : \hat{s}_i \geq t \land y_i = 0 \} \rvert}{\lvert \{ i : y_i = 0 \} \rvert}$

where ${\hat{s}}_{i}$ is the predicted score for sample i and $y_{i}$ is the true binary label.
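The two definitions translate directly into counting code. This is a minimal plain-Python sketch assuming 0/1 labels and the rule "predict positive iff score >= t"; the function name `tpr_fpr_at` is hypothetical.

```python
def tpr_fpr_at(y_true, y_score, t):
    """Return (TPR(t), FPR(t)) for the rule: predict positive iff score >= t.

    Assumes labels are 0/1 and both classes are present.
    """
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    tp = sum(1 for s, y in zip(y_score, y_true) if s >= t and y == 1)
    fp = sum(1 for s, y in zip(y_score, y_true) if s >= t and y == 0)
    return tp / n_pos, fp / n_neg


y_true = [1, 0, 1, 0, 1, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.4, 0.1]

print(tpr_fpr_at(y_true, y_score, 0.65))  # TP=2, FP=1 -> (2/3, 1/3)
print(tpr_fpr_at(y_true, y_score, 0.35))  # TP=3, FP=2 -> (1.0, 2/3)
```

Lowering t from 0.65 to 0.35 raises both rates, which is the monotone behaviour that traces out the ROC curve.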

ROC AUC:

$\mathrm{AUC} = \int_{0}^{1} \mathrm{TPR}\big(\mathrm{FPR}^{-1}(x)\big)\, dx$

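Equivalently, using the Mann-Whitney U-statistic (tied positive-negative pairs count as one half):

$\mathrm{AUC} = \frac{\sum_{i : y_i = 1} \sum_{j : y_j = 0} \left( \mathbf{1}[\hat{s}_i > \hat{s}_j] + \tfrac{1}{2}\, \mathbf{1}[\hat{s}_i = \hat{s}_j] \right)}{\lvert \{ i : y_i = 1 \} \rvert \cdot \lvert \{ j : y_j = 0 \} \rvert}$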

Precision and Recall at threshold t:

$\mathrm{Precision}(t) = \frac{\lvert \{ i : \hat{s}_i \geq t \land y_i = 1 \} \rvert}{\lvert \{ i : \hat{s}_i \geq t \} \rvert}$

$Recall (t) = TPR (t)$
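A minimal plain-Python sketch of these two definitions, assuming 0/1 labels; the name `precision_recall_at` is hypothetical. Precision has an empty denominator when t is above every score, so the sketch returns `None` in that case rather than picking a convention.

```python
def precision_recall_at(y_true, y_score, t):
    """Return (Precision(t), Recall(t)) for the rule: predict positive iff score >= t."""
    predicted = [y for s, y in zip(y_score, y_true) if s >= t]  # labels of predicted positives
    tp = sum(predicted)
    n_pos = sum(y_true)
    # Precision is undefined when nothing is predicted positive; return None then.
    precision = tp / len(predicted) if predicted else None
    recall = tp / n_pos  # identical to TPR(t)
    return precision, recall


y_true = [1, 0, 1, 0, 1, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.4, 0.1]

print(precision_recall_at(y_true, y_score, 0.65))  # 2 of 3 predicted are positive -> (2/3, 2/3)
print(precision_recall_at(y_true, y_score, 0.95))  # nothing predicted -> (None, 0.0)
```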

Average Precision (Area under PR Curve):

$AP = \sum_{k} (R_{k} - R_{k - 1}) \cdot P_{k}$

where $P_{k}$ and $R_{k}$ are the precision and recall at the k-th distinct threshold, with thresholds taken in decreasing order of score and $R_{0} = 0$.
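A worked instance of the sum, using exact fractions so the arithmetic is easy to follow. The ranking is hypothetical: five samples with distinct scores whose labels, from highest score to lowest, are 1, 0, 1, 1, 0 (three positives). The helper name `average_precision` is illustrative only.

```python
from fractions import Fraction


def average_precision(precision, recall):
    """AP = sum_k (R_k - R_{k-1}) * P_k, with R_0 = 0."""
    ap, prev_r = Fraction(0), Fraction(0)
    for p, r in zip(precision, recall):
        ap += (r - prev_r) * p
        prev_r = r
    return ap


# P_k and R_k after each of the five thresholds for the ranking 1, 0, 1, 1, 0.
precision = [Fraction(1, 1), Fraction(1, 2), Fraction(2, 3), Fraction(3, 4), Fraction(3, 5)]
recall = [Fraction(1, 3), Fraction(1, 3), Fraction(2, 3), Fraction(1, 1), Fraction(1, 1)]

print(average_precision(precision, recall))  # 29/36
```

Only steps where recall rises contribute, so the result equals the mean precision at the ranks of the three positives: (1 + 2/3 + 3/4) / 3 = 29/36.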

GPU Computation:

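Given arrays y_true (binary labels) and y_score (continuous scores) of length n, containing at least one positive and one negative sample:

ROC Curve:
    1. Sort samples by y_score descending           (GPU sort)
    2. Start the curve at (FPR, TPR) = (0, 0), i.e. a threshold above every score
    3. Walk through the sorted list, tracking cumulative TP and FP:
        For each unique threshold t (in decreasing order):
            Add every sample with score == t to cumulative_TP or cumulative_FP
            TPR = cumulative_TP / total_positives
            FPR = cumulative_FP / total_negatives
            Record (FPR, TPR) as a point on the ROC curve

ROC AUC (trapezoidal rule):
    AUC = sum over consecutive points (FPR_{k+1} - FPR_k) * (TPR_{k+1} + TPR_k) / 2
    (tied scores form a diagonal segment, giving the same half credit as the Mann-Whitney form)

Precision-Recall Curve:
    1. Sort samples by y_score descending           (GPU sort)
    2. Walk through the sorted list:
        For each unique threshold t (in decreasing order):
            Add every sample with score == t to cumulative_TP or cumulative_FP
            Precision = cumulative_TP / (cumulative_TP + cumulative_FP)
            Recall = cumulative_TP / total_positives
            Record (Recall, Precision)

Average Precision:
    AP = sum over consecutive points (Recall_k - Recall_{k-1}) * Precision_k, with Recall_0 = 0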
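The procedure above can be sketched as a sequential CPU reference in plain Python. This only mirrors the steps listed (sort, group ties, accumulate TP and FP); it is not cuML's GPU code, and the function names `roc_curve`, `pr_curve`, `trapezoid_auc`, and `average_precision` here are hypothetical stand-ins. Labels are assumed to be 0/1 with both classes present.

```python
def _sweep(y_true, y_score):
    """Yield (cumulative_TP, cumulative_FP) after each unique threshold, highest first."""
    pairs = sorted(zip(y_score, y_true), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        # Consume every sample whose score equals t before emitting a point.
        while i < len(pairs) and pairs[i][0] == t:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        yield tp, fp


def roc_curve(y_true, y_score):
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    fpr, tpr = [0.0], [0.0]  # curve starts at (0, 0): threshold above every score
    for tp, fp in _sweep(y_true, y_score):
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


def trapezoid_auc(x, y):
    """Area under a piecewise-linear curve through the points (x[k], y[k])."""
    return sum((x[k + 1] - x[k]) * (y[k + 1] + y[k]) / 2 for k in range(len(x) - 1))


def pr_curve(y_true, y_score):
    n_pos = sum(1 for y in y_true if y == 1)
    recall, precision = [], []
    for tp, fp in _sweep(y_true, y_score):
        recall.append(tp / n_pos)
        precision.append(tp / (tp + fp))  # tp + fp >= 1: each threshold admits a sample
    return recall, precision


def average_precision(y_true, y_score):
    recall, precision = pr_curve(y_true, y_score)
    ap, prev_r = 0.0, 0.0  # Recall_0 = 0
    for r, p in zip(recall, precision):
        ap += (r - prev_r) * p
        prev_r = r
    return ap


y_true = [1, 0, 1, 0, 1, 0]
y_score = [0.9, 0.8, 0.7, 0.7, 0.4, 0.1]  # one tied positive-negative pair at 0.7

fpr, tpr = roc_curve(y_true, y_score)
print(trapezoid_auc(fpr, tpr))              # 11/18, matching the pairwise count
print(average_precision(y_true, y_score))   # 1/3 + 1/6 + 1/5 = 0.7
```

Because the tie group at 0.7 is recorded as a single point, the trapezoid over that step has slope 1 and awards the tied pair half credit, so the result agrees with the Mann-Whitney formula.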

Related Pages

Implemented By

Implementation:Rapidsai_Cuml_Ranking_Metrics
