Heuristic: Cleanlab Confident Threshold Heuristic
| Knowledge Sources | |
|---|---|
| Domains | Confident_Learning, Label_Quality |
| Last Updated | 2026-02-09 19:30 GMT |
Overview
Per-class confidence thresholds (t_j) used in confident joint estimation, computed as the average predicted probability for each class, with a lower bound clip to prevent degenerate behavior.
Description
The confident joint is the core data structure in confident learning. To estimate it, cleanlab needs to decide whether a given example "confidently" belongs to a particular class. This decision uses per-class thresholds: for class j, the threshold t_j is the average of all predicted probabilities P(label=j|x) across all examples x. An example is considered to confidently belong to class j when its predicted probability for j exceeds t_j. When an example exceeds thresholds for multiple classes, the class with the highest predicted probability is chosen. When an example exceeds no threshold, it is treated as an outlier and excluded from the confident joint.
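The selection rule described above can be sketched in plain Python. This is a minimal illustration of the mechanism, not cleanlab's implementation; the function names `get_thresholds` and `confident_class` are illustrative only.

```python
def get_thresholds(pred_probs, num_classes):
    """t_j = average predicted probability for class j over all examples."""
    n = len(pred_probs)
    return [sum(p[j] for p in pred_probs) / n for j in range(num_classes)]

def confident_class(probs, thresholds, tol=1e-6):
    """Return the confident class for one example, or None (outlier)."""
    # Classes whose predicted probability exceeds the (tolerance-relaxed) threshold.
    over = [j for j, p in enumerate(probs) if p >= thresholds[j] - tol]
    if not over:
        return None  # exceeds no threshold -> treated as an outlier, excluded
    # If multiple classes are confident, pick the one with highest probability.
    return max(over, key=lambda j: probs[j])

pred_probs = [
    [0.9, 0.1],  # clearly class 0
    [0.6, 0.4],
    [0.2, 0.8],  # clearly class 1
    [0.5, 0.5],
    [0.5, 0.4],  # exceeds neither threshold -> outlier
]
thresholds = get_thresholds(pred_probs, 2)  # per-class means: ~[0.54, 0.44]
assignments = [confident_class(p, thresholds) for p in pred_probs]
```

The last example clears neither per-class mean, so it is dropped from the confident joint rather than forced into a class.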
Usage
This heuristic is applied automatically whenever compute_confident_joint or find_label_issues is called without user-supplied thresholds. Understanding this threshold mechanism is important for:
- Interpreting why certain examples are flagged as label issues
- Tuning sensitivity by providing custom thresholds
- Debugging cases where too many or too few issues are found
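To see how user-supplied thresholds change sensitivity, here is a hedged pure-Python sketch (not a call into cleanlab) that counts how many examples clear at least one threshold under the default per-class means versus stricter custom values:

```python
def count_confident(pred_probs, thresholds, tol=1e-6):
    """Number of examples that confidently belong to at least one class."""
    return sum(
        any(p[j] >= thresholds[j] - tol for j in range(len(thresholds)))
        for p in pred_probs
    )

pred_probs = [[0.9, 0.1], [0.55, 0.45], [0.4, 0.6], [0.3, 0.7]]

# Default behavior: thresholds are the per-class mean predicted probabilities.
n = len(pred_probs)
default_t = [sum(p[j] for p in pred_probs) / n for j in range(2)]

n_default = count_confident(pred_probs, default_t)  # everything clears a mean
n_strict = count_confident(pred_probs, [0.8, 0.8])  # only near-certain examples
```

Raising the thresholds shrinks the pool of confident examples, which in turn makes fewer examples eligible to appear in the confident joint; lowering them has the opposite effect.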
The Insight (Rule of Thumb)
- Action: The threshold t_j for class j is computed as `t_j = mean(pred_probs[:, j])` across all examples. Thresholds are clipped from below at `2 * 1e-6` (the `CONFIDENT_THRESHOLDS_LOWER_BOUND` constant).
- Value: A floating-point tolerance of `1e-6` is subtracted from thresholds during comparison to handle floating-point imprecision.
- Trade-off: Low thresholds (common for rare classes) make it easier for examples to be "confident" in that class, which can increase false positives. High thresholds (common for dominant classes) are more conservative.
- Edge Case: When zero examples exceed any threshold, the confident joint is initialized as an all-zeros matrix. The diagonal is then clipped to minimum 1 to guarantee at least one correctly labeled example per class.
Reasoning
The per-class average threshold adapts to class imbalance naturally. For a rare class with low average predicted probability, the threshold will be lower, making it easier for the few true members to be identified. For a dominant class, the threshold will be higher, reducing false assignments. This is a key insight from the Confident Learning paper (Northcutt et al., 2021) that makes the method robust across different class distributions.
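The adaptation to imbalance can be checked numerically. In this toy example (illustrative data, plain Python), class 0 dominates, so its mean-based threshold is high, while the rare class 1 gets a much lower one:

```python
# Imbalanced toy data: class 0 dominates, class 1 is rare.
pred_probs = [
    [0.95, 0.05],
    [0.90, 0.10],
    [0.85, 0.15],
    [0.30, 0.70],  # the lone likely member of the rare class
]
t = [sum(p[j] for p in pred_probs) / len(pred_probs) for j in range(2)]
# t = [0.75, 0.25]: the rare class's threshold adapts downward,
# so the 0.70 prediction clears it despite being below the dominant
# class's 0.75 bar.
rare_is_confident = pred_probs[3][1] >= t[1]
```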
The floating-point tolerance of `1e-6` prevents edge cases where a predicted probability equals the threshold exactly but is excluded due to floating-point arithmetic. The lower bound clip of `2 * 1e-6` prevents degenerate thresholds near zero.
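The kind of edge case the tolerance guards against is easy to reproduce. Here a threshold that should mathematically equal 0.7 comes out slightly larger after floating-point arithmetic, so the strict comparison wrongly excludes the example:

```python
p = 0.7          # example's predicted probability for class j
t = 0.1 * 7      # mathematically 0.7, but 0.7000000000000001 as a float

strict = p >= t           # False: excluded purely by rounding error
tolerant = p >= t - 1e-6  # True: the tolerance recovers the intended result
```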
The guaranteed minimum of 1 on the confident joint diagonal ensures the noise rate estimation remains well-defined even when a class has very poor model predictions.
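The effect of that guarantee can be sketched in plain Python (a simplified stand-in for the `np.fill_diagonal(..., clip(min=1))` step quoted below, using an illustrative 3x3 count matrix):

```python
# Toy confident joint: rows are given labels, columns are confidently
# predicted labels. Class 1 ended up with zero confident self-labels.
confident_joint = [
    [5, 1, 0],
    [0, 0, 2],
    [1, 0, 3],
]

# Guarantee at least one correctly labeled example per class, so the
# per-class noise rates derived from this matrix stay well-defined.
for j in range(len(confident_joint)):
    confident_joint[j][j] = max(confident_joint[j][j], 1)
```

Without the clip, class 1's zero diagonal entry would imply that no example of that class is correctly labeled, breaking downstream noise-rate estimation.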
Code Evidence:
Threshold computation from `cleanlab/count.py:571-574`:
```python
if thresholds is None:
    # P(we predict the given noisy label is k | given noisy label is k)
    thresholds = get_confident_thresholds(labels, pred_probs, multi_label=multi_label)
thresholds = np.asarray(thresholds)
```
Threshold application with floating-point tolerance from `cleanlab/count.py:579-594`:
```python
# pred_probs_bool is a bool matrix where each row represents a training
# example as a boolean vector of size num_classes, with True if the
# example confidently belongs to that class and False if not.
pred_probs_bool = pred_probs >= thresholds - 1e-6
num_confident_bins = pred_probs_bool.sum(axis=1)
# The indices where this is false, are often outliers
at_least_one_confident = num_confident_bins > 0
more_than_one_confident = num_confident_bins > 1
# For each example, choose the confident class (greater than threshold)
# When there is 2+ confident classes, choose the class with largest prob.
true_label_guess = np.where(
    more_than_one_confident,
    pred_probs_argmax,
    confident_argmax,
)
```
Diagonal guarantee from `cleanlab/count.py:611-612`:
```python
# Guarantee at least one correctly labeled example is represented in every class
np.fill_diagonal(confident_joint, confident_joint.diagonal().clip(min=1))
```
Threshold lower bound constants from `cleanlab/internal/constants.py:1-5`:
```python
FLOATING_POINT_COMPARISON = 1e-6
CLIPPING_LOWER_BOUND = 1e-6
CONFIDENT_THRESHOLDS_LOWER_BOUND = (
    2 * FLOATING_POINT_COMPARISON
)  # has to be larger than floating point comparison
```