Implementation: Cleanlab compute_confident_joint
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Data_Quality |
| Last Updated | 2026-02-09 19:00 GMT |
Overview
Concrete tool from the Cleanlab library for estimating the joint distribution of noisy given labels and latent true labels.
Description
This function computes the confident joint matrix, a K x K array where each entry C[i][j] estimates the number of examples in the dataset whose given label is class i and whose true label is class j. It first computes per-class confidence thresholds (the average predicted probability for each class among examples labeled as that class), then assigns each example a "confident" true label based on which class probabilities exceed their respective thresholds. The matrix is optionally calibrated so that row sums match the observed label counts. It also supports multi-label classification and can return the indices of off-diagonal elements (examples where given and true labels differ).
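The core idea described above can be sketched in plain numpy. This is an illustrative simplification under stated assumptions (single-label classification, default thresholds, no calibration, ties broken by highest probability), not the library's actual implementation:

```python
import numpy as np

def confident_joint_sketch(labels, pred_probs):
    """Simplified sketch of the confident joint construction."""
    n, k = pred_probs.shape
    # Per-class threshold: mean predicted probability of class j
    # among examples whose given label is j.
    thresholds = np.array(
        [pred_probs[labels == j, j].mean() for j in range(k)]
    )
    cj = np.zeros((k, k), dtype=int)
    for i in range(n):
        # Classes whose predicted probability meets its threshold.
        above = np.flatnonzero(pred_probs[i] >= thresholds)
        if len(above) == 0:
            continue  # not confidently any class; example is not counted
        # Confident true label: the qualifying class with highest probability.
        true_label = above[np.argmax(pred_probs[i, above])]
        cj[labels[i], true_label] += 1
    return cj

labels = np.array([0, 0, 1, 2])
pred_probs = np.array([
    [0.9, 0.05, 0.05],
    [0.3, 0.6, 0.1],
    [0.1, 0.8, 0.1],
    [0.05, 0.1, 0.85],
])
print(confident_joint_sketch(labels, pred_probs))
```

Note that examples exceeding no class threshold (like the second row here) are simply left out of the counts, which is why the matrix entries need not sum to N before calibration.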
Usage
Import and use this function when you need to understand the noise structure of your labeled dataset. The confident joint is the foundational object used by estimate_latent to derive noise matrices, by find_label_issues to determine per-class error counts, and by health_summary to compute dataset-level quality metrics.
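As a rough intuition for how downstream estimators consume this matrix: row-normalizing a (calibrated) confident joint yields an estimate of p(true label | given label). The following is a conceptual numpy sketch with made-up counts, not the exact computation performed by estimate_latent:

```python
import numpy as np

# Hypothetical confident joint for a 3-class problem: counts of
# (given_label, true_label) pairs (illustrative numbers only).
cj = np.array([
    [40, 5, 0],
    [3, 30, 2],
    [0, 1, 19],
])

# Row-normalizing estimates P(true label = j | given label = i);
# the off-diagonal mass of row i shows how often examples given
# label i appear to actually belong to another class.
p_true_given_label = cj / cj.sum(axis=1, keepdims=True)
print(p_true_given_label.round(3))
```

Here, for instance, roughly 11% of examples labeled class 0 appear to truly belong to class 1, which is the kind of signal find_label_issues and health_summary build on.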
Code Reference
Source Location
- Repository: cleanlab
- File: cleanlab/count.py
- Lines: 445-622
Signature
def compute_confident_joint(
    labels,
    pred_probs,
    *,
    thresholds=None,
    calibrate=True,
    multi_label=False,
    return_indices_of_off_diagonals=False,
) -> Union[np.ndarray, Tuple[np.ndarray, list]]
Import
from cleanlab.count import compute_confident_joint
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| labels | LabelLike | Yes | Array of noisy class labels of shape (N,) with integer values in range 0..K-1. |
| pred_probs | np.ndarray | Yes | Out-of-sample predicted probability matrix of shape (N, K). Each row sums to 1. |
| thresholds | Optional[np.ndarray] | No | Per-class confidence thresholds of shape (K,). If None, computed as the average pred_probs for each class among examples labeled as that class. |
| calibrate | bool | No | If True (default), calibrate the confident joint so row sums match the empirical label distribution. |
| multi_label | bool | No | If True, treat the problem as multi-label classification. Defaults to False. |
| return_indices_of_off_diagonals | bool | No | If True, also return a list of indices of examples where the confident true label differs from the given label. Defaults to False. |
Outputs
| Name | Type | Description |
|---|---|---|
| confident_joint | np.ndarray | Array of shape (K, K) representing the estimated joint counts of (given_label, true_label) pairs. |
| indices (optional) | list | Returned only when return_indices_of_off_diagonals=True. Indices of examples counted in off-diagonal entries of the confident joint, i.e. examples whose confident true label differs from their given label. |
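The effect of calibrate=True can be illustrated with a short sketch. This is an assumption-labeled approximation of the behavior described above (row sums rescaled to the observed label counts, then the total rescaled to N), not the library's exact code:

```python
import numpy as np

def calibrate_sketch(confident_joint, labels):
    """Illustrative calibration: match row sums to label counts, total to N."""
    counts = np.bincount(labels, minlength=confident_joint.shape[0])
    # Scale each row so its sum equals the observed count of that given label.
    row_scaled = confident_joint * (counts / confident_joint.sum(axis=1))[:, None]
    # Rescale so all entries sum to the number of examples.
    return row_scaled / row_scaled.sum() * len(labels)

labels = np.array([0, 0, 1, 2])
uncalibrated = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
])
calibrated = calibrate_sketch(uncalibrated, labels)
print(calibrated)  # row sums now equal the label counts [2, 1, 1]
```

After calibration the matrix entries are generally no longer integers, but the row marginals agree with the empirical label distribution, which downstream estimators rely on.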
Usage Examples
Basic Usage
import numpy as np
from cleanlab.count import compute_confident_joint

# Suppose we have 4 examples with 3 classes
labels = np.array([0, 0, 1, 2])
pred_probs = np.array([
    [0.9, 0.05, 0.05],  # confidently class 0
    [0.3, 0.6, 0.1],    # labeled 0 but model thinks class 1
    [0.1, 0.8, 0.1],    # confidently class 1
    [0.05, 0.1, 0.85],  # confidently class 2
])

cj = compute_confident_joint(labels, pred_probs, calibrate=True)
print(cj)  # K x K matrix of estimated (given_label, true_label) counts
Retrieving Off-Diagonal Indices
import numpy as np
from cleanlab.count import compute_confident_joint

# Same data as the Basic Usage example above
labels = np.array([0, 0, 1, 2])
pred_probs = np.array([
    [0.9, 0.05, 0.05],
    [0.3, 0.6, 0.1],
    [0.1, 0.8, 0.1],
    [0.05, 0.1, 0.85],
])

cj, off_diag_indices = compute_confident_joint(
    labels, pred_probs,
    return_indices_of_off_diagonals=True,
)

# off_diag_indices identifies examples counted off the diagonal,
# i.e. examples whose confident true label differs from their
# given label and which are therefore potentially mislabeled
print("Potentially mislabeled example indices:", off_diag_indices)