Implementation: Cleanlab compute_confident_joint
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Data_Quality |
| Last Updated | 2026-02-09 19:00 GMT |
Overview
Concrete tool from the Cleanlab library for estimating the joint distribution of noisy given labels and latent true labels.
Description
This function computes the confident joint matrix, a K x K array where each entry C[i][j] estimates the number of examples in the dataset whose given label is class i and whose true label is class j. It first computes per-class confidence thresholds (the average predicted probability for each class among examples labeled as that class), then assigns each example a "confident" true label based on which class probabilities exceed their respective thresholds. The matrix is optionally calibrated so that row sums match the observed label counts. It also supports multi-label classification and can return the indices of off-diagonal elements (examples where given and true labels differ).
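The core idea described above can be sketched in plain numpy. This is an illustrative simplification under stated assumptions (single-label classification, default thresholds, no calibration, ties broken by highest probability), not the library's actual implementation:

```python
import numpy as np

def confident_joint_sketch(labels, pred_probs):
    """Simplified sketch of the confident joint construction."""
    n, k = pred_probs.shape
    # Per-class threshold: mean predicted probability of class j
    # among examples whose given label is j.
    thresholds = np.array(
        [pred_probs[labels == j, j].mean() for j in range(k)]
    )
    cj = np.zeros((k, k), dtype=int)
    for i in range(n):
        # Classes whose predicted probability meets its threshold.
        above = np.flatnonzero(pred_probs[i] >= thresholds)
        if len(above) == 0:
            continue  # not confidently any class; example is not counted
        # Confident true label: the qualifying class with highest probability.
        true_label = above[np.argmax(pred_probs[i, above])]
        cj[labels[i], true_label] += 1
    return cj

labels = np.array([0, 0, 1, 2])
pred_probs = np.array([
    [0.9, 0.05, 0.05],
    [0.3, 0.6, 0.1],
    [0.1, 0.8, 0.1],
    [0.05, 0.1, 0.85],
])
print(confident_joint_sketch(labels, pred_probs))
```

Note that examples exceeding no class threshold (like the second row here) are simply left out of the counts, which is why the matrix entries need not sum to N before calibration.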
Usage
Import and use this function when you need to understand the noise structure of your labeled dataset. The confident joint is the foundational object used by estimate_latent to derive noise matrices, by find_label_issues to determine per-class error counts, and by health_summary to compute dataset-level quality metrics.
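As a rough intuition for how downstream estimators consume this matrix: row-normalizing a (calibrated) confident joint yields an estimate of p(true label | given label). The following is a conceptual numpy sketch with made-up counts, not the exact computation performed by estimate_latent:

```python
import numpy as np

# Hypothetical confident joint for a 3-class problem: counts of
# (given_label, true_label) pairs (illustrative numbers only).
cj = np.array([
    [40, 5, 0],
    [3, 30, 2],
    [0, 1, 19],
])

# Row-normalizing estimates P(true label = j | given label = i);
# the off-diagonal mass of row i shows how often examples given
# label i appear to actually belong to another class.
p_true_given_label = cj / cj.sum(axis=1, keepdims=True)
print(p_true_given_label.round(3))
```

Here, for instance, roughly 11% of examples labeled class 0 appear to truly belong to class 1, which is the kind of signal find_label_issues and health_summary build on.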
Code Reference
Source Location
- Repository: cleanlab
- File: cleanlab/count.py
- Lines: 445-622
Signature
def compute_confident_joint(
    labels,
    pred_probs,
    *,
    thresholds=None,
    calibrate=True,
    multi_label=False,
    return_indices_of_off_diagonals=False,
) -> Union[np.ndarray, Tuple[np.ndarray, list]]
Import
from cleanlab.count import compute_confident_joint
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| labels | LabelLike | Yes | Array of noisy class labels of shape (N,) with integer values in range 0..K-1. |
| pred_probs | np.ndarray | Yes | Out-of-sample predicted probability matrix of shape (N, K). Each row sums to 1. |
| thresholds | Optional[np.ndarray] | No | Per-class confidence thresholds of shape (K,). If None, computed as the average pred_probs for each class among examples labeled as that class. |
| calibrate | bool | No | If True (default), calibrate the confident joint so row sums match the empirical label distribution. |
| multi_label | bool | No | If True, treat the problem as multi-label classification. Defaults to False. |
| return_indices_of_off_diagonals | bool | No | If True, also return a list of indices of examples where the confident true label differs from the given label. Defaults to False. |
Outputs
| Name | Type | Description |
|---|---|---|
| confident_joint | np.ndarray | Array of shape (K, K) representing the estimated joint counts of (given_label, true_label) pairs. |
| indices (optional) | list | Returned only when return_indices_of_off_diagonals=True. Indices of examples counted in off-diagonal entries of the confident joint, i.e. examples whose confident true label differs from their given label. |
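The effect of calibrate=True can be illustrated with a short sketch. This is an assumption-labeled approximation of the behavior described above (row sums rescaled to the observed label counts, then the total rescaled to N), not the library's exact code:

```python
import numpy as np

def calibrate_sketch(confident_joint, labels):
    """Illustrative calibration: match row sums to label counts, total to N."""
    counts = np.bincount(labels, minlength=confident_joint.shape[0])
    # Scale each row so its sum equals the observed count of that given label.
    row_scaled = confident_joint * (counts / confident_joint.sum(axis=1))[:, None]
    # Rescale so all entries sum to the number of examples.
    return row_scaled / row_scaled.sum() * len(labels)

labels = np.array([0, 0, 1, 2])
uncalibrated = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
])
calibrated = calibrate_sketch(uncalibrated, labels)
print(calibrated)  # row sums now equal the label counts [2, 1, 1]
```

After calibration the matrix entries are generally no longer integers, but the row marginals agree with the empirical label distribution, which downstream estimators rely on.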
Usage Examples
Basic Usage
import numpy as np
from cleanlab.count import compute_confident_joint

# Suppose we have 4 examples with 3 classes
labels = np.array([0, 0, 1, 2])
pred_probs = np.array([
    [0.9, 0.05, 0.05],  # confidently class 0
    [0.3, 0.6, 0.1],    # labeled 0 but model thinks class 1
    [0.1, 0.8, 0.1],    # confidently class 1
    [0.05, 0.1, 0.85],  # confidently class 2
])

cj = compute_confident_joint(labels, pred_probs, calibrate=True)
print(cj)  # K x K matrix of estimated (given_label, true_label) counts
Retrieving Off-Diagonal Indices
import numpy as np
from cleanlab.count import compute_confident_joint

# Same data as the Basic Usage example above
labels = np.array([0, 0, 1, 2])
pred_probs = np.array([
    [0.9, 0.05, 0.05],
    [0.3, 0.6, 0.1],
    [0.1, 0.8, 0.1],
    [0.05, 0.1, 0.85],
])

cj, off_diag_indices = compute_confident_joint(
    labels, pred_probs,
    return_indices_of_off_diagonals=True,
)

# off_diag_indices identifies examples counted off the diagonal,
# i.e. examples whose confident true label differs from their
# given label and which are therefore potentially mislabeled
print("Potentially mislabeled example indices:", off_diag_indices)