Implementation:Cleanlab Cleanlab Compute Confident Joint

From Leeroopedia


Knowledge Sources
Domains Machine_Learning, Data_Quality
Last Updated 2026-02-09 19:00 GMT

Overview

Concrete tool, provided by the Cleanlab library, for estimating the joint distribution of noisy given labels and latent true labels.

Description

This function computes the confident joint matrix, a K x K array where each entry C[i][j] estimates the number of examples in the dataset whose given label is class i and whose true label is class j. It first computes per-class confidence thresholds (for each class k, the average predicted probability of class k among examples labeled k). It then assigns each example a "confident" true label: a class whose predicted probability meets or exceeds that class's threshold, with the most probable such class chosen when several qualify. Examples for which no class reaches its threshold are not counted. The matrix is optionally calibrated so that row sums match the observed label counts. The function also supports multi-label classification and can return the indices of off-diagonal examples (those whose confident true label differs from the given label).
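The counting procedure described above can be sketched in plain Python. This is a minimal illustration and not the library's code: the names sketch_thresholds and sketch_confident_joint and the toy data are hypothetical, the sketch assumes every class appears at least once in labels, and it omits calibration.

```python
from typing import List, Sequence, Tuple


def sketch_thresholds(
    labels: Sequence[int], pred_probs: Sequence[Sequence[float]]
) -> List[float]:
    """Per-class threshold: mean predicted probability of class k over examples labeled k."""
    num_classes = len(pred_probs[0])
    thresholds = []
    for k in range(num_classes):
        probs_k = [row[k] for row, given in zip(pred_probs, labels) if given == k]
        # Assumption: every class appears at least once in labels.
        thresholds.append(sum(probs_k) / len(probs_k))
    return thresholds


def sketch_confident_joint(
    labels: Sequence[int], pred_probs: Sequence[Sequence[float]]
) -> Tuple[List[List[int]], List[int]]:
    """Uncalibrated counts C[given][confident_true] plus indices of off-diagonal examples."""
    num_classes = len(pred_probs[0])
    thresholds = sketch_thresholds(labels, pred_probs)
    joint = [[0] * num_classes for _ in range(num_classes)]
    off_diagonal = []
    for i, (row, given) in enumerate(zip(pred_probs, labels)):
        # Classes whose probability reaches that class's threshold.
        confident = [k for k in range(num_classes) if row[k] >= thresholds[k]]
        if not confident:
            continue  # no confident class: the example is left out of the counts
        # When several classes qualify, take the most probable one.
        guess = max(confident, key=lambda k: row[k])
        joint[given][guess] += 1
        if guess != given:
            off_diagonal.append(i)
    return joint, off_diagonal


# Hypothetical toy data: example 2 is labeled 0 but looks like class 1.
labels = [0, 0, 0, 1, 1, 2]
pred_probs = [
    [0.90, 0.05, 0.05],
    [0.80, 0.10, 0.10],
    [0.10, 0.85, 0.05],
    [0.10, 0.80, 0.10],
    [0.20, 0.70, 0.10],  # below every threshold, so it is not counted
    [0.05, 0.05, 0.90],
]

joint, off_diagonal = sketch_confident_joint(labels, pred_probs)
print(joint)         # [[2, 1, 0], [0, 1, 0], [0, 0, 1]]
print(off_diagonal)  # [2]
```

Plain lists are used here so that each step is visible; for real data, call compute_confident_joint itself.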

Usage

Import and use this function when you need to understand the noise structure of your labeled dataset. The confident joint is the foundational object used by estimate_latent to derive noise matrices, by find_label_issues to determine per-class error counts, and by health_summary to compute dataset-level quality metrics.
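To give a feel for why downstream routines start from the confident joint, the sketch below derives a few summary quantities from a K x K count matrix. It is an illustration only and does not reproduce estimate_latent, find_label_issues, or health_summary; the helper names and the example matrix are hypothetical, and the sketch assumes no column of the matrix sums to zero.

```python
from typing import List

Matrix = List[List[float]]


def per_class_error_counts(joint: Matrix) -> List[float]:
    """Row sum minus diagonal: examples given label i whose estimated true label differs."""
    return [sum(row) - row[i] for i, row in enumerate(joint)]


def column_normalized(joint: Matrix) -> Matrix:
    """Entry [i][j] approximates P(given = i | true = j). Assumes no column sums to zero."""
    k = len(joint)
    col_sums = [sum(joint[i][j] for i in range(k)) for j in range(k)]
    return [[joint[i][j] / col_sums[j] for j in range(k)] for i in range(k)]


# Hypothetical 3-class confident joint (rows: given label, columns: true label).
example_joint = [[2, 1, 0], [0, 2, 0], [0, 0, 1]]

error_counts = per_class_error_counts(example_joint)
noise = column_normalized(example_joint)
total = sum(sum(row) for row in example_joint)
issue_fraction = sum(error_counts) / total  # share of examples off the diagonal

print(error_counts)  # [1, 0, 0]
print(issue_fraction)
```

Off-diagonal mass gives per-class error counts, column normalization gives a noise-matrix-like view, and the overall off-diagonal share is one possible dataset-level quality figure.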

Code Reference

Source Location

  • Repository: cleanlab
  • File: cleanlab/count.py
  • Lines: 445-622

Signature

def compute_confident_joint(
    labels,
    pred_probs,
    *,
    thresholds=None,
    calibrate=True,
    multi_label=False,
    return_indices_of_off_diagonals=False,
) -> Union[np.ndarray, Tuple[np.ndarray, list]]

Import

from cleanlab.count import compute_confident_joint

I/O Contract

Inputs

Name Type Required Description
labels LabelLike Yes Array of noisy class labels of shape (N,) with integer values in range 0..K-1.
pred_probs np.ndarray Yes Out-of-sample predicted probability matrix of shape (N, K). Each row sums to 1.
thresholds Optional[np.ndarray] No Per-class confidence thresholds of shape (K,). If None, computed as the average pred_probs for each class among examples labeled as that class.
calibrate bool No If True (default), calibrate the confident joint so that each row sums to the number of examples carrying that given label.
multi_label bool No If True, treat the problem as multi-label classification, where each example's label is a list of class indices rather than a single integer. Defaults to False.
return_indices_of_off_diagonals bool No If True, also return a list of indices of examples where the confident true label differs from the given label. Defaults to False.
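The effect of the calibrate option can be sketched as a row rescaling. The version below is a simplified illustration under stated assumptions, not the library's routine: sketch_calibrate is a hypothetical name, each row is assumed to contain at least one counted example, and any rounding back to integer counts is left out.

```python
from typing import List, Sequence


def sketch_calibrate(
    joint: Sequence[Sequence[float]], labels: Sequence[int]
) -> List[List[float]]:
    """Rescale each row so it sums to the number of examples given that label."""
    num_classes = len(joint)
    label_counts = [sum(1 for y in labels if y == k) for k in range(num_classes)]
    calibrated = []
    for k, row in enumerate(joint):
        row_sum = sum(row)
        # Assumption: row_sum > 0, i.e. at least one counted example per given label.
        calibrated.append([count * label_counts[k] / row_sum for count in row])
    return calibrated


# Hypothetical inputs: 6 given labels, but only 5 examples were confidently counted.
given_labels = [0, 0, 0, 1, 1, 2]
raw_joint = [[2, 1, 0], [0, 1, 0], [0, 0, 1]]

calibrated_joint = sketch_calibrate(raw_joint, given_labels)
print(calibrated_joint)  # [[2.0, 1.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
```

Row 1 is scaled from a single counted example up to the two examples that carry label 1, so the calibrated matrix accounts for every example in the dataset.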

Outputs

Name Type Description
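confident_joint np.ndarray Array of shape (K, K) representing the estimated joint counts of (given_label, true_label) pairs. With multi_label=True, the array instead has shape (K, 2, 2), a one-vs-rest confident joint per class.
indices (optional) list or np.ndarray Returned only when return_indices_of_off_diagonals=True. Indices of the examples counted in off-diagonal entries of the confident joint, i.e. examples whose confident true label differs from their given label. With multi_label=True, one such collection of indices per class.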

Usage Examples

Basic Usage

import numpy as np
from cleanlab.count import compute_confident_joint

# Suppose we have 4 examples with 3 classes
labels = np.array([0, 0, 1, 2])
pred_probs = np.array([
    [0.9, 0.05, 0.05],  # confidently class 0
    [0.3, 0.6, 0.1],    # labeled 0 but model thinks class 1
    [0.1, 0.8, 0.1],    # confidently class 1
    [0.05, 0.1, 0.85],  # confidently class 2
])

cj = compute_confident_joint(labels, pred_probs, calibrate=True)
print(cj)
# K x K matrix showing estimated (given_label, true_label) counts

Retrieving Off-Diagonal Indices

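import numpy as np
from cleanlab.count import compute_confident_joint

# Reuses labels and pred_probs from the Basic Usage example
cj, off_diag_indices = compute_confident_joint(
    labels, pred_probs,
    return_indices_of_off_diagonals=True,
)

# In the single-label case, off_diag_indices is a flat collection of example
# indices whose confident true label differs from the given label (may be empty)
off_diag_indices = np.asarray(off_diag_indices)
if off_diag_indices.size > 0:
    print("Potentially mislabeled example indices:", off_diag_indices)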

