Implementation:Cleanlab Cleanlab Estimate Latent
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Data_Quality |
| Last Updated | 2026-02-09 19:00 GMT |
Overview
A concrete tool, provided by the Cleanlab library, for deriving latent noise transition matrices and the true label prior from a confident joint.
Description
This function takes a confident joint matrix and an array of noisy labels and returns three estimated quantities: the latent true label prior py, the noise matrix P(given_label | true_label), and the inverse noise matrix P(true_label | given_label). The noise matrix is obtained by dividing each column of the confident joint by its column sum (the number of examples confidently counted into each true class). The inverse noise matrix is obtained by dividing each row by its row sum (the number of examples confidently counted in each given class). Both results are valid conditional probability distributions. The function supports four methods for estimating the true prior (py_method) and an optional convergence mode that iteratively adjusts py, the noise matrix, and the inverse noise matrix until the three estimates are mathematically consistent with one another.
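The column and row normalization described above can be sketched in plain NumPy. This is a simplified illustration under stated assumptions, not cleanlab's implementation. The function name `sketch_latent_from_joint` is hypothetical. It computes only a Bayes-rule prior in the spirit of the default "cnt" method, and it skips any clipping of noise rates the library may apply.

```python
import numpy as np


def sketch_latent_from_joint(confident_joint, labels):
    """Hypothetical sketch: noise matrices and a Bayes-rule prior from a confident joint.

    Assumes confident_joint[i, j] counts examples with given label i and true label j.
    Skips any clipping or renormalization a production implementation may apply.
    """
    cj = np.asarray(confident_joint, dtype=float)
    k = cj.shape[0]
    tiny = 1e-6  # avoids division by zero for empty rows or columns

    # Observed (noisy) label distribution: ps[i] = P(given_label = i).
    ps = np.bincount(labels, minlength=k) / len(labels)

    # Column-normalize: noise_matrix[i, j] = P(given = i | true = j).
    noise_matrix = cj / np.clip(cj.sum(axis=0), tiny, None)

    # Row-normalize, then transpose: inv_noise_matrix[j, i] = P(true = j | given = i).
    inv_noise_matrix = cj.T / np.clip(cj.sum(axis=1), tiny, None)

    # Bayes' rule on the diagonals:
    # P(true=k) = P(true=k | given=k) * P(given=k) / P(given=k | true=k)
    py = np.diag(inv_noise_matrix) / np.clip(np.diag(noise_matrix), tiny, None) * ps
    py = py / py.sum()
    return py, noise_matrix, inv_noise_matrix


cj = np.array([[8, 2], [1, 9]])           # toy confident joint
labels = np.array([0] * 10 + [1] * 10)    # ten noisy labels per class
py, noise_matrix, inv_noise_matrix = sketch_latent_from_joint(cj, labels)
print(py)
```

For this toy input, py comes out at approximately [0.45, 0.55], which matches the normalized column totals of the confident joint. Note that in this sketch both matrices have columns summing to 1.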
Usage
Import and use this function after computing the confident joint (via compute_confident_joint) when you need to understand the full noise transition structure of your dataset. The noise matrix is useful for understanding systematic annotation errors, the inverse noise matrix is useful for correcting predictions at inference time, and the true prior is useful for understanding class imbalance after correcting for noise.
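To illustrate how the noise matrix exposes systematic annotation errors, the sketch below ranks its off-diagonal entries. The helper `top_label_flips` is hypothetical, and the matrix values are made up for illustration rather than taken from a real dataset. The sketch assumes the convention noise_matrix[i, j] = P(given_label=i | true_label=j).

```python
import numpy as np


def top_label_flips(noise_matrix, n=3):
    """Hypothetical helper: the n largest off-diagonal noise rates.

    Assumes noise_matrix[i, j] = P(given_label=i | true_label=j), so each
    returned tuple (true_label, given_label, rate) is a label-flip rate.
    """
    nm = np.asarray(noise_matrix, dtype=float)
    flips = [
        (j, i, float(nm[i, j]))
        for i in range(nm.shape[0])
        for j in range(nm.shape[1])
        if i != j  # diagonal entries are correct-label rates, not noise
    ]
    flips.sort(key=lambda t: t[2], reverse=True)
    return flips[:n]


# Illustrative values only; each column sums to 1.
noise_matrix = np.array([
    [0.90, 0.05, 0.00],
    [0.10, 0.80, 0.30],
    [0.00, 0.15, 0.70],
])
for true_label, given_label, rate in top_label_flips(noise_matrix, n=2):
    print(f"true {true_label} -> given {given_label}: {rate:.2f}")
```

Here the largest flip is true class 2 being annotated as class 1, which is the kind of systematic error the noise matrix is meant to surface.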
Code Reference
Source Location
- Repository: cleanlab
- File: cleanlab/count.py
- Lines: 715-796
Signature
def estimate_latent(
    confident_joint,
    labels,
    *,
    py_method="cnt",
    converge_latent_estimates=False,
) -> Tuple[np.ndarray, np.ndarray, np.ndarray]
Import
from cleanlab.count import estimate_latent
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| confident_joint | np.ndarray | Yes | The confident joint matrix of shape (K, K) as computed by compute_confident_joint. Entry (i, j) estimates the count of examples with given label i and true label j. |
| labels | np.ndarray | Yes | Array of noisy class labels of shape (N,) with integer values in range 0..K-1. Used to compute the empirical label distribution. |
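| py_method | str | No | Method for estimating the true label prior. One of "cnt" (default; uses the diagonals of the noise and inverse noise matrices, which stays robust even when off-diagonal noise rates are poorly estimated), "eqn" (solves ps = noise_matrix · py, where ps is the observed label distribution), "marginal" (marginal of the confident joint over true labels), or "marginal_ps" (marginalizes the inverse noise matrix over the observed label distribution ps). |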
| converge_latent_estimates | bool | No | If True, iteratively adjusts py, noise_matrix, and inv_noise_matrix using the closed-form relationships between them so that the three estimates are mathematically consistent. Defaults to False. |
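The four py_method options can each be read as a different route to the same quantity P(true_label). The sketch below realizes each route as a Bayes-consistent estimator in NumPy. The helper `estimate_prior` is hypothetical, and its formulas are assumptions about the spirit of each option rather than a copy of cleanlab's internals (which may also clip values).

```python
import numpy as np


def estimate_prior(cj, ps, method="cnt"):
    """Hypothetical illustration of four true-label prior estimators.

    cj[i, j]: count of examples with given label i and true label j.
    ps[i]:    observed fraction of examples with given label i.
    """
    cj = np.asarray(cj, dtype=float)
    ps = np.asarray(ps, dtype=float)
    noise = cj / cj.sum(axis=0)   # P(given=i | true=j), columns sum to 1
    inv = cj.T / cj.sum(axis=1)   # P(true=j | given=i), columns sum to 1
    if method == "cnt":
        # Bayes' rule on the diagonals: P(y=k) = P(y=k|s=k) P(s=k) / P(s=k|y=k)
        py = np.diag(inv) / np.diag(noise) * ps
    elif method == "eqn":
        # Solve ps = noise @ py for py (can go negative with noisy estimates)
        py = np.linalg.solve(noise, ps)
    elif method == "marginal":
        # Column totals of the confident joint
        py = cj.sum(axis=0)
    elif method == "marginal_ps":
        # Marginalize P(true | given) over the observed label distribution
        py = inv @ ps
    else:
        raise ValueError(f"unknown method: {method}")
    return py / py.sum()  # normalize to a probability vector


cj = [[8, 2], [1, 9]]
ps = [0.5, 0.5]
for m in ("cnt", "eqn", "marginal", "marginal_ps"):
    print(m, estimate_prior(cj, ps, method=m))
```

On this toy confident joint, whose row totals are proportional to ps, all four routes agree. On real, imperfect estimates they generally differ, which is why the choice of method matters.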
Outputs
| Name | Type | Description |
|---|---|---|
| py | np.ndarray | Array of shape (K,) representing the estimated latent prior distribution of true labels. Sums to 1. |
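| noise_matrix | np.ndarray | Matrix of shape (K, K) where entry (i, j) is P(given_label=i \| true_label=j). Each column sums to 1. |
| inv_noise_matrix | np.ndarray | Matrix of shape (K, K) where entry (j, i) is P(true_label=j \| given_label=i). Each column sums to 1. |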
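The three outputs are linked by Bayes' rule, which gives a quick way to sanity-check them. The helper `check_latent_estimates` below is hypothetical. It assumes the index conventions noise_matrix[i, j] = P(given=i | true=j) and inv_noise_matrix[j, i] = P(true=j | given=i).

```python
import numpy as np


def check_latent_estimates(py, noise_matrix, inv_noise_matrix, atol=1e-6):
    """Hypothetical sanity check on (py, noise_matrix, inv_noise_matrix)."""
    py = np.asarray(py, dtype=float)
    nm = np.asarray(noise_matrix, dtype=float)
    inv = np.asarray(inv_noise_matrix, dtype=float)
    # Implied distribution of given labels: P(given=i) = sum_j P(given=i | true=j) P(true=j)
    ps_hat = nm @ py
    return {
        "py_sums_to_1": bool(np.isclose(py.sum(), 1.0, atol=atol)),
        "noise_columns_sum_to_1": bool(np.allclose(nm.sum(axis=0), 1.0, atol=atol)),
        "inv_columns_sum_to_1": bool(np.allclose(inv.sum(axis=0), 1.0, atol=atol)),
        # Bayes' rule: P(true=j | given=i) = P(given=i | true=j) P(true=j) / P(given=i)
        "bayes_consistent": bool(np.allclose(inv, (nm * py).T / ps_hat, atol=atol)),
    }


py = np.array([0.45, 0.55])
noise_matrix = np.array([[8 / 9, 2 / 11], [1 / 9, 9 / 11]])
inv_noise_matrix = np.array([[0.8, 0.1], [0.2, 0.9]])
print(check_latent_estimates(py, noise_matrix, inv_noise_matrix))
```

Passing a transposed inverse noise matrix fails the column-sum check, which helps catch index-convention mix-ups.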
Usage Examples
Basic Usage
import numpy as np
from cleanlab.count import compute_confident_joint, estimate_latent
labels = np.array([0, 0, 1, 1, 2, 2, 0, 1, 2, 1])
pred_probs = np.array([
[0.9, 0.05, 0.05],
[0.3, 0.6, 0.1],
[0.1, 0.8, 0.1],
[0.05, 0.15, 0.8],
[0.1, 0.1, 0.8],
[0.05, 0.05, 0.9],
[0.85, 0.1, 0.05],
[0.1, 0.7, 0.2],
[0.0, 0.2, 0.8],
[0.15, 0.75, 0.1],
])
# Step 1: Compute the confident joint
cj = compute_confident_joint(labels, pred_probs)
# Step 2: Estimate latent noise matrices
py, noise_matrix, inv_noise_matrix = estimate_latent(cj, labels)
print("True label prior:", py)
print("Noise matrix (P(given|true)):\n", noise_matrix)
print("Inverse noise matrix (P(true|given)):\n", inv_noise_matrix)
With Convergence
# Reuses labels and pred_probs from the basic example above.
from cleanlab.count import compute_confident_joint, estimate_latent
cj = compute_confident_joint(labels, pred_probs)
py, noise_matrix, inv_noise_matrix = estimate_latent(
cj, labels,
py_method="marginal",
converge_latent_estimates=True,
)
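As a rough illustration of what "making the estimates consistent" means, the sketch below cycles through the closed-form relations between the prior, the noise matrix, and the inverse noise matrix. The function `reconcile_estimates` and its `n_iter` parameter are hypothetical. This is not cleanlab's convergence routine, and it omits any clipping the library may perform.

```python
import numpy as np


def reconcile_estimates(ps, py, noise_matrix, n_iter=3):
    """Hypothetical sketch: make py, noise, and inverse noise mutually consistent.

    ps[i]              observed P(given=i)
    noise_matrix[i, j] P(given=i | true=j)
    Returns (py, noise_matrix, inv) with inv[j, i] = P(true=j | given=i).
    Assumes every py entry stays positive (no zero-division guard).
    """
    ps = np.asarray(ps, dtype=float)
    py = np.asarray(py, dtype=float)
    nm = np.asarray(noise_matrix, dtype=float)
    for _ in range(n_iter):
        joint = nm * py                    # P(given=i, true=j)
        inv = joint.T / joint.sum(axis=1)  # Bayes: P(true=j | given=i)
        py = inv @ ps                      # re-marginalize over observed labels
        nm = (inv * ps).T / py             # back to P(given=i | true=j)
    return py, nm, inv


ps = np.array([0.5, 0.5])
# Deliberately inconsistent starting estimates.
py, nm, inv = reconcile_estimates(ps, [0.4, 0.6], [[0.9, 0.2], [0.1, 0.8]])
print(py)
print(nm @ py)  # should match ps after reconciliation
```

After one pass, the noise matrix and prior reproduce the observed label distribution (nm @ py equals ps), and all three estimates satisfy Bayes' rule with one another.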