Implementation:Cleanlab Cleanlab Estimate Latent

Knowledge Sources	Cleanlab Cleanlab Docs
Domains	Machine_Learning, Data_Quality
Last Updated	2026-02-09 19:00 GMT

Overview

Concrete tool in the Cleanlab library for deriving the latent noise transition matrices and the true label prior from a confident joint.

Description

This function takes a confident joint matrix and an array of noisy labels and returns three estimates: the latent prior of the true labels py, the noise matrix P(given_label | true_label), and the inverse noise matrix P(true_label | given_label). The noise matrix is obtained by normalizing each column (true class) of the confident joint to sum to 1. The inverse noise matrix is obtained by normalizing each row (given class) to sum to 1 and then transposing, so both returned matrices have columns that sum to 1. The prior py is computed with one of four methods selected by py_method, and the noise rates are clipped to valid probabilities. If converge_latent_estimates is True, the function also iterates over py, the noise matrix, and the inverse noise matrix, recomputing each from the others through their closed-form relations so that the three estimates become mathematically consistent. The confident joint itself is not re-estimated.
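The normalization step can be illustrated in plain NumPy. This is a minimal sketch, not cleanlab's code: the helper name normalize_confident_joint is hypothetical, the confident joint values are made up, and the clipping of noise rates that cleanlab applies afterwards is omitted.

```python
import numpy as np


def normalize_confident_joint(cj):
    """Hypothetical helper: turn a (K, K) confident joint into noise matrices.

    cj[i, j] counts examples with given label i and estimated true label j.
    """
    cj = np.asarray(cj, dtype=float)
    tiny = 1e-16  # guard against empty rows or columns
    true_counts = cj.sum(axis=0)   # examples counted into each true class
    given_counts = cj.sum(axis=1)  # examples counted from each given class
    # noise_matrix[i, j] = P(given_label=i | true_label=j): normalize columns
    noise_matrix = cj / np.clip(true_counts, tiny, None)
    # inv_noise_matrix[j, i] = P(true_label=j | given_label=i): normalize rows, then transpose
    inv_noise_matrix = cj.T / np.clip(given_counts, tiny, None)
    return noise_matrix, inv_noise_matrix


# Made-up confident joint for 3 classes (rows: given label, columns: true label)
cj = np.array([[3, 1, 0],
               [0, 3, 1],
               [0, 0, 2]])
noise_matrix, inv_noise_matrix = normalize_confident_joint(cj)
print(noise_matrix.sum(axis=0))      # each column sums to 1
print(inv_noise_matrix.sum(axis=0))  # each column sums to 1
```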

Usage

Import and use this function after computing the confident joint (via compute_confident_joint) when you need the full noise transition structure of your dataset. The noise matrix reveals systematic annotation errors (how often examples of each true class receive each given label), the inverse noise matrix shows, for each given label, what fraction of those examples likely belong to each true class, and the true prior describes the class balance after accounting for label noise.
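As an illustration of reading the noise matrix for systematic annotation errors, the sketch below lists its largest off-diagonal entries, i.e. the most frequent true-to-given label flips. The matrix values are made up (not cleanlab output), and top_label_flips is a hypothetical helper.

```python
import numpy as np

# Made-up 3-class noise matrix for illustration only (not cleanlab output).
# Entry [i, j] = P(given_label=i | true_label=j); each column sums to 1.
noise_matrix = np.array([
    [0.90, 0.10, 0.00],
    [0.05, 0.75, 0.30],
    [0.05, 0.15, 0.70],
])


def top_label_flips(noise_matrix, n=2):
    """Hypothetical helper: the n largest off-diagonal noise rates.

    Returns (true_label, given_label, rate) tuples, largest rate first.
    """
    off_diag = noise_matrix.copy()
    np.fill_diagonal(off_diag, 0.0)  # diagonal entries are not noise rates
    flat = np.argsort(off_diag, axis=None)[::-1][:n]
    rows, cols = np.unravel_index(flat, off_diag.shape)
    # row index = given label, column index = true label
    return [(int(j), int(i), float(off_diag[i, j])) for i, j in zip(rows, cols)]


for true_label, given_label, rate in top_label_flips(noise_matrix):
    print(f"true {true_label} labeled as {given_label}: {rate:.2f}")
```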

Code Reference

Source Location

Repository: cleanlab
File: cleanlab/count.py
Lines: 715-796

Signature

def estimate_latent(
    confident_joint,
    labels,
    *,
    py_method="cnt",
    converge_latent_estimates=False,
) -> Tuple[np.ndarray, np.ndarray, np.ndarray]

Import

from cleanlab.count import estimate_latent

I/O Contract

Inputs

Name	Type	Required	Description
confident_joint	np.ndarray	Yes	The confident joint matrix of shape (K, K) as computed by compute_confident_joint. Entry (i, j) estimates the count of examples with given label i and true label j.
labels	np.ndarray	Yes	Array of noisy class labels of shape (N,) with integer values in range 0..K-1. Used to compute the empirical label distribution.
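py_method	str	No	Method for estimating the true label prior. One of "cnt" (default; scales the observed label distribution ps by the ratio of the inverse noise matrix and noise matrix diagonals, which tends to work well even when off-diagonal noise rates are poorly estimated), "eqn" (solves ps = noise_matrix · py by inverting the noise matrix), "marginal" (normalized column sums of the confident joint), or "marginal_ps" (the inverse noise matrix applied to ps). Here ps is the empirical distribution of the given labels.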
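converge_latent_estimates	bool	No	If True, iteratively recomputes py, the noise matrix, and the inverse noise matrix from one another through their closed-form relations so that the three estimates are mathematically consistent. The confident joint is not re-estimated. Defaults to False.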
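The four py_method options can be sketched in NumPy as below. The formulas reflect our reading of cleanlab's internals and should be treated as an approximation: the function name estimate_py_sketch is hypothetical, and the final clip-and-renormalize step (lower bound 1e-5) is a simplification.

```python
import numpy as np


def estimate_py_sketch(cj, labels, py_method="cnt"):
    """Hypothetical sketch of the four py_method options (not cleanlab's code)."""
    cj = np.asarray(cj, dtype=float)
    K = cj.shape[0]
    # ps[k]: observed (noisy) label distribution p(given_label=k)
    ps = np.bincount(labels, minlength=K) / len(labels)
    true_counts = cj.sum(axis=0)
    given_counts = cj.sum(axis=1)
    noise_matrix = cj / np.clip(true_counts, 1e-16, None)
    inv_noise_matrix = cj.T / np.clip(given_counts, 1e-16, None)

    if py_method == "cnt":
        # ratio of the two diagonals, scaled by ps
        py = np.diag(inv_noise_matrix) / np.clip(np.diag(noise_matrix), 1e-16, None) * ps
    elif py_method == "eqn":
        # solve ps = noise_matrix @ py for py
        py = np.linalg.solve(noise_matrix, ps)
    elif py_method == "marginal":
        # normalized column sums of the confident joint
        py = true_counts / true_counts.sum()
    elif py_method == "marginal_ps":
        # marginalize the inverse noise matrix over ps
        py = inv_noise_matrix @ ps
    else:
        raise ValueError(f"unknown py_method: {py_method}")

    # keep every class strictly positive, then renormalize to sum to 1
    py = np.clip(py, 1e-5, 1.0)
    return py / py.sum()


cj = np.array([[3, 1, 0], [0, 3, 1], [0, 0, 2]])
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2])
for method in ("cnt", "eqn", "marginal", "marginal_ps"):
    print(method, estimate_py_sketch(cj, labels, method))
```

In this toy case the row sums of the confident joint equal the label counts, so all four methods return the same prior, [0.3, 0.4, 0.3]. They can diverge once the row sums and the empirical label counts disagree, or once clipping changes the values.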

Outputs

Name	Type	Description
py	np.ndarray	Array of shape (K,) representing the estimated latent prior distribution of true labels. Sums to 1.
noise_matrix	np.ndarray	Array of shape (K, K); entry (i, j) is P(given_label=i | true_label=j). Each column sums to 1.
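inv_noise_matrix	np.ndarray	Array of shape (K, K); entry (j, i) is P(true_label=j | given_label=i). Each column sums to 1 (the rows of the confident joint are normalized, then transposed).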
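A quick way to guard against orientation mix-ups is to check the output contract directly. In cleanlab both matrices are column-normalized: noise_matrix columns are indexed by true label, and inv_noise_matrix columns by given label. The sketch below uses a hypothetical helper, check_latent_outputs, on toy values (not cleanlab output).

```python
import numpy as np


def check_latent_outputs(py, noise_matrix, inv_noise_matrix, atol=1e-6):
    """Hypothetical helper: raise ValueError if the outputs break the contract.

    Checks shapes, that py sums to 1, and that both matrices are
    column-normalized with entries in [0, 1].
    """
    py = np.asarray(py, dtype=float)
    K = py.shape[0]
    for name, m in (("noise_matrix", noise_matrix), ("inv_noise_matrix", inv_noise_matrix)):
        m = np.asarray(m, dtype=float)
        if m.shape != (K, K):
            raise ValueError(f"{name} has shape {m.shape}, expected {(K, K)}")
        if not np.allclose(m.sum(axis=0), 1.0, atol=atol):
            raise ValueError(f"{name} columns do not sum to 1")
        if (m < 0).any() or (m > 1).any():
            raise ValueError(f"{name} has entries outside [0, 1]")
    if not np.isclose(py.sum(), 1.0, atol=atol):
        raise ValueError("py does not sum to 1")


# Toy values (not cleanlab output) that satisfy the contract.
py = np.array([0.3, 0.4, 0.3])
noise_matrix = np.array([[1.0, 0.25, 0.0], [0.0, 0.75, 1 / 3], [0.0, 0.0, 2 / 3]])
inv_noise_matrix = np.array([[0.75, 0.0, 0.0], [0.25, 0.75, 0.0], [0.0, 0.25, 1.0]])
check_latent_outputs(py, noise_matrix, inv_noise_matrix)  # no exception raised
```

Passing inv_noise_matrix.T (whose rows, not columns, sum to 1) would raise a ValueError, which is the mistake this check is meant to catch.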

Usage Examples

Basic Usage

import numpy as np
from cleanlab.count import compute_confident_joint, estimate_latent

labels = np.array([0, 0, 1, 1, 2, 2, 0, 1, 2, 1])
pred_probs = np.array([
    [0.9, 0.05, 0.05],
    [0.3, 0.6, 0.1],
    [0.1, 0.8, 0.1],
    [0.05, 0.15, 0.8],
    [0.1, 0.1, 0.8],
    [0.05, 0.05, 0.9],
    [0.85, 0.1, 0.05],
    [0.1, 0.7, 0.2],
    [0.0, 0.2, 0.8],
    [0.15, 0.75, 0.1],
])

# Step 1: Compute the confident joint
cj = compute_confident_joint(labels, pred_probs)

# Step 2: Estimate latent noise matrices
py, noise_matrix, inv_noise_matrix = estimate_latent(cj, labels)

print("True label prior:", py)
print("Noise matrix (P(given|true)):\n", noise_matrix)
print("Inverse noise matrix (P(true|given)):\n", inv_noise_matrix)

With Convergence

# Continues the basic example: assumes labels and pred_probs are already defined.
from cleanlab.count import compute_confident_joint, estimate_latent

cj = compute_confident_joint(labels, pred_probs)

py, noise_matrix, inv_noise_matrix = estimate_latent(
    cj, labels,
    py_method="marginal",
    converge_latent_estimates=True,
)
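The three outputs are related through Bayes' rule: P(true_label=j | given_label=i) = P(given_label=i | true_label=j) · py[j] / ps[i], with ps = noise_matrix · py. The convergence option pushes the estimates toward agreeing through relations of this kind. The sketch below measures how far a triple of estimates is from satisfying that relation. It is an illustrative diagnostic with a hypothetical name (bayes_gap) and toy inputs, and it does not reproduce cleanlab's internal iteration.

```python
import numpy as np


def bayes_gap(py, noise_matrix, inv_noise_matrix):
    """Hypothetical helper: max |inv - Bayes(noise, py)| over all entries.

    Bayes' rule: P(true=j | given=i) = P(given=i | true=j) * py[j] / ps[i],
    where ps[i] = sum_j P(given=i | true=j) * py[j].
    """
    joint = noise_matrix * py             # joint[i, j] = P(given=i, true=j)
    ps = joint.sum(axis=1)                # implied noisy-label distribution
    implied_inv = joint.T / np.clip(ps, 1e-16, None)  # implied[j, i] = P(true=j | given=i)
    return float(np.abs(inv_noise_matrix - implied_inv).max())


# Toy matrices (not cleanlab output); columns of each sum to 1.
noise_matrix = np.array([[1.0, 0.25, 0.0],
                         [0.0, 0.75, 1 / 3],
                         [0.0, 0.0, 2 / 3]])
inv_noise_matrix = np.array([[0.75, 0.0, 0.0],
                             [0.25, 0.75, 0.0],
                             [0.0, 0.25, 1.0]])
consistent_gap = bayes_gap(np.array([0.3, 0.4, 0.3]), noise_matrix, inv_noise_matrix)
inconsistent_gap = bayes_gap(np.array([0.5, 0.3, 0.2]), noise_matrix, inv_noise_matrix)
print(consistent_gap, inconsistent_gap)
```

With the prior [0.3, 0.4, 0.3] the three arrays agree up to floating-point error. With a different prior the same matrices no longer satisfy Bayes' rule, and the gap becomes clearly nonzero.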

Related Pages

Implements Principle