Principle:Cleanlab Cleanlab Clean Model Inference

Metadata
Sources	Confident Learning, Cleanlab
Domains	Machine_Learning, Data_Quality
Last Updated	2026-02-09 12:00 GMT

Overview

Clean model inference is prediction with a model that was trained on cleaned data. Its predictions tend to be more reliable because detected mislabeled examples were excluded from training.

Description

Clean model inference uses the classifier that was trained via the noise-robust training pipeline. Since the model was fit on data with detected label issues removed, its predictions reflect patterns learned from cleaner data. The predict and predict_proba methods delegate directly to the underlying sklearn classifier but benefit from the improved model quality.

The inference methods provided by CleanLearning include:

predict(X): Returns predicted class labels for the input feature matrix. Delegates directly to the wrapped classifier's predict method.
predict_proba(X): Returns predicted class probabilities for each example. Delegates directly to the wrapped classifier's predict_proba method. Requires the underlying classifier to support probability estimation.
score(X, y, sample_weight=None): Returns the wrapped classifier's score on the given test data and labels, which is the mean accuracy for standard sklearn classifiers. Delegates to the wrapped classifier's score method.

The improvement in prediction quality comes entirely from the training phase, not from the inference step itself. No post-processing or correction is applied during inference: the predictions are exactly what the classifier trained on cleaned data produces.
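The delegation described above can be illustrated with a minimal, standard-library-only sketch. It is not cleanlab's implementation: MajorityClassifier, DelegatingWrapper, and the issue_mask argument are hypothetical names introduced here for illustration. The point is only that filtering happens at fit time, while predict and predict_proba forward to the wrapped classifier unchanged.

```python
from collections import Counter


class MajorityClassifier:
    """Toy stand-in for a classifier: always predicts the most common training label."""

    def fit(self, X, y):
        counts = Counter(y)
        self.classes_ = sorted(counts)
        self.priors_ = [counts[c] / len(y) for c in self.classes_]
        self.majority_ = counts.most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.majority_ for _ in X]

    def predict_proba(self, X):
        return [list(self.priors_) for _ in X]


class DelegatingWrapper:
    """Hypothetical wrapper (not cleanlab code): filters at fit time, delegates at inference."""

    def __init__(self, clf):
        self.clf = clf

    def fit(self, X, labels, issue_mask):
        # Training phase: drop examples flagged as label issues, then fit the wrapped model.
        kept = [(x, y) for x, y, bad in zip(X, labels, issue_mask) if not bad]
        self.clf.fit([x for x, _ in kept], [y for _, y in kept])
        return self

    def predict(self, X):
        # Inference phase: pure delegation, no correction or post-processing.
        return self.clf.predict(X)

    def predict_proba(self, X):
        return self.clf.predict_proba(X)

    def score(self, X, y):
        preds = self.predict(X)
        return sum(p == t for p, t in zip(preds, y)) / len(y)


X_train = [[0.0], [0.1], [0.2], [0.3], [0.4]]
y_train = [1, 1, 0, 0, 0]
issue_mask = [False, False, True, True, False]  # hypothetical flags for suspected label errors

wrapper = DelegatingWrapper(MajorityClassifier()).fit(X_train, y_train, issue_mask)
X_test = [[0.5], [0.6]]

# The wrapper's output is exactly the wrapped classifier's output.
same_labels = wrapper.predict(X_test) == wrapper.clf.predict(X_test)
same_probs = wrapper.predict_proba(X_test) == wrapper.clf.predict_proba(X_test)
```

In this toy setup, removing the two flagged examples changes which label is the majority, so the wrapped model's parameters differ from what unfiltered training would give, while the inference code path is identical.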

Usage

Call these methods after CleanLearning.fit() to make predictions with a model trained on cleaned data. The interface matches that of a standard sklearn classifier.

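from cleanlab.classification import CleanLearning
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data for illustration; replace with your own features and labels
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Wrap the classifier; fit() detects label issues and trains on the cleaned data
cl = CleanLearning(clf=LogisticRegression())
cl.fit(X_train, labels=y_train)

# Predict class labels
predictions = cl.predict(X_test)

# Predict class probabilities (one column per class)
probabilities = cl.predict_proba(X_test)

# Evaluate accuracy
accuracy = cl.score(X_test, y_test)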

Theoretical Basis

Delegated inference: The CleanLearning wrapper delegates predict() and predict_proba() calls directly to the wrapped classifier. The improvement comes not from the inference step itself but from the training pipeline that produced the model.

Formally, if the original classifier trained on noisy data produces function f_noisy and the clean-trained classifier produces f_clean, then:

$R(f_{\text{clean}}) \leq R(f_{\text{noisy}})$

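in expectation, where R denotes the expected prediction error (risk) measured against the true labels. The inequality is expected to hold because f_clean was trained on a dataset closer to the true label distribution, provided the removed examples are predominantly genuine label errors. At inference time, both models use the same prediction mechanism; the difference is entirely in the learned parameters.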
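One way to make the inequality precise is sketched below. The 0-1 loss, the true label symbol $y^{*}$, and the clean data distribution $\mathcal{D}$ are assumptions introduced here; the source does not specify a loss. The outer expectation is over the randomness of the training sample and its label noise, and the inequality describes the expected outcome when pruning works as intended, not a guarantee for every dataset.

```latex
R(f) = \mathbb{E}_{(x,\, y^{*}) \sim \mathcal{D}}
       \left[ \mathbf{1}\{ f(x) \neq y^{*} \} \right],
\qquad
\mathbb{E}\left[ R(f_{\text{clean}}) \right]
  \leq
\mathbb{E}\left[ R(f_{\text{noisy}}) \right]
```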

This delegation pattern keeps a fitted CleanLearning instance compatible with sklearn tooling that relies on the standard estimator methods. Code that calls .predict() or .predict_proba() on an sklearn estimator works the same way with a fitted CleanLearning instance.
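From the caller's side, compatibility amounts to duck typing: tooling that only calls predict() cannot tell a wrapper from the classifier it wraps. The following standard-library sketch uses hypothetical names (ThresholdClassifier, Wrapper, accuracy) and is not sklearn or cleanlab code.

```python
class ThresholdClassifier:
    """Toy fitted classifier: predicts 1 when the single feature exceeds a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, X):
        return [int(row[0] > self.threshold) for row in X]


class Wrapper:
    """Hypothetical wrapper that forwards predict() unchanged."""

    def __init__(self, clf):
        self.clf = clf

    def predict(self, X):
        return self.clf.predict(X)


def accuracy(estimator, X, y):
    """Generic tooling: relies only on the estimator exposing predict()."""
    preds = estimator.predict(X)
    return sum(p == t for p, t in zip(preds, y)) / len(y)


X_test = [[0.1], [0.4], [0.6], [0.9]]
y_test = [0, 1, 1, 1]

bare = ThresholdClassifier(threshold=0.5)
wrapped = Wrapper(bare)

# The same helper accepts either object and returns the same result.
bare_acc = accuracy(bare, X_test, y_test)
wrapped_acc = accuracy(wrapped, X_test, y_test)
```

Because the helper never inspects the estimator's type, swapping the bare classifier for the wrapper requires no change to the calling code.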
