Principle:Cleanlab Cleanlab Clean Model Inference
| Metadata | |
|---|---|
| Sources | Confident Learning, Cleanlab |
| Domains | Machine_Learning, Data_Quality |
| Last Updated | 2026-02-09 12:00 GMT |
Overview
Inference with a model trained on cleaned data. Its predictions tend to be more reliable because the training process excluded examples that were detected as mislabeled.
Description
Clean model inference uses the classifier produced by `CleanLearning.fit()`, which detects label issues and then fits the wrapped classifier on the remaining examples. Because the model was fit on data with detected label issues removed, its predictions reflect patterns learned from cleaner data. The `predict` and `predict_proba` methods delegate directly to the wrapped classifier (any sklearn-compatible estimator) and add no logic of their own; any benefit comes from the improved model.
The inference methods provided by CleanLearning include:
- `predict(X)`: Returns predicted class labels for the input feature matrix. Delegates directly to the wrapped classifier's `predict` method.
- `predict_proba(X)`: Returns predicted class probabilities for each example. Delegates directly to the wrapped classifier's `predict_proba` method, so the wrapped classifier must support probability estimation.
- `score(X, y, sample_weight=None)`: Returns the wrapped classifier's score on the given test data and labels (mean accuracy for standard sklearn classifiers). Delegates to the wrapped classifier's `score` method.
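To make the delegation pattern concrete, here is a simplified sketch of a wrapper that forwards inference calls to an inner classifier. The class `DelegatingClassifier` is hypothetical and is not cleanlab's source code: it omits the label-issue detection that `CleanLearning.fit()` performs, and its accuracy fallback in `score` is an assumption of this sketch.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score


class DelegatingClassifier(BaseEstimator):
    """Hypothetical minimal wrapper illustrating delegated inference.

    Not cleanlab's implementation: the label-cleaning step is omitted.
    """

    def __init__(self, clf=None):
        self.clf = clf

    def fit(self, X, labels):
        # A real CleanLearning.fit would first drop detected label issues;
        # this sketch trains on everything to keep the focus on inference.
        self.clf.fit(X, labels)
        return self

    def predict(self, X):
        # Pure delegation: the wrapped model's output is returned unchanged.
        return self.clf.predict(X)

    def predict_proba(self, X):
        # Only works if the wrapped classifier implements predict_proba.
        return self.clf.predict_proba(X)

    def score(self, X, y, sample_weight=None):
        # Prefer the wrapped model's own score (assumed to accept
        # sample_weight, as sklearn classifiers do); otherwise fall back
        # to plain accuracy computed from predict().
        if hasattr(self.clf, "score"):
            return self.clf.score(X, y, sample_weight=sample_weight)
        return accuracy_score(y, self.clf.predict(X), sample_weight=sample_weight)


# Usage: outputs match the inner model exactly because nothing is post-processed.
X_demo = np.array([[0.0], [1.0], [2.0], [3.0]])
y_demo = np.array([0, 0, 1, 1])
wrapper = DelegatingClassifier(clf=LogisticRegression()).fit(X_demo, y_demo)
print(wrapper.predict(X_demo), wrapper.score(X_demo, y_demo))
```

Keeping the inner estimator as a constructor parameter (`clf`) lets `BaseEstimator.get_params()` expose it, which is what sklearn utilities rely on when cloning an estimator.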
The improvement in prediction quality comes entirely from the training phase, not from the inference step itself. No post-processing or correction is applied during inference; the predictions are exactly what the cleaned classifier produces.
Usage
Use after `CleanLearning.fit()` to make predictions with a model trained on cleaned data. The interface matches that of a standard sklearn classifier.
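```python
from cleanlab.classification import CleanLearning
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy data for illustration; replace with your own features and (noisy) labels
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

cl = CleanLearning(clf=LogisticRegression())
cl.fit(X_train, labels=y_train)  # finds label issues, then trains clf on the rest

# Predict class labels
predictions = cl.predict(X_test)
# Predict class probabilities
probabilities = cl.predict_proba(X_test)
# Evaluate accuracy
accuracy = cl.score(X_test, y_test)
```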
Theoretical Basis
Delegated inference: The CleanLearning wrapper delegates predict() and predict_proba() calls directly to the wrapped classifier. The improvement comes not from the inference step itself but from the training pipeline that produced the model.
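Formally, if the original classifier trained on noisy data produces function f_noisy and the clean-trained classifier produces f_clean, then, measuring accuracy against the true (unobserved) labels:

$$\mathrm{Acc}(f_{\text{clean}}) \geq \mathrm{Acc}(f_{\text{noisy}})$$

in expectation, because f_clean was trained on a dataset closer to the true label distribution. This assumes the removed examples are predominantly genuine label errors. At inference time, both models use the same prediction mechanism; the difference is entirely in the learned parameters.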
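The comparison between f_noisy and f_clean can be reproduced on synthetic data with plain scikit-learn. This is a sketch under a strong assumption: instead of running cleanlab's detection, it removes the exact indices whose labels were flipped (an oracle), so it shows the best case of the cleaning step rather than cleanlab's behavior. Both models are the same `LogisticRegression` class with the same `predict` method; only their fitted parameters differ. The printed accuracies let you inspect the inequality on this toy setup, which is an expectation and not a guarantee for every dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# flip_y=0.0 keeps the generated labels noise-free so they serve as ground truth.
X, y_true = make_classification(n_samples=600, n_features=10, flip_y=0.0, random_state=0)
X_train, X_test, y_train_true, y_test_true = train_test_split(
    X, y_true, test_size=0.25, random_state=0
)

# Simulate label noise: flip 25% of the binary training labels.
flipped = rng.choice(len(y_train_true), size=len(y_train_true) // 4, replace=False)
y_train_noisy = y_train_true.copy()
y_train_noisy[flipped] = 1 - y_train_noisy[flipped]

# f_noisy: fit on every noisy training label.
f_noisy = LogisticRegression().fit(X_train, y_train_noisy)

# f_clean: same model class, fit after dropping the flagged examples.
# Assumption: the known flipped indices act as an oracle stand-in for
# cleanlab's issue detection, which would only approximate this set.
keep = np.ones(len(y_train_noisy), dtype=bool)
keep[flipped] = False
f_clean = LogisticRegression().fit(X_train[keep], y_train_noisy[keep])

# Both models predict via LogisticRegression.predict; only the learned
# parameters (coef_, intercept_) differ. Evaluate against the true labels.
acc_noisy = f_noisy.score(X_test, y_test_true)
acc_clean = f_clean.score(X_test, y_test_true)
print(f"accuracy vs. true labels: noisy={acc_noisy:.3f}, clean={acc_clean:.3f}")
```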
This delegation pattern preserves compatibility with all sklearn tooling. Any code that calls .predict() or .predict_proba() on an sklearn estimator works identically with a fitted CleanLearning instance.
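As an illustration of that compatibility, the helper below depends only on `predict` and `predict_proba`, so it accepts a plain sklearn classifier or a fitted `CleanLearning` instance without modification. The function name `evaluate` is hypothetical; the metrics are scikit-learn's `accuracy_score` and `log_loss`. The sketch assumes `y` contains every class the model was trained on, so that `log_loss` can infer the label set.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, log_loss


def evaluate(model, X, y):
    """Hypothetical helper relying only on predict() and predict_proba().

    Any fitted estimator exposing these two methods can be passed in.
    """
    return {
        "accuracy": accuracy_score(y, model.predict(X)),
        "log_loss": log_loss(y, model.predict_proba(X)),
    }


# Usage with a plain sklearn classifier; a fitted CleanLearning works the same way.
X_demo = np.array([[0.0], [1.0], [2.0], [3.0]])
y_demo = np.array([0, 0, 1, 1])
print(evaluate(LogisticRegression().fit(X_demo, y_demo), X_demo, y_demo))
```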