Implementation:Cleanlab Cleanlab CleanLearning Predict

Field	Value
Sources	Confident Learning, Cleanlab
Domains	Machine_Learning, Data_Quality
Last Updated	2026-02-09 12:00 GMT
Type	Wrapper Doc (delegates to wrapped classifier)

Overview

CleanLearning.predict and CleanLearning.predict_proba provide inference using a model trained on cleaned data, delegating directly to the wrapped sklearn classifier.

Description

The predict and predict_proba methods are thin wrappers that pass all arguments directly to the underlying classifier's corresponding methods. They perform no post-processing, noise correction, or label transformation: the predictions are exactly what the classifier trained on the cleaned data produces.

Any improvement in prediction quality over a standard classifier comes entirely from the training phase (fit()), which removes examples with detected label issues before fitting the final model. At inference time, these methods behave identically to calling predict/predict_proba on the wrapped classifier directly.

Both methods accept *args and **kwargs, which are passed through to the wrapped classifier unchanged. This keeps them compatible with the prediction interface of any sklearn-style classifier, including classifiers that accept additional parameters at prediction time.
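The pass-through pattern can be illustrated without any library. The sketch below is not cleanlab's source; ToyClassifier, ThinWrapper, and the threshold parameter are hypothetical names invented for this example, showing only how extra prediction-time arguments reach the wrapped object untouched.

```python
class ToyClassifier:
    """Hypothetical stand-in for a wrapped classifier (not from sklearn or cleanlab)."""

    def predict(self, X, threshold=0.5):
        # Label 1 when the single feature exceeds the threshold, else 0.
        return [1 if x > threshold else 0 for x in X]


class ThinWrapper:
    """Hypothetical wrapper illustrating pass-through delegation."""

    def __init__(self, clf):
        self.clf = clf

    def predict(self, *args, **kwargs):
        # Forward every argument unchanged and return the result as-is.
        return self.clf.predict(*args, **kwargs)


wrapper = ThinWrapper(ToyClassifier())
X = [0.2, 0.6, 0.9]

default_preds = wrapper.predict(X)                # uses the classifier's default threshold
strict_preds = wrapper.predict(X, threshold=0.8)  # extra keyword reaches the classifier
```

Because the wrapper never inspects its arguments, it needs no changes when the wrapped classifier's predict signature differs from the usual single-argument form.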

Both methods require that fit() has been called first. Calling predict or predict_proba on an unfitted CleanLearning instance raises whatever error the underlying classifier raises when it is unfitted.

Usage

Call predict or predict_proba on a fitted CleanLearning instance, just as with any sklearn classifier.

from cleanlab.classification import CleanLearning
from sklearn.linear_model import LogisticRegression

# X_train, y_train, and X_test are assumed to be existing arrays
cl = CleanLearning(clf=LogisticRegression())
cl.fit(X_train, labels=y_train)

# Class label predictions
y_pred = cl.predict(X_test)

# Class probability predictions
y_proba = cl.predict_proba(X_test)

Code Reference

Source Location

Repository: cleanlab/cleanlab
File: cleanlab/classification.py
Lines: 584--638

Signature

def predict(self, *args, **kwargs) -> np.ndarray

def predict_proba(self, *args, **kwargs) -> np.ndarray

Import

from cleanlab.classification import CleanLearning
# predict and predict_proba are methods of a CleanLearning instance

I/O Contract

Inputs

Name	Type	Required	Description
`X`	array-like (N, M)	Yes	Test feature matrix, forwarded to the wrapped classifier as its first argument. Must have the same format and number of features as the training data used in `fit()`.
`*args`	any	No	Additional positional arguments passed through to the wrapped classifier.
`**kwargs`	any	No	Additional keyword arguments passed through to the wrapped classifier.

Outputs

predict:

Name	Type	Description
return value	np.ndarray (N,)	Predicted class labels for each example in the input.

predict_proba:

Name	Type	Description
return value	np.ndarray (N, K)	Predicted class probabilities for each example. Each row sums to 1. Column order matches the classifier's `classes_` attribute.

Usage Examples

Full Pipeline: Init, Fit, Predict

from cleanlab.classification import CleanLearning
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Initialize and train (X_train, y_train_noisy, X_test, and y_test are assumed to exist)
cl = CleanLearning(clf=GradientBoostingClassifier(), seed=42)
cl.fit(X_train, labels=y_train_noisy)

# Predict class labels
y_pred = cl.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")

# Predict class probabilities
y_proba = cl.predict_proba(X_test)
print(f"Probability shape: {y_proba.shape}")  # (N_test, K)

# Evaluate using score (delegates to wrapped clf)
accuracy = cl.score(X_test, y_test)
print(f"Score: {accuracy:.4f}")

Comparing Clean vs. Noisy Training

from cleanlab.classification import CleanLearning
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Train without cleaning (baseline); the same assumed arrays are used for both models
baseline = LogisticRegression()
baseline.fit(X_train, y_train_noisy)
baseline_acc = accuracy_score(y_test, baseline.predict(X_test))

# Train with CleanLearning (noise-robust)
cl = CleanLearning(clf=LogisticRegression())
cl.fit(X_train, labels=y_train_noisy)
clean_acc = accuracy_score(y_test, cl.predict(X_test))

print(f"Baseline accuracy: {baseline_acc:.4f}")
print(f"CleanLearning accuracy: {clean_acc:.4f}")
