Implementation:Cleanlab Cleanlab CleanLearning Predict
| Field | Value |
|---|---|
| Sources | Confident Learning, Cleanlab |
| Domains | Machine_Learning, Data_Quality |
| Last Updated | 2026-02-09 12:00 GMT |
| Type | Wrapper Doc (delegates to wrapped classifier) |
Overview
CleanLearning.predict and CleanLearning.predict_proba provide inference using a model trained on cleaned data, delegating directly to the wrapped sklearn-compatible classifier.
Description
The predict and predict_proba methods are thin wrappers that delegate all arguments directly to the underlying classifier's corresponding methods. They do not perform any post-processing, noise correction, or label transformation -- the predictions are exactly what the cleaned classifier produces.
The improvement in prediction quality over a standard classifier comes entirely from the training phase (fit()), which removed detected label issues before fitting the model. At inference time, these methods behave identically to calling predict/predict_proba on the wrapped classifier directly.
Both methods accept *args and **kwargs, which are passed through to the wrapped classifier unchanged. This ensures full compatibility with any sklearn classifier's prediction interface, including classifiers that accept additional parameters at prediction time.
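The pass-through behaviour described above can be sketched with a small stand-in. This is an illustration of the delegation pattern only, not cleanlab's source: RecordingClassifier, DelegatingWrapper, and the batch_size keyword are hypothetical names introduced for the example.

```python
class RecordingClassifier:
    """Hypothetical stand-in for a fitted classifier; records what it receives."""

    def __init__(self):
        self.calls = []

    def predict(self, X, **kwargs):
        self.calls.append(("predict", X, kwargs))
        return [0 for _ in X]

    def predict_proba(self, X, **kwargs):
        self.calls.append(("predict_proba", X, kwargs))
        return [[1.0, 0.0] for _ in X]


class DelegatingWrapper:
    """Illustrative wrapper (assumption: mirrors the thin-wrapper idea, not cleanlab's class)."""

    def __init__(self, clf):
        self.clf = clf

    def predict(self, *args, **kwargs):
        # Forward every positional and keyword argument unchanged.
        return self.clf.predict(*args, **kwargs)

    def predict_proba(self, *args, **kwargs):
        return self.clf.predict_proba(*args, **kwargs)


inner = RecordingClassifier()
wrapper = DelegatingWrapper(inner)
X_demo = [[0.1, 0.2], [0.3, 0.4]]

# batch_size is a made-up prediction-time keyword to show that extras reach the inner model.
labels = wrapper.predict(X_demo, batch_size=1)
probs = wrapper.predict_proba(X_demo)
```

Because the wrapper never inspects its arguments, any prediction-time option the inner classifier understands is available to the caller without the wrapper needing to know about it.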
Both methods require that fit() has been called first. Calling predict or predict_proba on an unfitted CleanLearning instance raises whatever error the underlying classifier raises when it has not been fitted (for scikit-learn estimators, typically NotFittedError).
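Since the error originates in the wrapped classifier, it helps to know what that classifier does when unfitted. A minimal sketch, assuming the wrapped model is scikit-learn's LogisticRegression; other classifiers may raise a different exception type.

```python
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression()  # deliberately never fitted
X_new = [[0.0, 1.0], [1.0, 0.0]]

try:
    clf.predict(X_new)
    raised = False
except NotFittedError as exc:
    # A thin wrapper that only forwards the call would let this same error propagate.
    raised = True
    message = str(exc)
```

Callers that want a friendlier message can catch the exception at the call site and remind the user to call fit() first.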
Usage
Call predict or predict_proba on a fitted CleanLearning instance, just as with any sklearn classifier.
from cleanlab.classification import CleanLearning
from sklearn.linear_model import LogisticRegression
cl = CleanLearning(clf=LogisticRegression())
cl.fit(X_train, labels=y_train)
# Class label predictions
y_pred = cl.predict(X_test)
# Class probability predictions
y_proba = cl.predict_proba(X_test)
Code Reference
Source Location
- Repository: cleanlab/cleanlab
- File: cleanlab/classification.py
- Lines: 584--638
Signature
def predict(self, *args, **kwargs) -> np.ndarray
def predict_proba(self, *args, **kwargs) -> np.ndarray
Import
from cleanlab.classification import CleanLearning
# predict and predict_proba are methods of a CleanLearning instance
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| X | array-like (N, M) | Yes | Test feature matrix. Must have the same format and number of features as the training data used in fit(). |
| *args | any | No | Additional positional arguments passed through to the wrapped classifier. |
| **kwargs | any | No | Additional keyword arguments passed through to the wrapped classifier. |
Outputs
predict:
| Name | Type | Description |
|---|---|---|
| return value | np.ndarray (N,) | Predicted class labels for each example in the input. |
predict_proba:
| Name | Type | Description |
|---|---|---|
| return value | np.ndarray (N, K) | Predicted class probabilities for each example. Each row sums to 1. Column order matches the classifier's classes_ attribute. |
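The output contract above can be checked directly on a classifier's predictions. The sketch below uses a plain LogisticRegression on synthetic data (the blob centers and sample counts are arbitrary choices for illustration); since the wrapper delegates, the same checks should apply to the arrays returned through CleanLearning.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Three well-separated Gaussian blobs, 50 points each, labels 0..2.
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
y = np.repeat(np.arange(3), 50)
X = centers[y] + rng.normal(scale=0.5, size=(150, 2))

clf = LogisticRegression(max_iter=1000).fit(X, y)
y_pred = clf.predict(X)          # shape (N,)
y_proba = clf.predict_proba(X)   # shape (N, K)

# Each row of probabilities should sum to 1.
rows_sum_to_one = np.allclose(y_proba.sum(axis=1), 1.0)

# Column j corresponds to clf.classes_[j], so the argmax column maps back to a label.
labels_from_proba = clf.classes_[np.argmax(y_proba, axis=1)]
```

Mapping through classes_ rather than using the argmax index directly matters whenever the class labels are not already 0..K-1 in sorted order.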
Usage Examples
Full Pipeline: Init, Fit, Predict
from cleanlab.classification import CleanLearning
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
# Initialize and train
cl = CleanLearning(clf=GradientBoostingClassifier(), seed=42)
cl.fit(X_train, labels=y_train_noisy)
# Predict class labels
y_pred = cl.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")
# Predict class probabilities
y_proba = cl.predict_proba(X_test)
print(f"Probability shape: {y_proba.shape}") # (N_test, K)
# Evaluate using score (delegates to wrapped clf)
accuracy = cl.score(X_test, y_test)
print(f"Score: {accuracy:.4f}")
Comparing Clean vs. Noisy Training
from cleanlab.classification import CleanLearning
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Train without cleaning (baseline)
baseline = LogisticRegression()
baseline.fit(X_train, y_train_noisy)
baseline_acc = accuracy_score(y_test, baseline.predict(X_test))
# Train with CleanLearning (noise-robust)
cl = CleanLearning(clf=LogisticRegression())
cl.fit(X_train, labels=y_train_noisy)
clean_acc = accuracy_score(y_test, cl.predict(X_test))
print(f"Baseline accuracy: {baseline_acc:.4f}")
print(f"CleanLearning accuracy: {clean_acc:.4f}")
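The examples above assume that X_train, y_train_noisy, X_test, and y_test already exist. One way to produce them for a quick experiment is sketched below; make_noisy_split is a hypothetical helper (not part of cleanlab), and the blob layout and 20% flip rate are arbitrary assumptions chosen for illustration.

```python
import numpy as np


def make_noisy_split(n_train=400, n_test=200, noise_rate=0.2, seed=0):
    """Hypothetical helper: two Gaussian blobs, with a fraction of training labels flipped."""
    rng = np.random.default_rng(seed)
    n = n_train + n_test

    # True binary labels and features drawn around a per-class center.
    y = rng.integers(0, 2, size=n)
    centers = np.array([[-1.5, 0.0], [1.5, 0.0]])
    X = centers[y] + rng.normal(size=(n, 2))

    X_train, X_test = X[:n_train], X[n_train:]
    y_train, y_test = y[:n_train], y[n_train:]

    # Flip a fixed number of training labels; test labels stay clean for evaluation.
    n_flip = int(round(noise_rate * n_train))
    flip_idx = rng.choice(n_train, size=n_flip, replace=False)
    y_train_noisy = y_train.copy()
    y_train_noisy[flip_idx] = 1 - y_train_noisy[flip_idx]

    return X_train, y_train_noisy, X_test, y_test, y_train


X_train, y_train_noisy, X_test, y_test, y_train_true = make_noisy_split()
```

Keeping the test labels clean is what makes the accuracy comparison meaningful: both models are trained on the same noisy labels but scored against the true ones.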