Implementation:Cleanlab Cleanlab CleanLearning Predict

From Leeroopedia


Field | Value
Sources | Confident Learning, Cleanlab
Domains | Machine_Learning, Data_Quality
Last Updated | 2026-02-09 12:00 GMT
Type | Wrapper Doc (delegates to wrapped classifier)

Overview

CleanLearning.predict and CleanLearning.predict_proba provide inference using a model trained on cleaned data, delegating directly to the wrapped sklearn-compatible classifier.

Description

The predict and predict_proba methods are thin wrappers that pass all arguments directly to the underlying classifier's corresponding methods. They perform no post-processing, noise correction, or label transformation: the predictions are exactly what the classifier trained on cleaned data produces.

Any improvement in prediction quality over a standard classifier comes entirely from the training phase (fit()), which removes examples with detected label issues before fitting the final model. At inference time, these methods behave identically to calling predict/predict_proba on the wrapped classifier directly.

Both methods accept *args and **kwargs, which are passed through to the wrapped classifier unchanged. This ensures full compatibility with any sklearn classifier's prediction interface, including classifiers that accept additional parameters at prediction time.
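
The pass-through behaviour described above can be pictured with a minimal sketch. ThinWrapper and ThresholdClassifier are hypothetical stand-ins written for illustration (they are not cleanlab or sklearn classes), and threshold is an invented prediction-time parameter; the only point is that the wrapper forwards whatever it receives.

```python
class ThresholdClassifier:
    """Hypothetical stand-in for a wrapped classifier (not an sklearn class)."""

    def predict_proba(self, X):
        # Toy rule: treat the first feature, clipped to [0, 1], as P(class 1).
        probs = []
        for row in X:
            p1 = min(max(row[0], 0.0), 1.0)
            probs.append([1.0 - p1, p1])
        return probs

    def predict(self, X, threshold=0.5):
        # `threshold` is an invented extra prediction-time parameter.
        return [int(p[1] >= threshold) for p in self.predict_proba(X)]


class ThinWrapper:
    """Hypothetical wrapper that forwards prediction calls unchanged."""

    def __init__(self, clf):
        self.clf = clf

    def predict(self, *args, **kwargs):
        # No post-processing: return exactly what the wrapped classifier returns.
        return self.clf.predict(*args, **kwargs)

    def predict_proba(self, *args, **kwargs):
        return self.clf.predict_proba(*args, **kwargs)


X_demo = [[0.2, 1.0], [0.6, 0.0], [0.9, 3.0]]
wrapped = ThinWrapper(ThresholdClassifier())

default_preds = wrapped.predict(X_demo)                # wrapped default threshold
strict_preds = wrapped.predict(X_demo, threshold=0.8)  # keyword is forwarded
```

Because the wrapper never names its parameters, it does not need to know which optional arguments a particular classifier supports at prediction time.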

Both methods require that fit() has been called first. Calling predict or predict_proba on an unfitted CleanLearning instance raises whatever error the underlying classifier raises when it is unfitted (for sklearn estimators, typically NotFittedError).
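
A rough sketch of why the error originates in the wrapped classifier: if the wrapper only forwards the call, any exception raised inside the classifier surfaces unchanged. All names here (UnfittedError, MajorityClassifier, ThinWrapper) are hypothetical stand-ins, not cleanlab or sklearn classes.

```python
class UnfittedError(RuntimeError):
    """Hypothetical error type, standing in for the wrapped classifier's own."""


class MajorityClassifier:
    """Hypothetical stand-in that predicts the most common training label."""

    def __init__(self):
        self.majority_ = None  # set by fit()

    def fit(self, X, labels):
        labels = list(labels)
        self.majority_ = max(set(labels), key=labels.count)
        return self

    def predict(self, X):
        if self.majority_ is None:
            raise UnfittedError("call fit() before predict()")
        return [self.majority_ for _ in X]


class ThinWrapper:
    """Hypothetical wrapper: prediction is forwarded, so errors are too."""

    def __init__(self, clf):
        self.clf = clf

    def fit(self, X, labels):
        self.clf.fit(X, labels)
        return self

    def predict(self, *args, **kwargs):
        return self.clf.predict(*args, **kwargs)


wrapper = ThinWrapper(MajorityClassifier())
try:
    wrapper.predict([[0.0], [1.0]])
    raised = False
except UnfittedError:
    raised = True  # the wrapped classifier's error surfaces through the wrapper

wrapper.fit([[0.0], [1.0], [2.0]], labels=[1, 1, 0])
preds_after_fit = wrapper.predict([[0.0], [1.0]])
```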

Usage

Call predict or predict_proba on a fitted CleanLearning instance, just as with any sklearn classifier.

from cleanlab.classification import CleanLearning
from sklearn.linear_model import LogisticRegression

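# X_train, y_train, and X_test are assumed to be defined already;
# y_train holds integer class labels (0..K-1) that may contain errors.
cl = CleanLearning(clf=LogisticRegression())
cl.fit(X_train, labels=y_train)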

# Class label predictions
y_pred = cl.predict(X_test)

# Class probability predictions
y_proba = cl.predict_proba(X_test)

Code Reference

Source Location

Repository: cleanlab/cleanlab
File: cleanlab/classification.py
Lines: 584-638

Signature

def predict(self, *args, **kwargs) -> np.ndarray

def predict_proba(self, *args, **kwargs) -> np.ndarray

Import

from cleanlab.classification import CleanLearning
# predict and predict_proba are methods of a CleanLearning instance

I/O Contract

Inputs

Name | Type | Required | Description
X | array-like (N, M) | Yes | Test feature matrix, passed as the first positional argument. Must have the same format and number of features as the training data used in fit().
*args | any | No | Any further positional arguments, passed through to the wrapped classifier.
**kwargs | any | No | Additional keyword arguments, passed through to the wrapped classifier.

Outputs

predict:

Name | Type | Description
return value | np.ndarray (N,) | Predicted class labels for each example in the input.

predict_proba:

Name | Type | Description
return value | np.ndarray (N, K) | Predicted class probabilities for each example. Each row sums to 1. Column order matches the classifier's classes_ attribute.
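
To make the output contract concrete, the sketch below checks the row-sum property and decodes labels from probability rows using a class list ordered like classes_. The helper names (labels_from_proba, rows_sum_to_one) and the sample numbers are illustrative assumptions. For many classifiers predict agrees with this argmax decoding, but that is not guaranteed for every estimator.

```python
def labels_from_proba(proba_rows, classes):
    """Map each probability row to the class at its highest-probability column."""
    labels = []
    for row in proba_rows:
        best_col = max(range(len(row)), key=row.__getitem__)
        labels.append(classes[best_col])
    return labels


def rows_sum_to_one(proba_rows, tol=1e-9):
    """Check the 'each row sums to 1' property within a small tolerance."""
    return all(abs(sum(row) - 1.0) <= tol for row in proba_rows)


# Made-up probabilities for N = 3 examples and K = 3 classes.
classes = [0, 1, 2]  # plays the role of the classifier's classes_ attribute
proba = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.25, 0.5, 0.25]]

decoded = labels_from_proba(proba, classes)
valid = rows_sum_to_one(proba)
```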

Usage Examples

Full Pipeline: Init, Fit, Predict

from cleanlab.classification import CleanLearning
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Initialize and train
cl = CleanLearning(clf=GradientBoostingClassifier(), seed=42)
cl.fit(X_train, labels=y_train_noisy)

# Predict class labels
y_pred = cl.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")

# Predict class probabilities
y_proba = cl.predict_proba(X_test)
print(f"Probability shape: {y_proba.shape}")  # (N_test, K)

# Evaluate using score (delegates to the wrapped clf's score method when it has one)
accuracy = cl.score(X_test, y_test)
print(f"Score: {accuracy:.4f}")

Comparing Clean vs. Noisy Training

from cleanlab.classification import CleanLearning
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Train without cleaning (baseline)
baseline = LogisticRegression()
baseline.fit(X_train, y_train_noisy)
baseline_acc = accuracy_score(y_test, baseline.predict(X_test))

# Train with CleanLearning (noise-robust)
cl = CleanLearning(clf=LogisticRegression())
cl.fit(X_train, labels=y_train_noisy)
clean_acc = accuracy_score(y_test, cl.predict(X_test))

print(f"Baseline accuracy: {baseline_acc:.4f}")
print(f"CleanLearning accuracy: {clean_acc:.4f}")
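
The comparison above presupposes a y_train_noisy array. When only clean labels are available, one way to build a noisy copy for experimentation is to flip a fraction of labels uniformly at random, as in the sketch below. flip_labels is a hypothetical helper, not part of cleanlab, and uniform flipping is an assumption; real label noise is often class-dependent.

```python
import random


def flip_labels(labels, num_classes, noise_rate, seed=0):
    """Return a copy of `labels` where each entry is replaced, with
    probability `noise_rate`, by a different class chosen uniformly."""
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            other_classes = [k for k in range(num_classes) if k != y]
            noisy.append(rng.choice(other_classes))
        else:
            noisy.append(y)
    return noisy


# Toy clean labels: 300 examples spread evenly over 3 classes.
y_train = [0, 1, 2] * 100
y_train_noisy = flip_labels(y_train, num_classes=3, noise_rate=0.2, seed=0)

# Count how many labels actually changed.
num_flipped = sum(a != b for a, b in zip(y_train, y_train_noisy))
```

Keeping the clean y_train alongside the noisy copy also makes it possible to check afterwards which corrupted examples were flagged during fit().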
