Implementation:Cleanlab Cleanlab CleanLearning Init
| Field | Value |
|---|---|
| Sources | Confident Learning, Cleanlab |
| Domains | Machine_Learning, Data_Quality |
| Last Updated | 2026-02-09 12:00 GMT |
Overview
CleanLearning.__init__ initializes a noise-robust learning wrapper that enhances any scikit-learn compatible classifier with automatic label issue detection and data cleaning.
Description
The CleanLearning.__init__ constructor creates a wrapper around an sklearn-compatible classifier. It stores configuration for the entire confident learning pipeline: cross-validation fold count, label issue detection parameters, quality scoring parameters, and runtime options. The resulting object implements sklearn's BaseEstimator interface, making it a drop-in replacement for any sklearn classifier.
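As a rough illustration of what the BaseEstimator interface provides, the sketch below uses a hypothetical stand-in class, TinyWrapper (not part of cleanlab), whose constructor only stores its arguments. Under that assumption, scikit-learn can list the parameters with get_params() and copy the configuration with clone().

```python
from sklearn.base import BaseEstimator, clone


class TinyWrapper(BaseEstimator):
    """Hypothetical stand-in; it only mimics the constructor pattern."""

    def __init__(self, clf=None, *, seed=None, cv_n_folds=5):
        # BaseEstimator expects each argument stored under the same name.
        self.clf = clf
        self.seed = seed
        self.cv_n_folds = cv_n_folds


wrapper = TinyWrapper(seed=42, cv_n_folds=10)

# get_params() is derived from the __init__ signature.
params = wrapper.get_params(deep=False)

# clone() builds a new, unfitted object with the same parameters.
copy = clone(wrapper)
```

Because the parameters are discoverable this way, a wrapper of this shape can usually be passed to scikit-learn utilities that expect an estimator, such as model selection tools.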
If no classifier is provided, CleanLearning defaults to sklearn.linear_model.LogisticRegression. The constructor does very little eager validation: it checks only that clf exposes fit(), predict(), and predict_proba(), and defers validation of the data and of the forwarded keyword arguments to fit() time. This lazy initialization pattern allows flexible configuration before committing to a training run.
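The interface that clf must satisfy can be checked by duck typing. The helper below, missing_clf_methods, is a hypothetical name introduced for illustration and is not part of cleanlab; it is a minimal sketch of such a check, useful for catching an incompatible classifier before a long training run.

```python
REQUIRED_METHODS = ("fit", "predict", "predict_proba")


def missing_clf_methods(clf):
    """Return the required method names that clf does not provide."""
    return [
        name for name in REQUIRED_METHODS
        if not callable(getattr(clf, name, None))
    ]


class NoProbabilities:
    """Toy classifier lacking predict_proba, for demonstration only."""

    def fit(self, X, y):
        return self

    def predict(self, X):
        return [0 for _ in X]


# The toy class defines fit and predict but not predict_proba.
missing = missing_clf_methods(NoProbabilities())
```

An empty list means the object offers all three methods; it says nothing about whether those methods behave correctly.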
Key configuration groups:
- Classifier: The clf parameter accepts any object that implements fit(), predict(), and predict_proba().
- Cross-validation: cv_n_folds controls the number of folds for out-of-sample prediction estimation. seed ensures reproducibility.
- Label issue detection: find_label_issues_kwargs is forwarded to cleanlab.filter.find_label_issues, controlling the filter strategy and thresholds.
- Quality scoring: label_quality_scores_kwargs is forwarded to cleanlab.rank.get_label_quality_scores.
- Special modes: pulearning enables positive-unlabeled learning; converge_latent_estimates iteratively refines noise matrix estimates.
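To make the cross-validation settings concrete, the sketch below computes out-of-sample predicted probabilities with plain scikit-learn. It is an approximation of the idea, not cleanlab's internal code: the synthetic dataset, the use of cross_val_predict, and the variable names are assumptions for illustration, with n_splits and random_state playing the roles of cv_n_folds and seed.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic binary dataset, used only for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

cv_n_folds = 5
seed = 42

# Each example is scored by a model trained on folds that exclude it.
folds = StratifiedKFold(n_splits=cv_n_folds, shuffle=True, random_state=seed)
pred_probs = cross_val_predict(
    LogisticRegression(), X, y, cv=folds, method="predict_proba"
)

# pred_probs has one row per example and one column per class.
n_examples, n_classes = pred_probs.shape
```

More folds mean each model trains on a larger share of the data, at the cost of fitting the classifier more times.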
Usage
Import CleanLearning and instantiate it with your chosen classifier and configuration parameters.
from cleanlab.classification import CleanLearning
from sklearn.ensemble import GradientBoostingClassifier
# Basic initialization with defaults
cl = CleanLearning()
# Custom classifier with configuration
cl = CleanLearning(
    clf=GradientBoostingClassifier(n_estimators=100),
    seed=42,
    cv_n_folds=10,
    find_label_issues_kwargs={"filter_by": "prune_by_noise_rate", "min_examples_per_class": 5},
    label_quality_scores_kwargs={"method": "normalized_margin"},
    verbose=True,
)
Code Reference
Source Location
- Repository: cleanlab/cleanlab
- File: cleanlab/classification.py
- Lines: 213-263
Signature
class CleanLearning(BaseEstimator):
    def __init__(
        self,
        clf=None,
        *,
        seed=None,
        cv_n_folds=5,
        converge_latent_estimates=False,
        pulearning=None,
        find_label_issues_kwargs={},
        label_quality_scores_kwargs={},
        verbose=False,
        low_memory=False,
    ):
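The bare * in the signature makes every parameter after clf keyword-only. The function below, init_like, is a hypothetical stand-in that copies only the shape of the signature (it is not cleanlab code) to show how Python treats such calls.

```python
def init_like(clf=None, *, seed=None, cv_n_folds=5):
    """Stand-in with the same layout as the first few parameters."""
    return {"clf": clf, "seed": seed, "cv_n_folds": cv_n_folds}


# Options after the bare * must be passed by name.
config = init_like("my_clf", seed=42, cv_n_folds=10)

# Passing them positionally raises TypeError.
try:
    init_like("my_clf", 42)
    positional_ok = True
except TypeError:
    positional_ok = False
```

Keyword-only options keep call sites readable and let new options be added later without breaking existing positional calls.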
Import
from cleanlab.classification import CleanLearning
I/O Contract
Inputs
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| clf | sklearn-compatible estimator | No | LogisticRegression() | Base classifier that implements fit/predict/predict_proba |
| seed | Optional[int] | No | None | Random seed for reproducibility of cross-validation splits |
| cv_n_folds | int | No | 5 | Number of cross-validation folds for out-of-sample pred_probs |
| converge_latent_estimates | bool | No | False | Whether to iteratively refine noise matrix estimates until convergence |
| pulearning | Optional[int] | No | None | Index of the positive class for positive-unlabeled learning |
| find_label_issues_kwargs | dict | No | {} | Keyword arguments forwarded to filter.find_label_issues |
| label_quality_scores_kwargs | dict | No | {} | Keyword arguments forwarded to rank.get_label_quality_scores |
| verbose | bool | No | False | Whether to print informational messages during execution |
| low_memory | bool | No | False | Whether to use a lower-memory procedure for finding label issues, intended for large datasets |
Outputs
| Name | Type | Description |
|---|---|---|
| return value | CleanLearning | Initialized CleanLearning instance wrapping the provided classifier |
Usage Examples
Default Initialization
from cleanlab.classification import CleanLearning
# Uses LogisticRegression as the default classifier
# (X_train, y_train, and X_test are assumed to be existing arrays)
cl = CleanLearning()
cl.fit(X_train, labels=y_train)
predictions = cl.predict(X_test)
Custom Classifier with PU Learning
from cleanlab.classification import CleanLearning
from sklearn.svm import SVC
cl = CleanLearning(
    clf=SVC(probability=True),
    pulearning=1,  # Treat class 1 as positive in PU learning
    seed=42,
    cv_n_folds=3,
)