Implementation:Cleanlab Cleanlab CleanLearning Init

Field	Value
Sources	Confident Learning, Cleanlab
Domains	Machine_Learning, Data_Quality
Last Updated	2026-02-09 12:00 GMT

Overview

CleanLearning.__init__ initializes a noise-robust learning wrapper that enhances any scikit-learn compatible classifier with automatic label issue detection and data cleaning.

Description

The CleanLearning.__init__ constructor wraps an sklearn-compatible classifier and stores the configuration for the entire confident learning pipeline: cross-validation fold count, label issue detection parameters, quality scoring parameters, and runtime options. CleanLearning subclasses sklearn's BaseEstimator, so it supports get_params() and set_params() and can stand in for an ordinary sklearn classifier in most workflows.

If no classifier is provided, CleanLearning defaults to sklearn.linear_model.LogisticRegression. The constructor does very little eager checking: in the cleanlab source it only confirms that clf exposes fit(), predict_proba(), and predict(), and it leaves the remaining settings unvalidated until fit() is called. This lazy initialization pattern allows flexible configuration before committing to a training run.
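The store-now, check-at-fit pattern can be sketched in a few lines. LazyWrapper below is a hypothetical stand-in, not cleanlab's code, and the specific checks in its fit() are assumptions chosen for illustration.

```python
class LazyWrapper:
    """Hypothetical wrapper: settings are recorded now and checked at fit()."""

    def __init__(self, *, cv_n_folds=5, seed=None):
        # The constructor only records settings; it does not inspect their values.
        self.cv_n_folds = cv_n_folds
        self.seed = seed

    def fit(self, X, labels):
        # Settings are checked here, when a training run actually starts.
        if not isinstance(self.cv_n_folds, int) or self.cv_n_folds < 2:
            raise ValueError("cv_n_folds must be an integer >= 2")
        if len(X) != len(labels):
            raise ValueError("X and labels must have the same length")
        return self


wrapper = LazyWrapper(cv_n_folds=1)   # accepted: nothing is validated yet
wrapper.cv_n_folds = 5                # settings can still be adjusted
wrapper.fit([[0.0], [1.0]], [0, 1])   # validation happens here and passes
```

The trade-off is that a bad setting surfaces only when training starts, which is the price of being able to adjust the object freely beforehand.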

Key configuration groups:

Classifier: The clf parameter accepts any object that implements fit(), predict(), and predict_proba().
Cross-validation: cv_n_folds controls the number of folds for out-of-sample prediction estimation. seed ensures reproducibility.
Label issue detection: find_label_issues_kwargs is forwarded to cleanlab.filter.find_label_issues, controlling the filter strategy and thresholds.
Quality scoring: label_quality_scores_kwargs is forwarded to cleanlab.rank.get_label_quality_scores.
Special modes: pulearning enables positive-unlabeled learning on binary tasks by naming the class assumed to be free of label errors; converge_latent_estimates iteratively refines noise matrix estimates.
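To make the role of cv_n_folds and seed concrete, the sketch below computes out-of-sample predicted probabilities with scikit-learn's cross_val_predict. It approximates the idea rather than reproducing cleanlab's internal code, and the synthetic dataset is an assumption made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict

cv_n_folds, seed = 5, 42  # mirror the CleanLearning arguments of the same name

# Synthetic data for illustration only.
X, labels = make_classification(n_samples=200, n_classes=2, random_state=seed)

# Each example's probabilities come from a model that never saw that example.
folds = StratifiedKFold(n_splits=cv_n_folds, shuffle=True, random_state=seed)
pred_probs = cross_val_predict(
    LogisticRegression(), X, labels, cv=folds, method="predict_proba"
)
print(pred_probs.shape)  # (200, 2): one row per example, one column per class
```

Fixing the random state of the splitter is what makes the fold assignment, and therefore the resulting probabilities, repeatable across runs.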

Usage

Import CleanLearning and instantiate it with your chosen classifier and configuration parameters.

from cleanlab.classification import CleanLearning
from sklearn.ensemble import GradientBoostingClassifier

# Basic initialization with defaults
cl = CleanLearning()

# Custom classifier with configuration
cl = CleanLearning(
    clf=GradientBoostingClassifier(n_estimators=100),
    seed=42,
    cv_n_folds=10,
    find_label_issues_kwargs={"filter_by": "prune_by_noise_rate", "min_examples_per_class": 5},
    label_quality_scores_kwargs={"method": "normalized_margin"},
    verbose=True,
)

Code Reference

Source Location

Repository: cleanlab/cleanlab
File: cleanlab/classification.py
Lines: 213-263

Signature

class CleanLearning(BaseEstimator):
    def __init__(
        self,
        clf=None,
        *,
        seed=None,
        cv_n_folds=5,
        converge_latent_estimates=False,
        pulearning=None,
        find_label_issues_kwargs={},
        label_quality_scores_kwargs={},
        verbose=False,
        low_memory=False,
    )

Import

from cleanlab.classification import CleanLearning

I/O Contract

Inputs

Name	Type	Required	Default	Description
`clf`	sklearn-compatible estimator	No	`LogisticRegression()`	Base classifier that implements fit/predict/predict_proba
`seed`	Optional[int]	No	`None`	Random seed for reproducibility of cross-validation splits
`cv_n_folds`	int	No	`5`	Number of cross-validation folds for out-of-sample pred_probs
`converge_latent_estimates`	bool	No	`False`	Whether to iteratively refine noise matrix estimates until convergence
`pulearning`	Optional[int]	No	`None`	For positive-unlabeled learning on binary tasks: the class (0 or 1) assumed to contain no label errors
`find_label_issues_kwargs`	dict	No	`{}`	Keyword arguments forwarded to `filter.find_label_issues`
`label_quality_scores_kwargs`	dict	No	`{}`	Keyword arguments forwarded to `rank.get_label_quality_scores`
`verbose`	bool	No	`False`	Whether to print informational messages during execution
`low_memory`	bool	No	`False`	Whether to find label issues with a lower-memory, batched implementation intended for large datasets

Outputs

Name	Type	Description
return value	`CleanLearning`	Initialized CleanLearning instance wrapping the provided classifier

Usage Examples

Default Initialization

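from cleanlab.classification import CleanLearning
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic data so the example runs end to end
X, y = make_classification(n_samples=500, n_classes=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Uses LogisticRegression as default classifier
cl = CleanLearning()
cl.fit(X_train, labels=y_train)
predictions = cl.predict(X_test)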

Custom Classifier with PU Learning

from cleanlab.classification import CleanLearning
from sklearn.svm import SVC

cl = CleanLearning(
    clf=SVC(probability=True),
    pulearning=1,  # Class 1 is assumed to have no label errors (binary tasks only)
    seed=42,
    cv_n_folds=3,
)
