Implementation:Cleanlab Cleanlab CleanLearning Init

From Leeroopedia


| Field | Value |
| --- | --- |
| Sources | Confident Learning, Cleanlab |
| Domains | Machine_Learning, Data_Quality |
| Last Updated | 2026-02-09 12:00 GMT |

Overview

CleanLearning.__init__ initializes a noise-robust learning wrapper that enhances any scikit-learn compatible classifier with automatic label issue detection and data cleaning.

Description

The CleanLearning.__init__ constructor creates a wrapper around an sklearn-compatible classifier. It stores configuration for the entire confident learning pipeline: cross-validation fold count, label issue detection parameters, quality scoring parameters, and runtime options. The resulting object subclasses sklearn's BaseEstimator, so it can generally be used wherever an sklearn classifier is expected.

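If no classifier is provided, CleanLearning defaults to sklearn.linear_model.LogisticRegression. The constructor does very little eager validation: in current cleanlab releases it only checks that clf exposes fit(), predict(), and predict_proba(), while the training data and the forwarded keyword arguments are not checked until fit() runs. Deferring those checks allows flexible configuration before committing to a training run.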

Key configuration groups:

  • Classifier: The clf parameter accepts any object that implements fit(), predict(), and predict_proba().
  • Cross-validation: cv_n_folds controls the number of folds for out-of-sample prediction estimation. seed ensures reproducibility.
  • Label issue detection: find_label_issues_kwargs is forwarded to cleanlab.filter.find_label_issues, controlling the filter strategy and thresholds.
  • Quality scoring: label_quality_scores_kwargs is forwarded to cleanlab.rank.get_label_quality_scores.
  • Special modes: pulearning enables positive-unlabeled learning on binary labels by naming the class whose labels are assumed to be error-free; converge_latent_estimates iteratively refines noise matrix estimates.
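
The classifier requirement in the list above (fit(), predict(), and predict_proba()) is a duck-typing contract. The following is a minimal sketch of how such a contract could be checked before handing an object to the wrapper. The helper `missing_classifier_methods` and both toy classes are hypothetical names introduced for illustration; they are not part of cleanlab or scikit-learn.

```python
REQUIRED_METHODS = ("fit", "predict", "predict_proba")


def missing_classifier_methods(clf):
    """Return the required method names that clf lacks (hypothetical helper)."""
    return [name for name in REQUIRED_METHODS
            if not callable(getattr(clf, name, None))]


class TinyMajorityClassifier:
    """Toy classifier that always predicts the most common training label."""

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        self.majority_ = max(self.classes_, key=list(y).count)
        return self

    def predict(self, X):
        return [self.majority_ for _ in X]

    def predict_proba(self, X):
        # One probability per class, all mass on the majority class.
        row = [1.0 if c == self.majority_ else 0.0 for c in self.classes_]
        return [list(row) for _ in X]


class NoProba:
    """Toy classifier without predict_proba, so it fails the contract."""

    def fit(self, X, y):
        return self

    def predict(self, X):
        return [0 for _ in X]


print(missing_classifier_methods(TinyMajorityClassifier()))
print(missing_classifier_methods(NoProba()))
```

The same contract is why the SVC example later on this page passes probability=True: without that flag, scikit-learn's SVC does not expose predict_proba.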

Usage

Import CleanLearning and instantiate it with your chosen classifier and configuration parameters.

from cleanlab.classification import CleanLearning
from sklearn.ensemble import GradientBoostingClassifier

# Basic initialization with defaults
cl = CleanLearning()

# Custom classifier with configuration
cl = CleanLearning(
    clf=GradientBoostingClassifier(n_estimators=100),
    seed=42,
    cv_n_folds=10,
    find_label_issues_kwargs={"filter_by": "prune_by_noise_rate", "min_examples_per_class": 5},
    label_quality_scores_kwargs={"method": "normalized_margin"},
    verbose=True,
)
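
The two keyword-argument dictionaries in the example above are stored when the object is built and forwarded later. The sketch below illustrates that store-then-forward pattern with hypothetical stand-ins (`Wrapper`, `detect_issues`, and their defaults are assumptions for illustration, not cleanlab's code). It shows one practical consequence: a misspelled key is accepted at construction and only fails once the dictionary is unpacked into the target function.

```python
def detect_issues(labels, *, filter_by="prune_by_noise_rate", min_examples_per_class=1):
    """Hypothetical stand-in for a function that receives forwarded kwargs."""
    return {
        "n": len(labels),
        "filter_by": filter_by,
        "min_examples_per_class": min_examples_per_class,
    }


class Wrapper:
    """Hypothetical wrapper: stores kwargs in __init__, forwards them in fit()."""

    def __init__(self, *, detect_kwargs=None):
        # Stored as a copy; nothing about the keys is checked here.
        self.detect_kwargs = dict(detect_kwargs or {})

    def fit(self, labels):
        # Forwarding happens here, so an unknown key surfaces only now.
        self.report_ = detect_issues(labels, **self.detect_kwargs)
        return self


ok = Wrapper(detect_kwargs={"filter_by": "both", "min_examples_per_class": 5})
ok.fit([0, 1, 1, 0])

typo = Wrapper(detect_kwargs={"filter_bye": "both"})  # constructs without complaint
try:
    typo.fit([0, 1])
    typo_error = None
except TypeError as exc:
    typo_error = exc
```

Checking dictionary keys against the documentation of the target function before a long training run avoids discovering such typos late.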

Code Reference

Source Location

Repository: cleanlab/cleanlab
File: cleanlab/classification.py
Lines: 213-263

Signature

class CleanLearning(BaseEstimator):
    def __init__(
        self,
        clf=None,
        *,
        seed=None,
        cv_n_folds=5,
        converge_latent_estimates=False,
        pulearning=None,
        find_label_issues_kwargs={},
        label_quality_scores_kwargs={},
        verbose=False,
        low_memory=False,
    )
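
The bare `*` in the signature above makes every parameter after clf keyword-only. A minimal sketch of what that means for callers, using a hypothetical stand-in function (`make_config`) that mirrors only the shape of the signature and is not cleanlab code:

```python
def make_config(clf=None, *, seed=None, cv_n_folds=5, verbose=False):
    """Hypothetical stand-in: clf may be positional, the rest must be named."""
    return {"clf": clf, "seed": seed, "cv_n_folds": cv_n_folds, "verbose": verbose}


# Naming the options works as expected.
by_keyword = make_config("my_clf", seed=42, cv_n_folds=10)

# Passing seed positionally is rejected by Python itself.
try:
    make_config("my_clf", 42)
    positional_error = None
except TypeError as exc:
    positional_error = exc
```

Keyword-only options keep call sites readable and let parameters be reordered without breaking callers. Separately, the `{}` defaults in the signature are single dictionary objects shared across calls, as with any mutable default in Python, so treating configuration dictionaries as read-only is the safer habit.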

Import

from cleanlab.classification import CleanLearning

I/O Contract

Inputs

| Name | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| clf | sklearn-compatible estimator | No | LogisticRegression() | Base classifier that implements fit/predict/predict_proba |
| seed | Optional[int] | No | None | Random seed for reproducibility of cross-validation splits |
| cv_n_folds | int | No | 5 | Number of cross-validation folds used to compute out-of-sample pred_probs |
| converge_latent_estimates | bool | No | False | Whether to iteratively refine noise matrix estimates until convergence |
| pulearning | Optional[int] | No | None | For positive-unlabeled learning on binary labels, the class (0 or 1) whose labels are assumed to be error-free |
| find_label_issues_kwargs | dict | No | {} | Keyword arguments forwarded to filter.find_label_issues |
| label_quality_scores_kwargs | dict | No | {} | Keyword arguments forwarded to rank.get_label_quality_scores |
| verbose | bool | No | False | Whether to print informational messages during execution |
| low_memory | bool | No | False | Whether to find label issues with a lower-memory, batched implementation intended for large datasets |
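
To make the roles of cv_n_folds and seed concrete, the sketch below deals example indices into folds so that every example is held out exactly once, which is what allows out-of-sample predicted probabilities to be collected for the whole dataset. It is a simplified illustration in plain Python: the function names are hypothetical, and it does not stratify folds by class as a production implementation typically would.

```python
import random


def k_fold_indices(n_examples, cv_n_folds=5, seed=None):
    """Shuffle indices with a seeded RNG and deal them into cv_n_folds folds."""
    if not 2 <= cv_n_folds <= n_examples:
        raise ValueError("cv_n_folds must be between 2 and n_examples")
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every cv_n_folds-th shuffled index starting at position i.
    return [sorted(indices[i::cv_n_folds]) for i in range(cv_n_folds)]


def out_of_sample_plan(n_examples, cv_n_folds=5, seed=None):
    """Return (train_indices, held_out_indices) pairs, one per fold."""
    plan = []
    for held_out in k_fold_indices(n_examples, cv_n_folds, seed):
        held = set(held_out)
        train = [i for i in range(n_examples) if i not in held]
        plan.append((train, held_out))
    return plan


plan = out_of_sample_plan(10, cv_n_folds=5, seed=42)
for train, held_out in plan:
    print("train on", train, "-> predict for", held_out)
```

More folds mean each model trains on a larger share of the data at the cost of more training runs; fixing the seed makes the split, and therefore the detected issues, repeatable for a deterministic classifier.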

Outputs

| Name | Type | Description |
| --- | --- | --- |
| return value | CleanLearning | Initialized CleanLearning instance wrapping the provided classifier |

Usage Examples

Default Initialization

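from cleanlab.classification import CleanLearning
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic data so the example runs end to end
X, y = make_classification(n_samples=200, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Uses LogisticRegression as the default classifier
cl = CleanLearning()
cl.fit(X_train, labels=y_train)
predictions = cl.predict(X_test)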

Custom Classifier with PU Learning

from cleanlab.classification import CleanLearning
from sklearn.svm import SVC

cl = CleanLearning(
    clf=SVC(probability=True),
    pulearning=1,  # Class 1 is assumed to have no label errors (binary labels only)
    seed=42,
    cv_n_folds=3,
)
