Implementation: Cleanlab Regression Label Issue Manager
| Knowledge Sources | |
|---|---|
| Domains | Data Quality, Regression |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
RegressionLabelIssueManager detects label issues in regression datasets where the target variable is continuous, flagging examples whose given numeric labels are likely erroneous based on model predictions or feature-based cross-validation.
Description
The RegressionLabelIssueManager class extends IssueManager with issue_name = "label" and supports two detection paths with a defined priority order:
- Custom model + features: If a custom model was provided via clean_learning_kwargs and features are supplied, the manager delegates to find_issues_with_features(), which calls CleanLearning.find_label_issues() from the regression variant. This performs cross-validated prediction and identifies outlier residuals.
- Predictions-based: If predictions are provided and no custom model is configured, the manager uses find_issues_with_predictions(), which computes label quality scores via cleanlab.regression.rank.get_label_quality_scores() and flags examples whose scores fall below threshold * median_score.
Both paths produce a DataFrame with is_label_issue, label_score, given_label, and predicted_label columns. The given_label and predicted_label columns are moved to the info dictionary and dropped from the issues DataFrame. The summary score is the mean label quality score.
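The predictions-based flagging rule described above (score below threshold times the median score) can be sketched in plain Python. The score values here are invented for illustration; real scores come from cleanlab.regression.rank.get_label_quality_scores():

```python
from statistics import median

def flag_label_issues(scores, threshold=0.05):
    """Flag examples whose label quality score falls below
    threshold * median(scores), mirroring the predictions-based rule.
    `scores` are per-example quality scores in [0, 1]."""
    cutoff = threshold * median(scores)
    return [s < cutoff for s in scores]

# Four well-scored examples and one near-zero score:
scores = [0.9, 0.8, 0.85, 0.95, 0.001]
print(flag_label_issues(scores))  # only the last example is flagged
```

With the default threshold of 0.05, only scores dramatically below the typical score are flagged, which keeps the rule robust to moderate noise in the quality scores.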
Usage
Use RegressionLabelIssueManager when auditing regression datasets for annotation errors in continuous target values. It is automatically selected by the Datalab framework when the task type is detected as regression. Provide either pre-computed predictions or raw features (with an optional custom regression model) to enable detection.
Code Reference
Source Location
- Repository: Cleanlab
- File: cleanlab/datalab/internal/issue_manager/regression/label.py
- Lines: 1-241
Signature
class RegressionLabelIssueManager(IssueManager):
description: ClassVar[str] = """Examples whose given label is estimated to be potentially incorrect..."""
issue_name: ClassVar[str] = "label"
def __init__(
self,
datalab: Datalab,
clean_learning_kwargs: Optional[Dict[str, Any]] = None,
threshold: float = 0.05,
health_summary_parameters: Optional[Dict[str, Any]] = None,
**_,
): ...
def find_issues(
self,
features: Optional[np.ndarray] = None,
predictions: Optional[np.ndarray] = None,
**kwargs,
) -> None: ...
def collect_info(self, issues: pd.DataFrame) -> dict: ...
def find_issues_with_predictions(
predictions: np.ndarray,
y: np.ndarray,
threshold: float,
**kwargs,
) -> pd.DataFrame: ...
def find_issues_with_features(
features: np.ndarray,
y: np.ndarray,
cl: CleanLearning,
**kwargs,
) -> pd.DataFrame: ...
Import
from cleanlab.datalab.internal.issue_manager.regression.label import RegressionLabelIssueManager
I/O Contract
Inputs (Constructor)
| Name | Type | Required | Description |
|---|---|---|---|
| datalab | Datalab | Yes | A Datalab instance containing the dataset and its regression labels. |
| clean_learning_kwargs | Optional[Dict[str, Any]] | No | Keyword arguments passed to the CleanLearning constructor (e.g., a custom model). |
| threshold | float | No | Multiplier of the median label quality score used as the cutoff for flagging issues. Default is 0.05. |
| health_summary_parameters | Optional[Dict[str, Any]] | No | Parameters for health summary computation. |
Inputs (find_issues)
| Name | Type | Required | Description |
|---|---|---|---|
| features | Optional[np.ndarray] | Conditional | Numerical features for the dataset. Required when using a custom model; used with the default model if predictions are not provided. |
| predictions | Optional[np.ndarray] | Conditional | Pre-computed predictions from a regression model. Used when no custom model is configured. |
Outputs
| Name | Type | Description |
|---|---|---|
| self.issues | pd.DataFrame | DataFrame with is_label_issue (boolean) and label_score (float between 0 and 1) per example. |
| self.summary | pd.DataFrame | Summary DataFrame with the mean label quality score. |
| self.info | dict | Dictionary containing num_label_issues, average_label_quality, given_label, and predicted_label. |
Module-Level Helper Functions
find_issues_with_predictions
Computes label quality scores using cleanlab.regression.rank.get_label_quality_scores() and flags examples where score < threshold * median(scores). Accepted kwargs: method. Returns a DataFrame with is_label_issue, label_score, given_label, and predicted_label.
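As an illustration of what such quality scores can look like, the sketch below uses a simple exp(-|residual|) scoring rule. This formula is an assumption for illustration only; the exact scoring used by get_label_quality_scores() depends on its method argument and is not reproduced here:

```python
import math
from statistics import median

def residual_quality_scores(predictions, y):
    """Toy quality score: exp(-|y - prediction|), so perfect agreement
    scores 1.0 and large residuals approach 0.0. An assumed stand-in for
    cleanlab.regression.rank.get_label_quality_scores(), not its actual formula."""
    return [math.exp(-abs(label - pred)) for pred, label in zip(predictions, y)]

predictions = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [2.1, 4.0, 6.1, 8.0, 100.0]  # last label is a gross annotation error

scores = residual_quality_scores(predictions, y)
cutoff = 0.05 * median(scores)
is_issue = [s < cutoff for s in scores]
print(is_issue)  # only the mislabeled last example is flagged
```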
find_issues_with_features
Delegates to CleanLearning.find_label_issues(X, y), which performs cross-validated prediction and outlier detection. Accepted kwargs: uncertainty, coarse_search_range, fine_search_size, save_space, model_kwargs.
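The features path relies on out-of-sample predictions. A minimal hand-rolled sketch of that idea, using plain K-fold splitting with a 1-D least-squares line fit as a stand-in for whatever regressor CleanLearning wraps, looks like:

```python
def cross_val_predict_1d(x, y, n_folds=5):
    """Out-of-fold predictions from a simple 1-D least-squares line fit.

    A hand-rolled stand-in for the cross-validated prediction step that
    CleanLearning.find_label_issues() performs internally (which can wrap
    any scikit-learn-style regressor, not just a line fit)."""
    n = len(x)
    preds = [0.0] * n
    folds = [list(range(i, n, n_folds)) for i in range(n_folds)]
    for held_out in folds:
        train = [i for i in range(n) if i not in held_out]
        # Ordinary least squares for y = a*x + b on the training fold.
        mx = sum(x[i] for i in train) / len(train)
        my = sum(y[i] for i in train) / len(train)
        var = sum((x[i] - mx) ** 2 for i in train)
        a = sum((x[i] - mx) * (y[i] - my) for i in train) / var
        b = my - a * mx
        for i in held_out:
            preds[i] = a * x[i] + b
    return preds

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0, 100.0]  # last label breaks the y = 2x pattern
preds = cross_val_predict_1d(x, y, n_folds=3)
residuals = [abs(p - t) for p, t in zip(preds, y)]
print(max(range(len(y)), key=residuals.__getitem__))  # index of the corrupted example
```

Because each prediction comes from a model that never saw that example's label during training, a grossly mislabeled point produces a large out-of-sample residual even when it would fit its own training data perfectly.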
Usage Examples
Basic Usage with Predictions
import numpy as np
from cleanlab import Datalab
# Regression dataset with continuous labels
data = {
"feature_a": [1.0, 2.0, 3.0, 4.0, 5.0],
"label": [2.1, 4.0, 6.1, 8.0, 100.0], # last value is a likely annotation error
}
predictions = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
lab = Datalab(data=data, label_name="label", task="regression")
lab.find_issues(pred_probs=predictions)  # for regression, predictions are passed via pred_probs
lab.report()
Usage with Features (Default Model)
import numpy as np
from cleanlab import Datalab
data = {
"feature_a": [1.0, 2.0, 3.0, 4.0, 5.0],
"label": [2.1, 4.0, 6.1, 8.0, 100.0],
}
features = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
lab = Datalab(data=data, label_name="label", task="regression")
lab.find_issues(features=features)
lab.report()