
Implementation:Cleanlab Regression Label Issue Manager

From Leeroopedia


Knowledge Sources
Domains Data Quality, Regression
Last Updated 2026-02-09 00:00 GMT

Overview

RegressionLabelIssueManager detects label issues in regression datasets where the target variable is continuous, flagging examples whose given numeric labels are likely erroneous based on model predictions or feature-based cross-validation.

Description

The RegressionLabelIssueManager class extends IssueManager with issue_name = "label" and supports two detection paths with a defined priority order:

  1. Custom model + features: If a custom model was provided via clean_learning_kwargs and features are supplied, the manager delegates to find_issues_with_features(), which calls CleanLearning.find_label_issues() from the regression variant. This performs cross-validated prediction and identifies outlier residuals.
  2. Predictions-based: If predictions are provided and no custom model is configured, the manager uses find_issues_with_predictions(), which computes label quality scores via cleanlab.regression.rank.get_label_quality_scores() and flags examples whose scores fall below threshold * median_score.

Both paths produce a DataFrame with is_label_issue, label_score, given_label, and predicted_label columns. The given_label and predicted_label columns are moved to the info dictionary and dropped from the issues DataFrame. The summary score is the mean label quality score.
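The thresholding rule used by the predictions-based path can be sketched in a few lines. This is a minimal illustration of the score-vs-median comparison described above, not the library's actual scoring code:

```python
import numpy as np

def flag_label_issues(scores: np.ndarray, threshold: float = 0.05) -> np.ndarray:
    """Flag examples whose label quality score falls below threshold * median score."""
    cutoff = threshold * np.median(scores)
    return scores < cutoff

# Four plausible quality scores and one clear outlier
scores = np.array([0.9, 0.8, 0.85, 0.95, 0.001])
print(flag_label_issues(scores))  # only the last example is flagged
```

Because the cutoff is relative to the median, a dataset where all scores are uniformly low does not flag everything; only scores far below the typical score are marked as issues.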

Usage

Use RegressionLabelIssueManager when auditing regression datasets for annotation errors in continuous target values. It is automatically selected by the Datalab framework when the task type is detected as regression. Provide either pre-computed predictions or raw features (with an optional custom regression model) to enable detection.

Code Reference

Source Location

  • Repository: Cleanlab
  • File: cleanlab/datalab/internal/issue_manager/regression/label.py
  • Lines: 1-241

Signature

class RegressionLabelIssueManager(IssueManager):
    description: ClassVar[str] = """Examples whose given label is estimated to be potentially incorrect..."""
    issue_name: ClassVar[str] = "label"

    def __init__(
        self,
        datalab: Datalab,
        clean_learning_kwargs: Optional[Dict[str, Any]] = None,
        threshold: float = 0.05,
        health_summary_parameters: Optional[Dict[str, Any]] = None,
        **_,
    ): ...

    def find_issues(
        self,
        features: Optional[np.ndarray] = None,
        predictions: Optional[np.ndarray] = None,
        **kwargs,
    ) -> None: ...

    def collect_info(self, issues: pd.DataFrame) -> dict: ...


def find_issues_with_predictions(
    predictions: np.ndarray,
    y: np.ndarray,
    threshold: float,
    **kwargs,
) -> pd.DataFrame: ...


def find_issues_with_features(
    features: np.ndarray,
    y: np.ndarray,
    cl: CleanLearning,
    **kwargs,
) -> pd.DataFrame: ...

Import

from cleanlab.datalab.internal.issue_manager.regression.label import RegressionLabelIssueManager

I/O Contract

Inputs (Constructor)

Name Type Required Description
datalab Datalab Yes A Datalab instance containing the dataset and its regression labels.
clean_learning_kwargs Optional[Dict[str, Any]] No Keyword arguments passed to the CleanLearning constructor (e.g., a custom model).
threshold float No Multiplier applied to the median label quality score; examples whose score falls below threshold * median are flagged as issues. Default is 0.05.
health_summary_parameters Optional[Dict[str, Any]] No Parameters for health summary computation.

Inputs (find_issues)

Name Type Required Description
features Optional[np.ndarray] Conditional Numerical features for the dataset. Required when using a custom model; used with the default model if predictions are not provided.
predictions Optional[np.ndarray] Conditional Pre-computed predictions from a regression model. Used when no custom model is configured.

Outputs

Name Type Description
self.issues pd.DataFrame DataFrame with is_label_issue (boolean) and label_score (float between 0 and 1) per example.
self.summary pd.DataFrame Summary DataFrame with the mean label quality score.
self.info dict Dictionary containing num_label_issues, average_label_quality, given_label, and predicted_label.

Module-Level Helper Functions

find_issues_with_predictions

Computes label quality scores using cleanlab.regression.rank.get_label_quality_scores() and flags examples where score < threshold * median(scores). Accepted kwargs: method. Returns a DataFrame with is_label_issue, label_score, given_label, and predicted_label.

find_issues_with_features

Delegates to CleanLearning.find_label_issues(X, y), which performs cross-validated prediction and outlier detection. Accepted kwargs: uncertainty, coarse_search_range, fine_search_size, save_space, model_kwargs.

Usage Examples

Basic Usage with Predictions

import numpy as np
from cleanlab import Datalab

# Regression dataset with continuous labels
data = {
    "feature_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "label": [2.1, 4.0, 6.1, 8.0, 100.0],  # last value is a likely annotation error
}

predictions = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

lab = Datalab(data=data, label_name="label", task="regression")
lab.find_issues(pred_probs=predictions)
lab.report()

Usage with Features (Default Model)

import numpy as np
from cleanlab import Datalab

data = {
    "feature_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "label": [2.1, 4.0, 6.1, 8.0, 100.0],
}
features = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])

lab = Datalab(data=data, label_name="label", task="regression")
lab.find_issues(features=features)
lab.report()
