Implementation: Cleanlab Regression Label Issue Manager
| Knowledge Sources | |
|---|---|
| Domains | Data Quality, Regression |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
RegressionLabelIssueManager detects label issues in regression datasets where the target variable is continuous, flagging examples whose given numeric labels are likely erroneous based on model predictions or feature-based cross-validation.
Description
The RegressionLabelIssueManager class extends IssueManager with issue_name = "label" and supports two detection paths with a defined priority order:
- Custom model + features: If a custom model was provided via clean_learning_kwargs and features are supplied, the manager delegates to find_issues_with_features(), which calls CleanLearning.find_label_issues() from the regression variant. This performs cross-validated prediction and identifies outlier residuals.
- Predictions-based: If predictions are provided and no custom model is configured, the manager uses find_issues_with_predictions(), which computes label quality scores via cleanlab.regression.rank.get_label_quality_scores() and flags examples whose scores fall below threshold * median_score.
Both paths produce a DataFrame with is_label_issue, label_score, given_label, and predicted_label columns. The given_label and predicted_label columns are moved to the info dictionary and dropped from the issues DataFrame. The summary score is the mean label quality score.
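The predictions-based flagging rule described above (score below threshold times the median score) can be sketched in plain Python. The score values here are invented for illustration; real scores come from cleanlab.regression.rank.get_label_quality_scores():

```python
from statistics import median

def flag_label_issues(scores, threshold=0.05):
    """Flag examples whose label quality score falls below
    threshold * median(scores), mirroring the predictions-based rule.
    `scores` are per-example quality scores in [0, 1]."""
    cutoff = threshold * median(scores)
    return [s < cutoff for s in scores]

# Four well-scored examples and one near-zero score:
scores = [0.9, 0.8, 0.85, 0.95, 0.001]
print(flag_label_issues(scores))  # only the last example is flagged
```

With the default threshold of 0.05, only scores dramatically below the typical score are flagged, which keeps the rule robust to moderate noise in the quality scores.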
Usage
Use RegressionLabelIssueManager when auditing regression datasets for annotation errors in continuous target values. It is automatically selected by the Datalab framework when the task type is detected as regression. Provide either pre-computed predictions or raw features (with an optional custom regression model) to enable detection.
Code Reference
Source Location
- Repository: Cleanlab
- File: cleanlab/datalab/internal/issue_manager/regression/label.py
- Lines: 1-241
Signature
class RegressionLabelIssueManager(IssueManager):
description: ClassVar[str] = """Examples whose given label is estimated to be potentially incorrect..."""
issue_name: ClassVar[str] = "label"
def __init__(
self,
datalab: Datalab,
clean_learning_kwargs: Optional[Dict[str, Any]] = None,
threshold: float = 0.05,
health_summary_parameters: Optional[Dict[str, Any]] = None,
**_,
): ...
def find_issues(
self,
features: Optional[np.ndarray] = None,
predictions: Optional[np.ndarray] = None,
**kwargs,
) -> None: ...
def collect_info(self, issues: pd.DataFrame) -> dict: ...
def find_issues_with_predictions(
predictions: np.ndarray,
y: np.ndarray,
threshold: float,
**kwargs,
) -> pd.DataFrame: ...
def find_issues_with_features(
features: np.ndarray,
y: np.ndarray,
cl: CleanLearning,
**kwargs,
) -> pd.DataFrame: ...
Import
from cleanlab.datalab.internal.issue_manager.regression.label import RegressionLabelIssueManager
I/O Contract
Inputs (Constructor)
| Name | Type | Required | Description |
|---|---|---|---|
| datalab | Datalab | Yes | A Datalab instance containing the dataset and its regression labels. |
| clean_learning_kwargs | Optional[Dict[str, Any]] | No | Keyword arguments passed to the CleanLearning constructor (e.g., a custom model). |
| threshold | float | No | Multiplier of the median label quality score used as the cutoff for flagging issues. Default is 0.05. |
| health_summary_parameters | Optional[Dict[str, Any]] | No | Parameters for health summary computation. |
Inputs (find_issues)
| Name | Type | Required | Description |
|---|---|---|---|
| features | Optional[np.ndarray] | Conditional | Numerical features for the dataset. Required when using a custom model; used with the default model if predictions are not provided. |
| predictions | Optional[np.ndarray] | Conditional | Pre-computed predictions from a regression model. Used when no custom model is configured. |
Outputs
| Name | Type | Description |
|---|---|---|
| self.issues | pd.DataFrame | DataFrame with is_label_issue (boolean) and label_score (float between 0 and 1) per example. |
| self.summary | pd.DataFrame | Summary DataFrame with the mean label quality score. |
| self.info | dict | Dictionary containing num_label_issues, average_label_quality, given_label, and predicted_label. |
Module-Level Helper Functions
find_issues_with_predictions
Computes label quality scores using cleanlab.regression.rank.get_label_quality_scores() and flags examples where score < threshold * median(scores). Accepted kwargs: method. Returns a DataFrame with is_label_issue, label_score, given_label, and predicted_label.
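As an illustration of what such quality scores can look like, the sketch below uses a simple exp(-|residual|) scoring rule. This formula is an assumption for illustration only; the exact scoring used by get_label_quality_scores() depends on its method argument and is not reproduced here:

```python
import math
from statistics import median

def residual_quality_scores(predictions, y):
    """Toy quality score: exp(-|y - prediction|), so perfect agreement
    scores 1.0 and large residuals approach 0.0. An assumed stand-in for
    cleanlab.regression.rank.get_label_quality_scores(), not its actual formula."""
    return [math.exp(-abs(label - pred)) for pred, label in zip(predictions, y)]

predictions = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [2.1, 4.0, 6.1, 8.0, 100.0]  # last label is a gross annotation error

scores = residual_quality_scores(predictions, y)
cutoff = 0.05 * median(scores)
is_issue = [s < cutoff for s in scores]
print(is_issue)  # only the mislabeled last example is flagged
```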
find_issues_with_features
Delegates to CleanLearning.find_label_issues(X, y), which performs cross-validated prediction and outlier detection. Accepted kwargs: uncertainty, coarse_search_range, fine_search_size, save_space, model_kwargs.
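The features path relies on out-of-sample predictions. A minimal hand-rolled sketch of that idea, using plain K-fold splitting with a 1-D least-squares line fit as a stand-in for whatever regressor CleanLearning wraps, looks like:

```python
def cross_val_predict_1d(x, y, n_folds=5):
    """Out-of-fold predictions from a simple 1-D least-squares line fit.

    A hand-rolled stand-in for the cross-validated prediction step that
    CleanLearning.find_label_issues() performs internally (which can wrap
    any scikit-learn-style regressor, not just a line fit)."""
    n = len(x)
    preds = [0.0] * n
    folds = [list(range(i, n, n_folds)) for i in range(n_folds)]
    for held_out in folds:
        train = [i for i in range(n) if i not in held_out]
        # Ordinary least squares for y = a*x + b on the training fold.
        mx = sum(x[i] for i in train) / len(train)
        my = sum(y[i] for i in train) / len(train)
        var = sum((x[i] - mx) ** 2 for i in train)
        a = sum((x[i] - mx) * (y[i] - my) for i in train) / var
        b = my - a * mx
        for i in held_out:
            preds[i] = a * x[i] + b
    return preds

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0, 100.0]  # last label breaks the y = 2x pattern
preds = cross_val_predict_1d(x, y, n_folds=3)
residuals = [abs(p - t) for p, t in zip(preds, y)]
print(max(range(len(y)), key=residuals.__getitem__))  # index of the corrupted example
```

Because each prediction comes from a model that never saw that example's label during training, a grossly mislabeled point produces a large out-of-sample residual even when it would fit its own training data perfectly.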
Usage Examples
Basic Usage with Predictions
import numpy as np
from cleanlab import Datalab
# Regression dataset with continuous labels
data = {
"feature_a": [1.0, 2.0, 3.0, 4.0, 5.0],
"label": [2.1, 4.0, 6.1, 8.0, 100.0], # last value is a likely annotation error
}
predictions = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
lab = Datalab(data=data, label_name="label", task="regression")
lab.find_issues(pred_probs=predictions)  # for regression, predictions are passed via pred_probs
lab.report()
Usage with Features (Default Model)
import numpy as np
from cleanlab import Datalab
data = {
"feature_a": [1.0, 2.0, 3.0, 4.0, 5.0],
"label": [2.1, 4.0, 6.1, 8.0, 100.0],
}
features = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
lab = Datalab(data=data, label_name="label", task="regression")
lab.find_issues(features=features)
lab.report()