
Principle:Cleanlab Datalab Regression Label Issue Detection

From Leeroopedia


Knowledge Sources
Domains Data Quality, Regression
Last Updated 2026-02-09 00:00 GMT

Overview

Regression label issue detection identifies examples in a regression dataset whose continuous target values are likely incorrect, using residual analysis and quality scoring to flag annotation errors in numeric labels.

Description

Detecting label errors in regression tasks is fundamentally different from classification. In classification, an incorrect label is clearly wrong (the example belongs to class A but is labeled class B). In regression, all values lie on a continuous spectrum, so the question becomes: "Is this label value an outlier relative to what the model expects?"

Regression label issue detection addresses this by comparing each example's given label to the model's prediction and assessing how unusual the discrepancy is. Examples with abnormally large residuals -- values that the model consistently cannot predict well -- are flagged as potential label errors.

This is important because annotation errors in regression targets are common when:

  • Human annotators estimate continuous quantities (e.g., age, price, distance).
  • Data entry errors introduce typos in numeric fields (e.g., 100.0 instead of 10.0).
  • Measurement instruments produce occasional erroneous readings.
  • Data merging or transformation pipelines introduce systematic errors.

Usage

Apply regression label issue detection when:

  • Your dataset has continuous target values and you suspect annotation or measurement errors.
  • You have either pre-computed model predictions or raw features available.
  • You want to prioritize examples for manual review based on likelihood of label error.
  • You need to estimate overall label quality in a regression dataset before model deployment.

Theoretical Basis

The detection framework supports two complementary approaches:

Approach 1: Prediction-Based Detection

Given pre-computed predictions from a regression model:

Step 1 -- Label quality scoring: Compute a quality score for each example using get_label_quality_scores(labels, predictions). This function quantifies how well the given label agrees with the model's prediction, producing a score between 0 and 1 where higher values indicate better agreement.

Step 2 -- Threshold-based flagging: Flag an example as a label issue if its quality score falls below a threshold defined as:

threshold_absolute = threshold * median(quality_scores)

where threshold is a configurable multiplier (default 0.05). The use of the median as a reference point makes the detection robust to the overall distribution of residuals. With the default threshold of 0.05, an example is flagged only if its quality score is less than 5% of the median score, indicating an extreme outlier.
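The two steps above can be sketched in plain Python. The scoring function below is an illustrative stand-in, not cleanlab's actual get_label_quality_scores (whose default method is more sophisticated); only the threshold rule matches the formula above.

```python
import math
from statistics import median

def residual_quality_scores(labels, predictions):
    """Toy quality score: map each absolute residual to (0, 1], where a
    residual of 0 gives a score of 1 (perfect agreement). Illustrative
    stand-in for get_label_quality_scores, not cleanlab's method."""
    residuals = [abs(y - p) for y, p in zip(labels, predictions)]
    scale = median(residuals) or 1.0  # guard against all-zero residuals
    return [math.exp(-r / scale) for r in residuals]

def flag_label_issues(scores, threshold=0.05):
    """Flag examples whose score falls below threshold * median(scores)."""
    cutoff = threshold * median(scores)
    return [s < cutoff for s in scores]

labels      = [10.2, 9.8, 100.0, 10.5, 9.9, 10.1, 10.3]  # 100.0 looks like a typo for 10.0
predictions = [10.0, 10.0, 10.1, 10.4, 10.0, 10.0, 10.2]
scores = residual_quality_scores(labels, predictions)
issues = flag_label_issues(scores)
print(issues)  # only the 100.0 label is flagged
```

Because the cutoff is relative to the median score, moderate residuals (like the 0.2 errors above) are tolerated; only the extreme outlier falls below 5% of the median.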

Approach 2: Feature-Based Detection (CleanLearning)

Given raw numerical features and labels:

Step 1 -- Cross-validated prediction: A regression model (default or custom) is trained using k-fold cross-validation. Each example receives an out-of-fold prediction, ensuring that the model predicting an example never saw that example during training.

Step 2 -- Outlier identification: The CleanLearning.find_label_issues() method analyzes the residuals between out-of-fold predictions and given labels to identify examples with unusually large errors. This approach is more robust than single-model predictions because cross-validation prevents overfitting from masking label errors.

Configurable parameters for this approach include:

  • uncertainty: Controls how much uncertainty is tolerated in label quality estimation.
  • coarse_search_range / fine_search_size: Control the grid search for optimal detection parameters.
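The out-of-fold mechanics of Step 1 can be sketched as follows. The mean predictor here is a deliberately trivial stand-in for the regression model (an assumption for brevity, not cleanlab's default); the point is that each example's prediction comes from a model fit without it, so a corrupted label cannot mask its own error.

```python
from statistics import mean

def out_of_fold_predictions(features, labels, k=3, fit_predict=None):
    """K-fold cross-validation: each example is predicted by a model
    trained only on the other folds. fit_predict(train_X, train_y,
    test_X) -> list of predictions; default is a trivial mean predictor."""
    if fit_predict is None:
        fit_predict = lambda tx, ty, vx: [mean(ty)] * len(vx)
    n = len(labels)
    preds = [None] * n
    folds = [list(range(i, n, k)) for i in range(k)]  # simple interleaved folds
    for fold in folds:
        train_idx = [i for i in range(n) if i not in fold]
        tx = [features[i] for i in train_idx]
        ty = [labels[i] for i in train_idx]
        vx = [features[i] for i in fold]
        for i, p in zip(fold, fit_predict(tx, ty, vx)):
            preds[i] = p
    return preds

X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [1.0, 1.1, 0.9, 1.0, 9.0, 1.1]  # 9.0 is a suspicious label
oof = out_of_fold_predictions(X, y)
residuals = [abs(t - p) for t, p in zip(y, oof)]
print(max(range(len(y)), key=lambda i: residuals[i]))  # index of largest residual: 4
```

CleanLearning.find_label_issues() then applies outlier analysis to these residuals, analogous to the thresholding in Approach 1.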

Priority Logic

When both features and predictions are available, the detection workflow follows this priority:

  1. If a custom model is configured and features are provided, use the feature-based approach.
  2. If predictions are provided and no custom model is configured, use the prediction-based approach.
  3. If only features are provided (no custom model, no predictions), use the feature-based approach with the default model.
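The three rules above amount to a simple dispatch, sketched below. The function name and return values are hypothetical illustrations, not cleanlab's API.

```python
def choose_approach(features=None, predictions=None, custom_model=None):
    """Hypothetical dispatcher mirroring the priority rules above."""
    if custom_model is not None and features is not None:
        return "feature-based (custom model)"      # rule 1
    if predictions is not None and custom_model is None:
        return "prediction-based"                  # rule 2
    if features is not None:
        return "feature-based (default model)"     # rule 3
    raise ValueError("need features or predictions")

# With both inputs and a custom model, the feature-based approach wins:
print(choose_approach(features=[[1.0]], predictions=[1.0], custom_model=object()))
```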

Summary Scoring

The overall dataset quality is summarized as the mean label quality score:

dataset_score = (1/n) * sum(score_i for i in 1..n)

A lower mean score indicates more widespread label quality issues in the regression dataset.
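As a worked instance of the formula, with hypothetical per-example scores:

```python
scores = [0.9, 0.8, 0.05, 0.95]           # one low score drags the mean down
dataset_score = sum(scores) / len(scores)  # (1/n) * sum of score_i
print(round(dataset_score, 3))             # 0.675
```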
