Implementation:Cleanlab Cleanlab Regression Get Label Quality Scores

Knowledge Sources	Cleanlab
Domains	Data Quality, Machine Learning, Regression
Last Updated	2026-02-09 00:00 GMT

Overview

Computes a label quality score for each example in a regression dataset, ranking which Y-values are most likely erroneous.

Description

get_label_quality_scores is the primary public function in the regression ranking module. It accepts raw labels and model predictions for a regression dataset and returns a per-example quality score between 0 and 1, where lower scores indicate labels more likely to be incorrect. The function dispatches to one of two internal scoring methods based on the method parameter. The "residual" method computes an exponentially decayed score from absolute residuals. The "outre" method (the default) normalizes labels and residuals into a 2D feature space and uses k-nearest-neighbor-based outlier detection via cleanlab's OutOfDistribution scorer to identify anomalous label-residual combinations.

Usage

Import this function when you have a trained regression model and want to identify which examples in your dataset most likely have erroneous Y-values. It is especially useful as a standalone scoring utility when you do not need the full train-prune-retrain cycle provided by CleanLearning. For best results, pass out-of-sample predictions obtained via cross-validation.
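
As an illustration of what "out-of-sample predictions obtained via cross-validation" means, the following is a minimal NumPy-only sketch. The helper name out_of_sample_predictions, the least-squares linear model, and the synthetic data are all assumptions made for this example; they are not part of cleanlab.

```python
import numpy as np

def out_of_sample_predictions(X, y, n_splits=5, seed=0):
    """Hypothetical helper: K-fold out-of-sample predictions from a least-squares linear model."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Add an intercept column so the linear model can fit an offset.
    X1 = np.column_stack([np.ones(n), X])
    # Shuffle the indices once, then split them into n_splits held-out folds.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_splits)
    predictions = np.empty(n)
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        # Fit on the training folds only.
        coef, *_ = np.linalg.lstsq(X1[train], y[train], rcond=None)
        # Predict the held-out fold, so no example is predicted by a model that saw its label.
        predictions[held_out] = X1[held_out] @ coef
    return predictions

# Synthetic data (assumption for illustration): y = 3x + 1 plus small noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(40, 1))
y = 3 * X[:, 0] + 1 + rng.normal(scale=0.1, size=40)
predictions = out_of_sample_predictions(X, y)
```

The resulting predictions array, together with y as labels, has the shape that get_label_quality_scores expects. In practice a library routine such as scikit-learn's cross_val_predict serves the same purpose with any regressor.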

Code Reference

Source Location

Repository: Cleanlab
File: cleanlab/regression/rank.py
Lines: 22-87

Signature

def get_label_quality_scores(
    labels: ArrayLike,
    predictions: ArrayLike,
    *,
    method: str = "outre",
) -> np.ndarray:

Import

from cleanlab.regression.rank import get_label_quality_scores

I/O Contract

Inputs

Name	Type	Required	Description
labels	ArrayLike	Yes	1D array of shape (N,) containing the given Y-value labels for each example in the dataset.
predictions	ArrayLike	Yes	1D array of shape (N,) containing the predicted label for each example. Should be out-of-sample predictions from a trained regression model, ideally obtained via cross-validation.
method	str	No	Scoring method to use. Options are "residual" (exponential decay of absolute residuals) or "outre" (default; outlier detection in normalized label-residual space using k-nearest neighbors).

Outputs

Name	Type	Description
label_quality_scores	np.ndarray	Array of shape (N,) with one score between 0 and 1 per example. A score near 1 means the label is likely correct; a score near 0 means the label is likely incorrect.
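
Because lower scores flag more suspicious labels, a common next step is to sort examples by score. The sketch below assumes the scores from the Basic Usage example on this page; the review cutoff num_to_review is a hypothetical choice, not a cleanlab parameter.

```python
import numpy as np

# Scores copied from the Basic Usage example on this page.
label_quality_scores = np.array([0.00323821, 0.33692597, 0.00191686, 0.33692597])

# Sort ascending so the most suspicious labels come first.
# A stable sort keeps tied scores in their original index order.
ranked_indices = np.argsort(label_quality_scores, kind="stable")

# Hypothetical cutoff: review only the 2 lowest-scoring examples.
num_to_review = 2
to_review = ranked_indices[:num_to_review]
print(to_review)  # [2 0]
```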

Internal Scoring Methods

Residual Method

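Computes exp(-|predictions - labels|) for each example, so a zero residual scores 1 and the score decays exponentially toward 0 as the residual grows. The residuals are not rescaled, so the scores depend on the units of the labels. This method may work better for datasets where the independent variables are approximately normally distributed.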
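
The formula above can be reproduced directly in NumPy. This is a minimal sketch of the stated formula only; the function name residual_scores is hypothetical and the code is not cleanlab's internal implementation.

```python
import numpy as np

def residual_scores(labels, predictions):
    """Hypothetical re-implementation of the formula exp(-|predictions - labels|)."""
    labels = np.asarray(labels, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    return np.exp(-np.abs(predictions - labels))

labels = np.array([1, 2, 3, 4])
predictions = np.array([2, 2, 5, 4.1])
scores = residual_scores(labels, predictions)
# Residuals are 1, 0, 2 and 0.1, so the scores are
# approximately 0.368, 1.0, 0.135 and 0.905.
print(scores)
```

Note that multiplying both labels and predictions by a constant changes these scores, which is one reason to consider the scale of the target before choosing this method.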

OUTRE Method (Default)

The OUTRE (OUTlier-in-REsidual-space) method performs the following steps:

1. Normalize labels to zero mean and unit variance.
2. Compute residuals (predictions - labels), normalize them to zero mean and unit variance, and multiply by a scale factor of 5.
3. Combine the normalized labels and scaled residuals into a 2D feature matrix with one row per example.
4. Fit a k-nearest-neighbors search on these features, with k set to 50% of the dataset size.
5. Use cleanlab's OutOfDistribution scorer on the 2D features to produce per-example outlier scores, which are returned as the label quality scores.

This method is the recommended default because it considers neighborhood context in the label-residual space rather than relying solely on raw residual magnitude.
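
The steps above can be sketched in plain NumPy to show how the 2D feature space is built. This is an illustration under stated assumptions, not cleanlab's code: the function name outre_like_scores is hypothetical, and the final scoring step swaps in a simple stand-in (mean distance to the k nearest other points, mapped through exp(-d)) in place of cleanlab's OutOfDistribution scorer, so the numbers will not match the library's output.

```python
import numpy as np

def outre_like_scores(labels, predictions, residual_scale=5.0, frac_neighbors=0.5):
    """Hypothetical sketch of OUTRE-style scoring; requires at least 2 examples."""
    labels = np.asarray(labels, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    eps = 1e-12  # guards against division by zero for constant inputs

    # Standardize labels and residuals, then scale the residuals.
    residual = predictions - labels
    z_labels = (labels - labels.mean()) / (labels.std() + eps)
    z_residual = residual_scale * (residual - residual.mean()) / (residual.std() + eps)

    # One 2D point per example: (normalized label, scaled residual).
    features = np.column_stack([z_labels, z_residual])

    # k is half the dataset size (rounded up), capped at the number of other points.
    n = len(labels)
    k = min(int(np.ceil(frac_neighbors * n)), n - 1)

    # Stand-in for the OutOfDistribution scorer (assumption):
    # mean Euclidean distance to the k nearest other points, mapped into [0, 1] by exp(-d).
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # exclude each point from its own neighbor list
    knn_dists = np.sort(dists, axis=1)[:, :k]
    return np.exp(-knn_dists.mean(axis=1))

# Synthetic data (assumption for illustration) with one injected label error.
rng = np.random.default_rng(0)
labels = np.linspace(0, 10, 50)
predictions = labels + rng.normal(scale=0.1, size=50)
labels[10] += 8.0  # corrupt a single label
scores = outre_like_scores(labels, predictions)
print(int(np.argmin(scores)))  # index of the most suspicious label
```

The corrupted example sits far from every other point along the residual axis, so its neighborhood distance is large and its score is the lowest. A point with a moderate residual in a sparse region of label space can also score low, which is the neighborhood effect the paragraph above describes.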

Usage Examples

Basic Usage

import numpy as np
from cleanlab.regression.rank import get_label_quality_scores

labels = np.array([1, 2, 3, 4])
predictions = np.array([2, 2, 5, 4.1])

# Using the default OUTRE method
label_quality_scores = get_label_quality_scores(labels, predictions)
print(label_quality_scores)
# array([0.00323821, 0.33692597, 0.00191686, 0.33692597])

Using the Residual Method

import numpy as np
from cleanlab.regression.rank import get_label_quality_scores

labels = np.array([1, 2, 3, 4])
predictions = np.array([2, 2, 5, 4.1])

scores = get_label_quality_scores(labels, predictions, method="residual")
print(scores)
# Scores are exp(-|residual|), so smaller residuals score higher:
# approximately 0.368, 1.0, 0.135 and 0.905 for residuals 1, 0, 2 and 0.1

Related Pages

Principle:Cleanlab_Cleanlab_Regression_Label_Quality_Scoring
