Implementation:Cleanlab Cleanlab Regression Get Label Quality Scores

From Leeroopedia


Knowledge Sources
Domains Data Quality, Machine Learning, Regression
Last Updated 2026-02-09 00:00 GMT

Overview

Computes a label quality score for each example in a regression dataset; the scores can then be used to rank which Y-values are most likely erroneous.

Description

get_label_quality_scores is the primary public function in the regression ranking module. It accepts the given labels and model predictions for a regression dataset and returns a per-example quality score between 0 and 1, where lower scores indicate labels more likely to be incorrect. The function dispatches to one of two internal scoring methods based on the method parameter. The "residual" method computes an exponentially decayed score from absolute residuals. The "outre" method (the default) normalizes labels and residuals into a 2D feature space and applies k-nearest-neighbor-based outlier detection, via cleanlab's OutOfDistribution scorer, to identify anomalous label-residual combinations.

Usage

Import this function when you have a trained regression model and want to identify which examples in your dataset most likely have erroneous Y-values. It is especially useful as a standalone scoring utility when you do not need the full train-prune-retrain cycle provided by CleanLearning. For best results, pass out-of-sample predictions obtained via cross-validation.
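To illustrate what "out-of-sample" means here, the sketch below produces held-out predictions with a hand-rolled 5-fold split and a one-feature least-squares fit, using only the standard library. The helper names fit_line and out_of_sample_predictions, the toy data, and the fold assignment are all assumptions made for illustration. In practice a library routine such as scikit-learn's cross_val_predict would typically fill this role, and the resulting predictions would be passed to get_label_quality_scores.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept on one feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, y_bar - slope * x_bar


def out_of_sample_predictions(xs, ys, n_folds=5):
    """Predict each example with a model fit on the other folds only."""
    n = len(xs)
    predictions = [0.0] * n
    for fold in range(n_folds):
        held_out = set(range(fold, n, n_folds))  # every n_folds-th example
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        slope, intercept = fit_line(train_x, train_y)
        for i in held_out:
            predictions[i] = slope * xs[i] + intercept
    return predictions


# Toy data: y = 2x + 1, with one deliberately corrupted label.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
ys[4] = 40.0  # the clean value would be 9.0

predictions = out_of_sample_predictions(xs, ys)
residuals = [abs(p - y) for p, y in zip(predictions, ys)]
worst = max(range(len(ys)), key=residuals.__getitem__)
print(worst)  # 4
```

Because the corrupted example is never in the training data for its own prediction, the model cannot fit the bad label, and its residual stays large.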

Code Reference

Source Location

  • Repository: Cleanlab
  • File: cleanlab/regression/rank.py
  • Lines: 22-87

Signature

def get_label_quality_scores(
    labels: ArrayLike,
    predictions: ArrayLike,
    *,
    method: str = "outre",
) -> np.ndarray:

Import

from cleanlab.regression.rank import get_label_quality_scores

I/O Contract

Inputs

Name | Type | Required | Description
labels | ArrayLike | Yes | 1D array of shape (N,) containing the given Y-value labels for each example in the dataset.
predictions | ArrayLike | Yes | 1D array of shape (N,) containing the predicted label for each example. Should be out-of-sample predictions from a trained regression model, ideally obtained via cross-validation.
method | str | No | Scoring method to use. Options are "residual" (exponential decay of absolute residuals) or "outre" (default; outlier detection in normalized label-residual space using k-nearest neighbors).

Outputs

Name | Type | Description
label_quality_scores | np.ndarray | Array of shape (N,) with scores between 0 and 1. Lower scores indicate examples more likely to contain a label error. A score near 1 means the label is likely correct; a score near 0 means the label is likely incorrect.
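Since lower scores mark more suspect labels, a common next step is to sort example indices by score. The sketch below does this in plain Python on the scores shown in the basic usage example on this page. The 0.1 cutoff is a hypothetical value chosen only for illustration, not a cleanlab default.

```python
# Scores copied from the basic usage example on this page.
label_quality_scores = [0.00323821, 0.33692597, 0.00191686, 0.33692597]

# Sort example indices from lowest score (most suspect) to highest.
ranked_indices = sorted(
    range(len(label_quality_scores)),
    key=label_quality_scores.__getitem__,
)

# Hypothetical cutoff for illustration only; not a cleanlab default.
threshold = 0.1
suspect_indices = [i for i in ranked_indices if label_quality_scores[i] < threshold]

print(ranked_indices)   # [2, 0, 1, 3]
print(suspect_indices)  # [2, 0]
```

With a NumPy array of scores, np.argsort(label_quality_scores) gives the same kind of ordering, although ties may be ordered differently.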

Internal Scoring Methods

Residual Method

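Computes exp(-|predictions - labels|) for each example, so scores decay exponentially with the magnitude of the residual: a zero residual scores 1, and larger residuals approach 0. Because the residual is not rescaled, the scores depend on the units of the labels. Residual-based scores can work better for datasets where the independent variables follow a normal distribution.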
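The formula above can be checked by hand with a few lines of standard-library Python. This is a minimal sketch of the stated formula, not the library's own code, and the name residual_scores is hypothetical.

```python
import math


def residual_scores(labels, predictions):
    """Return exp(-|prediction - label|) for each example."""
    return [math.exp(-abs(p - y)) for y, p in zip(labels, predictions)]


labels = [1, 2, 3, 4]
predictions = [2, 2, 5, 4.1]

scores = residual_scores(labels, predictions)
print([round(s, 4) for s in scores])  # [0.3679, 1.0, 0.1353, 0.9048]
```

The example with residual 2 scores lowest and the exact prediction scores 1.0, matching the intended reading that lower scores flag more suspect labels.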

OUTRE Method (Default)

The OUTRE (OUTlier-in-REsidual-space) method performs the following steps:

  1. Normalize labels to zero mean and unit variance.
  2. Compute residuals (predictions - labels), normalize them, and scale by a factor of 5.
  3. Combine normalized labels and scaled residuals into a 2D feature matrix.
  4. Build a k-nearest-neighbors graph with k = 50% of the dataset size.
  5. Use cleanlab's OutOfDistribution scorer on the 2D features to produce per-example outlier scores.

This method is the recommended default because it considers neighborhood context in the label-residual space rather than relying solely on raw residual magnitude.
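The steps above can be sketched in standard-library Python as follows. Steps 1 to 4 follow the description directly. The final step is a simplified stand-in: it averages each point's distance to its k nearest neighbors (counting the point itself) and maps that through exp(-d). That mapping, the rounding up of k, the EPS guard, and the name outre_like_scores are all assumptions for illustration, so the values will not match those returned by cleanlab's OutOfDistribution scorer.

```python
import math
from statistics import mean, pstdev

EPS = 1e-6  # hypothetical guard against division by zero


def standardize(values):
    """Shift to zero mean and scale to (approximately) unit variance."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / (sigma + EPS) for v in values]


def outre_like_scores(labels, predictions, residual_scale=5.0, frac_neighbors=0.5):
    # Steps 1-3: normalized labels and scaled, normalized residuals as 2D features.
    norm_labels = standardize(labels)
    residuals = [p - y for y, p in zip(labels, predictions)]
    scaled_residuals = [residual_scale * r for r in standardize(residuals)]
    features = list(zip(norm_labels, scaled_residuals))

    # Step 4: k is 50% of the dataset size (rounded up here, an assumption).
    k = max(1, math.ceil(frac_neighbors * len(features)))

    # Step 5 (stand-in, not cleanlab's scorer): mean distance to the k nearest
    # points, including the point itself, mapped into (0, 1] with exp(-d).
    scores = []
    for point in features:
        distances = sorted(math.dist(point, other) for other in features)
        scores.append(math.exp(-mean(distances[:k])))
    return scores


labels = [1, 2, 3, 4]
predictions = [2, 2, 5, 4.1]

scores = outre_like_scores(labels, predictions)
lowest = min(range(len(scores)), key=scores.__getitem__)
print(lowest)  # 2
```

Even with the simplified last step, the example with the largest residual (index 2) is the most isolated point in label-residual space and receives the lowest score.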

Usage Examples

Basic Usage

import numpy as np
from cleanlab.regression.rank import get_label_quality_scores

labels = np.array([1, 2, 3, 4])
predictions = np.array([2, 2, 5, 4.1])

# Using the default OUTRE method
label_quality_scores = get_label_quality_scores(labels, predictions)
print(label_quality_scores)
# [0.00323821 0.33692597 0.00191686 0.33692597]
# Examples 0 and 2 score lowest, so their labels are the most suspect.

Using the Residual Method

import numpy as np
from cleanlab.regression.rank import get_label_quality_scores

labels = np.array([1, 2, 3, 4])
predictions = np.array([2, 2, 5, 4.1])

scores = get_label_quality_scores(labels, predictions, method="residual")
print(scores)
# Scores based on exp(-|residual|), higher for smaller residuals
