Implementation:Cleanlab Cleanlab Span Find Label Issues

Knowledge Sources	Cleanlab
Domains	Natural Language Processing, Span Classification, Noisy Labels
Last Updated	2026-02-09 00:00 GMT

Overview

Identifies tokens with label issues in span classification datasets by adapting cleanlab's token classification pipeline to handle span-specific probability formats.

Description

The span_classification module provides an adapter layer that enables cleanlab's token classification label issue detection to work with span classification data. In span classification, each token in a sentence receives a binary label indicating whether it belongs to a span or not, and model predictions are single probabilities per token rather than multi-class probability vectors. The module's key function, find_label_issues, converts these single-probability predictions into the two-column format expected by the token classification module and then delegates to the underlying token_classification.filter.find_label_issues function. The module also provides get_label_quality_scores for ranking sentences by overall label quality and display_issues for visualization.

Usage

Import this module when working with span classification datasets (e.g., named entity recognition with a single entity type) where each token has a binary span/non-span label and you want to identify tokens that are likely mislabeled. The module currently supports only a single span class.
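
For a dataset tagged in a BIO-style scheme with one entity type, the binary labels this module expects could be derived roughly as follows. This is a minimal sketch: the helper name bio_to_span_labels and the tag strings are assumptions made for illustration and are not part of cleanlab.

```python
from typing import List


def bio_to_span_labels(tags: List[List[str]], outside_tag: str = "O") -> List[List[int]]:
    """Map BIO-style tags to binary span labels (hypothetical helper, not a cleanlab API).

    Any tag other than `outside_tag` is treated as "in span" (1).
    """
    return [[0 if tag == outside_tag else 1 for tag in sentence] for sentence in tags]


# Example tags for a single entity type (tag names are assumed)
tags = [["O", "O", "B-LOC", "I-LOC"], ["B-LOC", "I-LOC", "O"]]
labels = bio_to_span_labels(tags)
```

The resulting nested list has the same shape as the labels argument described in the I/O Contract.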

Code Reference

Source Location

Repository: Cleanlab
File: cleanlab/experimental/span_classification.py
Lines: 17-62 (find_label_issues), 89-98 (get_label_quality_scores), 101-106 (_get_pred_prob_token)

Signature

def find_label_issues(
    labels: list,
    pred_probs: list,
) -> list:

def get_label_quality_scores(
    labels: list,
    pred_probs: list,
    **kwargs,
) -> Tuple[np.ndarray, list]:

def display_issues(
    issues: list,
    tokens: List[List[str]],
    *,
    labels: Optional[list] = None,
    pred_probs: Optional[list] = None,
    exclude: List[Tuple[int, int]] = [],
    class_names: Optional[List[str]] = None,
    top: int = 20,
) -> None:

Import

from cleanlab.experimental.span_classification import find_label_issues
from cleanlab.experimental.span_classification import get_label_quality_scores
from cleanlab.experimental.span_classification import display_issues

I/O Contract

Inputs (find_label_issues)

Name	Type	Required	Description
labels	list	Yes	Nested list of given labels for all tokens across sentences. Each inner list contains integer labels (0 or 1) for each token in a sentence, where 1 indicates the token is part of a span.
pred_probs	list	Yes	List of 1D numpy arrays, where each array contains per-token probabilities of belonging to a span. Each array has shape (T,) for T tokens in that sentence, with values between 0 and 1.
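
The input requirements above can be checked before calling the module. The following is a sketch only: check_span_inputs is a hypothetical helper written for this page, and cleanlab may perform its own, different validation.

```python
import numpy as np


def check_span_inputs(labels: list, pred_probs: list) -> None:
    """Raise ValueError if inputs break the contract described above (hypothetical helper)."""
    if len(labels) != len(pred_probs):
        raise ValueError("labels and pred_probs must cover the same number of sentences")
    for i, (sentence_labels, probs) in enumerate(zip(labels, pred_probs)):
        probs = np.asarray(probs)
        if probs.ndim != 1:
            raise ValueError(f"pred_probs[{i}] must be 1D, got shape {probs.shape}")
        if len(sentence_labels) != probs.shape[0]:
            raise ValueError(f"sentence {i}: token count differs between labels and pred_probs")
        if any(label not in (0, 1) for label in sentence_labels):
            raise ValueError(f"sentence {i}: labels must be 0 or 1")
        if np.any((probs < 0) | (probs > 1)):
            raise ValueError(f"sentence {i}: probabilities must lie in [0, 1]")


labels = [[0, 0, 1, 1], [1, 1, 0]]
pred_probs = [np.array([0.9, 0.9, 0.9, 0.1]), np.array([0.1, 0.1, 0.9])]
check_span_inputs(labels, pred_probs)  # passes silently for well-formed inputs
```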

Outputs (find_label_issues)

Name	Type	Description
issues	list of tuples	List of (i, j) tuples indicating that the j-th token of the i-th sentence has a label issue. Ordered by likelihood of mislabeling (most likely mislabeled first).
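
One way to consume the returned (i, j) tuples is to group them by sentence and look up the flagged tokens. The sketch below uses a made-up issues list in the documented format, not real cleanlab output; group_issues_by_sentence and the example tokens are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_issues_by_sentence(issues: List[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Collect flagged token indices per sentence, keeping the ranked order (hypothetical helper)."""
    grouped: Dict[int, List[int]] = defaultdict(list)
    for sentence_idx, token_idx in issues:
        grouped[sentence_idx].append(token_idx)
    return dict(grouped)


# Made-up issues list in the documented (i, j) format, not real cleanlab output
example_issues = [(1, 0), (0, 3), (1, 2)]
tokens = [["I", "live", "in", "Paris"], ["New", "York", "rocks"]]

by_sentence = group_issues_by_sentence(example_issues)
flagged_tokens = [tokens[i][j] for i, j in example_issues]
```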

Inputs (get_label_quality_scores)

Name	Type	Required	Description
labels	list	Yes	Nested list of given labels for all tokens (same format as find_label_issues)
pred_probs	list	Yes	List of 1D numpy arrays of span probabilities (same format as find_label_issues)
**kwargs	dict	No	Additional keyword arguments passed to token_classification.rank.get_label_quality_scores

Outputs (get_label_quality_scores)

Name	Type	Description
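sentence_scores	np.ndarray	Array with one label quality score per sentence; lower scores indicate sentences more likely to contain a label issue
token_info	list	List with one entry per sentence; token_info[i] holds the label quality scores of the individual tokens in the i-th sentence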
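
Sentence scores are typically used to build a review queue. The sketch below assumes that lower scores mean lower label quality and uses made-up scores in place of real output; the variable names are illustrative.

```python
import numpy as np

# Made-up scores standing in for the sentence_scores output; not real cleanlab results
sentence_scores = np.array([0.82, 0.15, 0.47])

# argsort is ascending, so the lowest-scoring sentences come first
worst_first = np.argsort(sentence_scores)
review_queue = worst_first[:2].tolist()  # indices of the two lowest-scoring sentences
```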

Internal Mechanism

The critical adapter function _get_pred_prob_token converts span probabilities to token classification format:

# For each sentence's span probabilities:
# pred_probs[i] = [0.9, 0.1, 0.8]  (probability of being in a span)
# Becomes:
# pred_probs_token[i] = [[0.1, 0.9], [0.9, 0.1], [0.2, 0.8]]
# (columns: [not-in-span, in-span])

This transformation is achieved by stacking [1 - probs, probs] along axis 1, converting a single probability into a two-class probability distribution suitable for the existing token classification pipeline.
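
The stacking step can be reproduced with plain NumPy. This is an illustrative re-implementation of the conversion described above, not a copy of cleanlab's code; the function name span_probs_to_token_probs is an assumption.

```python
import numpy as np


def span_probs_to_token_probs(pred_probs: list) -> list:
    """Turn per-token span probabilities into two-column [not-in-span, in-span] arrays.

    Illustrative re-implementation of the described conversion, not cleanlab's code.
    """
    converted = []
    for probs in pred_probs:
        probs = np.asarray(probs, dtype=float)
        # Column 0 is the complement, column 1 the original span probability
        converted.append(np.stack([1 - probs, probs], axis=1))
    return converted


pred_probs = [np.array([0.9, 0.1, 0.8])]
pred_probs_token = span_probs_to_token_probs(pred_probs)
# pred_probs_token[0] has shape (3, 2) and each row sums to 1
```

Because each row is a valid two-class distribution, the converted arrays can be passed to code that expects per-token arrays of shape (T, K) with K = 2.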

Usage Examples

Basic Usage

import numpy as np
from cleanlab.experimental.span_classification import find_label_issues

# Labels: 0 = not in span, 1 = in span
labels = [[0, 0, 1, 1], [1, 1, 0]]

# Predicted probabilities of being in a span
pred_probs = [
    np.array([0.9, 0.9, 0.9, 0.1]),  # tokens 0, 1 (high prob, label=0) and 3 (low prob, label=1) disagree with their labels
    np.array([0.1, 0.1, 0.9]),       # tokens 0, 1 (low prob, label=1) and 2 (high prob, label=0) disagree with their labels
]

issues = find_label_issues(labels, pred_probs)
# Returns list of (sentence_index, token_index) tuples ranked by severity

Getting Label Quality Scores

import numpy as np
from cleanlab.experimental.span_classification import get_label_quality_scores

labels = [[0, 0, 1, 1], [1, 1, 0]]
pred_probs = [
    np.array([0.9, 0.9, 0.9, 0.1]),
    np.array([0.1, 0.1, 0.9]),
]

sentence_scores, token_info = get_label_quality_scores(labels, pred_probs)

Related Pages

Principle:Cleanlab_Cleanlab_Span_Classification_Issue_Detection
