Implementation:Cleanlab Cleanlab Span Find Label Issues
| Knowledge Sources | |
|---|---|
| Domains | Natural Language Processing, Span Classification, Noisy Labels |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Identifies tokens with label issues in span classification datasets by adapting cleanlab's token classification pipeline to handle span-specific probability formats.
Description
The span_classification module provides an adapter layer that enables cleanlab's token classification label issue detection to work with span classification data. In span classification, each token in a sentence receives a binary label indicating whether it belongs to a span or not, and model predictions are single probabilities per token rather than multi-class probability vectors. The module's key function, find_label_issues, converts these single-probability predictions into the two-column format expected by the token classification module and then delegates to the underlying token_classification.filter.find_label_issues function. The module also provides get_label_quality_scores for ranking sentences by overall label quality and display_issues for visualization.
Usage
Import this module when working with span classification datasets (e.g., named entity recognition with a single entity type) where each token has a binary span/non-span label and you want to identify tokens that are likely mislabeled. The module currently supports only a single span class.
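As a minimal sketch of how annotations might be brought into the binary per-token format described above, the following helper marks every token covered by a span with 1. The helper name `spans_to_binary_labels` and the half-open `(start, end)` token-index convention are assumptions made for illustration; they are not part of cleanlab.

```python
from typing import List, Tuple


def spans_to_binary_labels(num_tokens: int, spans: List[Tuple[int, int]]) -> List[int]:
    """Hypothetical helper (not part of cleanlab).

    Marks each token covered by any half-open [start, end) token span with 1
    and every other token with 0.
    """
    labels = [0] * num_tokens
    for start, end in spans:
        # Clip the span to the sentence so out-of-range offsets are ignored.
        for j in range(max(start, 0), min(end, num_tokens)):
            labels[j] = 1
    return labels


# Illustrative sentence with two single-token spans ("Alice" and "Paris").
tokens = ["Alice", "visited", "Paris", "yesterday"]
labels = spans_to_binary_labels(len(tokens), [(0, 1), (2, 3)])
print(labels)
```

Applying such a helper to every sentence would yield the nested `labels` list that find_label_issues takes as input.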
Code Reference
Source Location
- Repository: Cleanlab
- File: cleanlab/experimental/span_classification.py
- Lines: 17-62 (find_label_issues), 89-98 (get_label_quality_scores), 101-106 (_get_pred_prob_token)
Signature
def find_label_issues(
    labels: list,
    pred_probs: list,
) -> list:
def get_label_quality_scores(
    labels: list,
    pred_probs: list,
    **kwargs,
) -> Tuple[np.ndarray, list]:
def display_issues(
    issues: list,
    tokens: List[List[str]],
    *,
    labels: Optional[list] = None,
    pred_probs: Optional[list] = None,
    exclude: List[Tuple[int, int]] = [],
    class_names: Optional[List[str]] = None,
    top: int = 20,
) -> None:
Import
from cleanlab.experimental.span_classification import find_label_issues
from cleanlab.experimental.span_classification import get_label_quality_scores
from cleanlab.experimental.span_classification import display_issues
I/O Contract
Inputs (find_label_issues)
| Name | Type | Required | Description |
|---|---|---|---|
| labels | list | Yes | Nested list of given labels for all tokens across sentences. Each inner list contains integer labels (0 or 1) for each token in a sentence, where 1 indicates the token is part of a span. |
| pred_probs | list | Yes | List of 1D numpy arrays, where each array contains per-token probabilities of belonging to a span. Each array has shape (T,) for T tokens in that sentence, with values between 0 and 1. |
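The input contract above can be checked before calling the library. The following is a sketch of such a check under the stated format (binary labels, one 1D probability array per sentence); `check_span_inputs` is a hypothetical helper, not a cleanlab function.

```python
import numpy as np


def check_span_inputs(labels, pred_probs):
    """Hypothetical sanity check for the documented input format (not part of cleanlab)."""
    if len(labels) != len(pred_probs):
        raise ValueError("labels and pred_probs must cover the same number of sentences")
    for i, (sentence_labels, probs) in enumerate(zip(labels, pred_probs)):
        probs = np.asarray(probs, dtype=float)
        # Each sentence needs exactly one probability per token, i.e. shape (T,).
        if probs.ndim != 1 or probs.shape[0] != len(sentence_labels):
            raise ValueError(f"sentence {i}: pred_probs must have shape ({len(sentence_labels)},)")
        # Only a single span class is supported, so labels are 0 or 1.
        if not set(sentence_labels) <= {0, 1}:
            raise ValueError(f"sentence {i}: labels must be 0 or 1")
        if np.any((probs < 0) | (probs > 1)):
            raise ValueError(f"sentence {i}: probabilities must lie in [0, 1]")


# Passes silently for well-formed inputs.
check_span_inputs([[0, 0, 1, 1], [1, 1, 0]],
                  [np.array([0.9, 0.9, 0.9, 0.1]), np.array([0.1, 0.1, 0.9])])
```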
Outputs (find_label_issues)
| Name | Type | Description |
|---|---|---|
| issues | list of tuples | List of (i, j) tuples indicating that the j-th token of the i-th sentence has a label issue. Ordered by likelihood of mislabeling (most likely mislabeled first). |
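Because the output is a flat list of (i, j) tuples, it is often convenient to regroup it by sentence. The sketch below does this with plain Python; `group_issues_by_sentence` is a hypothetical helper and `example_issues` holds made-up values, not actual cleanlab output.

```python
from collections import defaultdict


def group_issues_by_sentence(issues):
    """Hypothetical helper: map sentence index -> flagged token indices, keeping input order."""
    grouped = defaultdict(list)
    for sentence_idx, token_idx in issues:
        grouped[sentence_idx].append(token_idx)
    return dict(grouped)


# Made-up issue list in the documented (sentence_index, token_index) format.
example_issues = [(1, 0), (0, 3), (1, 2)]
by_sentence = group_issues_by_sentence(example_issues)
print(by_sentence)
```

Within each sentence the token indices keep the order in which they appeared in the original list.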
Inputs (get_label_quality_scores)
| Name | Type | Required | Description |
|---|---|---|---|
| labels | list | Yes | Nested list of given labels for all tokens (same format as find_label_issues) |
| pred_probs | list | Yes | List of 1D numpy arrays of span probabilities (same format as find_label_issues) |
| **kwargs | dict | No | Additional keyword arguments passed to token_classification.rank.get_label_quality_scores |
Outputs (get_label_quality_scores)
| Name | Type | Description |
|---|---|---|
| sentence_scores | np.ndarray | Array with one label quality score per sentence; lower scores indicate sentences more likely to contain a label issue |
| token_info | list | Per-token label quality scores, grouped by sentence |
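A typical use of the sentence-level scores is to review the lowest-scoring sentences first. The sketch below assumes that lower scores mean lower label quality, which is cleanlab's usual convention; the score values are made up for illustration.

```python
import numpy as np

# Made-up sentence scores standing in for the first return value
# of get_label_quality_scores.
sentence_scores = np.array([0.82, 0.15, 0.47])

# Sort ascending so the sentences most likely to contain an issue come first
# (assumes lower score = lower label quality).
worst_first = np.argsort(sentence_scores)
print(worst_first)
```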
Internal Mechanism
The critical adapter function _get_pred_prob_token converts span probabilities to token classification format:
# For each sentence's span probabilities:
# pred_probs[i] = [0.9, 0.1, 0.8] (probability of being in a span)
# Becomes:
# pred_probs_token[i] = [[0.1, 0.9], [0.9, 0.1], [0.2, 0.8]]
# (columns: [not-in-span, in-span])
This transformation is achieved by stacking [1 - probs, probs] along axis 1, converting a single probability into a two-class probability distribution suitable for the existing token classification pipeline.
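The stacking step can be reproduced in isolation with NumPy. This is a minimal sketch of the transformation as described above, not a copy of the library source; the function name `span_probs_to_token_probs` is chosen here for illustration.

```python
import numpy as np


def span_probs_to_token_probs(pred_probs):
    """Sketch of the adapter: turn each (T,) span-probability array into a (T, 2) array.

    Column 0 holds P(not in span) and column 1 holds P(in span).
    """
    return [np.stack([1 - probs, probs], axis=1) for probs in pred_probs]


pred_probs = [np.array([0.9, 0.1, 0.8])]
pred_probs_token = span_probs_to_token_probs(pred_probs)
print(pred_probs_token[0].shape)
```

Each row of the result sums to 1, so it is a valid two-class distribution for the token classification pipeline.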
Usage Examples
Basic Usage
import numpy as np
from cleanlab.experimental.span_classification import find_label_issues
# Labels: 0 = not in span, 1 = in span
labels = [[0, 0, 1, 1], [1, 1, 0]]
# Predicted probabilities of being in a span
pred_probs = [
    # tokens 0, 1 (label 0, high prob) and token 3 (label 1, low prob) disagree with the model
    np.array([0.9, 0.9, 0.9, 0.1]),
    # tokens 0, 1 (label 1, low prob) and token 2 (label 0, high prob) disagree with the model
    np.array([0.1, 0.1, 0.9]),
]
issues = find_label_issues(labels, pred_probs)
# Returns a list of (sentence_index, token_index) tuples, most likely mislabeled first
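To build intuition for which tokens in this example look suspicious, the sketch below computes each token's self-confidence, meaning the predicted probability of its given label, and lists tokens where it falls below 0.5. This is a simplified illustration and not cleanlab's actual procedure; the 0.5 cutoff and the helper name `self_confidence` are assumptions.

```python
import numpy as np

labels = [[0, 0, 1, 1], [1, 1, 0]]
pred_probs = [
    np.array([0.9, 0.9, 0.9, 0.1]),
    np.array([0.1, 0.1, 0.9]),
]


def self_confidence(labels, pred_probs):
    """Hypothetical helper: probability the model assigns to each token's given label."""
    scored = []
    for i, (sentence_labels, probs) in enumerate(zip(labels, pred_probs)):
        for j, (label, p) in enumerate(zip(sentence_labels, probs)):
            # For label 1 use P(in span); for label 0 use P(not in span).
            conf = p if label == 1 else 1 - p
            scored.append(((i, j), float(conf)))
    return scored


# Tokens whose given label the model considers less likely than the alternative
# (0.5 is an illustrative cutoff, not a cleanlab default).
suspicious = [idx for idx, conf in self_confidence(labels, pred_probs) if conf < 0.5]
print(suspicious)
```

The list produced by find_label_issues may differ from this simple thresholding.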
Getting Label Quality Scores
import numpy as np
from cleanlab.experimental.span_classification import get_label_quality_scores
labels = [[0, 0, 1, 1], [1, 1, 0]]
pred_probs = [
    np.array([0.9, 0.9, 0.9, 0.1]),
    np.array([0.1, 0.1, 0.9]),
]
sentence_scores, token_info = get_label_quality_scores(labels, pred_probs)