Principle: Cleanlab Token Label Issue Filtering
| Knowledge Sources | Cleanlab |
|---|---|
| Domains | Machine_Learning, Data_Quality, NLP |
| Last Updated | 2026-02-09 |
Overview
Method for identifying specific tokens within sentences that have incorrect labels in sequence labeling tasks.
Description
Token label issue filtering adapts cleanlab's find_label_issues to token-level labels in sequence labeling tasks. Rather than identifying entire examples as mislabeled, it pinpoints the exact tokens whose labels are likely incorrect within their sentence context.
The method uses a flatten/unflatten pattern:
- Flatten: Concatenate the token labels and predicted probabilities from all sentences into single flat arrays, treating each token as an independent classification example.
- Filter: Apply standard cleanlab label issue detection (confident learning) on the flattened arrays to identify which individual tokens have label issues.
- Unflatten: Map the flagged token indices back to their original (sentence_index, token_index) positions using a cumulative length mapping.
This approach leverages cleanlab's well-established label issue detection methods for standard classification while adapting them to the structured, variable-length nature of sequence labeling data. The output is a list of (sentence, token) tuples that can be used directly for targeted review and correction.
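The three steps can be sketched end to end with numpy. The data below is a hypothetical toy example, and the filter step uses a simple argmax-disagreement stand-in so the sketch runs without cleanlab installed; in practice, cleanlab's `filter.find_label_issues` would perform this step using confident learning.

```python
import numpy as np

# Toy data (hypothetical): 2 sentences, K = 3 tag classes.
labels = [[0, 1, 0], [2, 0]]
pred_probs = [
    np.array([[0.90, 0.05, 0.05],
              [0.10, 0.80, 0.10],
              [0.30, 0.10, 0.60]]),  # token (0, 2) is labeled 0 but looks like class 2
    np.array([[0.05, 0.05, 0.90],
              [0.85, 0.10, 0.05]]),
]

# 1. Flatten all tokens into single arrays.
flat_labels = np.concatenate(labels)
flat_pred_probs = np.vstack(pred_probs)

# 2. Filter. Stand-in for cleanlab's filter.find_label_issues: flag tokens
#    whose given label disagrees with the model's argmax prediction.
flat_issues = np.where(flat_pred_probs.argmax(axis=1) != flat_labels)[0]

# 3. Unflatten flagged flat indices back to (sentence_index, token_index).
cumulative = np.concatenate([[0], np.cumsum([len(s) for s in labels])])
issues = []
for i in flat_issues:
    s = int(np.searchsorted(cumulative, i, side="right") - 1)
    issues.append((s, int(i - cumulative[s])))

print(issues)  # [(0, 2)]
```

The output is the (sentence, token) tuple list described above, ready for targeted review.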
Usage
Token label issue filtering is used when the goal is to identify the specific mislabeled tokens in a sequence labeling dataset. Typical applications include:
- Targeted correction: Providing annotators with the exact tokens that need relabeling, rather than entire sentences.
- Error analysis: Understanding which token positions or entity boundaries are most error-prone.
- Input to visualization: Feeding identified issues into the display function for human-readable output.
Theoretical Basis
The flatten/unflatten pattern operates as follows:
Flatten Phase. Given N sentences with token counts T_1, T_2, ..., T_N, concatenate all token labels into a single array of length T_total = sum(T_i), and similarly for predicted probabilities:
flat_labels = concat(labels[0], labels[1], ..., labels[N-1]) # shape: (T_total,)
flat_pred_probs = concat(pred_probs[0], pred_probs[1], ..., pred_probs[N-1]) # shape: (T_total, K)
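Because sentence lengths vary, the per-sentence arrays are ragged; the flatten step is a plain concatenation. A minimal sketch with illustrative toy values (the uniform probabilities are placeholders):

```python
import numpy as np

# N = 2 sentences with T_1 = 3 and T_2 = 2 tokens, K = 3 classes (toy values).
labels = [np.array([0, 1, 0]), np.array([2, 0])]
pred_probs = [np.full((3, 3), 1 / 3), np.full((2, 3), 1 / 3)]

flat_labels = np.concatenate(labels)      # shape (T_total,) == (5,)
flat_pred_probs = np.vstack(pred_probs)   # shape (T_total, K) == (5, 3)
```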
Filter Phase. Apply filter.find_label_issues(flat_labels, flat_pred_probs) to identify indices of mislabeled tokens in the flattened array.
Unflatten Phase. Map each flagged flat index back to its original (sentence, token) position using cumulative sentence lengths:
cumulative_lengths = [0, T_1, T_1+T_2, ..., T_total]
For each flagged flat_index:
sentence_index = largest i such that cumulative_lengths[i] <= flat_index
token_index = flat_index - cumulative_lengths[sentence_index]
result.append((sentence_index, token_index))
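The unflatten mapping above can be implemented with a binary search over the cumulative lengths (`np.searchsorted`); the sentence lengths here are a hypothetical two-sentence example:

```python
import numpy as np

sentence_lengths = [3, 2]  # T_1, T_2 for a toy 2-sentence dataset
cumulative_lengths = np.concatenate([[0], np.cumsum(sentence_lengths)])  # [0, 3, 5]

def unflatten(flat_indices, cumulative_lengths):
    """Map flat token indices back to (sentence_index, token_index) pairs."""
    result = []
    for flat_index in flat_indices:
        # largest i such that cumulative_lengths[i] <= flat_index
        sentence_index = int(np.searchsorted(cumulative_lengths, flat_index, side="right") - 1)
        token_index = int(flat_index - cumulative_lengths[sentence_index])
        result.append((sentence_index, token_index))
    return result

# Flat indices 2 and 4 map to the last token of each sentence.
unflatten([2, 4], cumulative_lengths)  # → [(0, 2), (1, 1)]
```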
Ranking. Results are ordered by ascending label-quality score, so the tokens most likely to be mislabeled appear first.
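One common quality score is self-confidence, the model's predicted probability for the given label (in cleanlab, `find_label_issues` can return indices ranked this way via its `return_indices_ranked_by` argument). A minimal numpy illustration of the ordering, with toy values:

```python
import numpy as np

# Hypothetical flattened model outputs and flagged token indices.
flat_labels = np.array([0, 1, 0, 2, 0])
flat_pred_probs = np.array([
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.20, 0.10, 0.70],
    [0.05, 0.05, 0.90],
    [0.05, 0.90, 0.05],
])
issue_indices = np.array([2, 4])  # flat indices flagged by the filter step

# Self-confidence = probability the model assigns to the *given* label.
self_confidence = flat_pred_probs[issue_indices, flat_labels[issue_indices]]

# Ascending self-confidence puts the lowest-quality labels first.
ranked = issue_indices[np.argsort(self_confidence)]
print(ranked)  # [4 2]
```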