Principle: Cleanlab Token Label Issue Filtering
| Knowledge Sources | Cleanlab |
|---|---|
| Domains | Machine_Learning, Data_Quality, NLP |
| Last Updated | 2026-02-09 |
Overview
Method for identifying specific tokens within sentences that have incorrect labels in sequence labeling tasks.
Description
Token label issue filtering adapts cleanlab's find_label_issues to token-level labels in sequence labeling tasks. Rather than identifying entire examples as mislabeled, it pinpoints the exact tokens whose labels are likely incorrect within their sentence context.
The method uses a flatten/unflatten pattern:
- Flatten: Concatenate the token labels and predicted probabilities from all sentences into single flat arrays, treating each token as an independent classification example.
- Filter: Apply standard cleanlab label issue detection (confident learning) on the flattened arrays to identify which individual tokens have label issues.
- Unflatten: Map the flagged token indices back to their original (sentence_index, token_index) positions using a cumulative length mapping.
This approach leverages cleanlab's well-established label issue detection methods for standard classification while adapting them to the structured, variable-length nature of sequence labeling data. The output is a list of (sentence, token) tuples that can be used directly for targeted review and correction.
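The three steps can be sketched end to end with numpy. The data below is a hypothetical toy example, and the filter step uses a simple argmax-disagreement stand-in so the sketch runs without cleanlab installed; in practice, cleanlab's `filter.find_label_issues` would perform this step using confident learning.

```python
import numpy as np

# Toy data (hypothetical): 2 sentences, K = 3 tag classes.
labels = [[0, 1, 0], [2, 0]]
pred_probs = [
    np.array([[0.90, 0.05, 0.05],
              [0.10, 0.80, 0.10],
              [0.30, 0.10, 0.60]]),  # token (0, 2) is labeled 0 but looks like class 2
    np.array([[0.05, 0.05, 0.90],
              [0.85, 0.10, 0.05]]),
]

# 1. Flatten all tokens into single arrays.
flat_labels = np.concatenate(labels)
flat_pred_probs = np.vstack(pred_probs)

# 2. Filter. Stand-in for cleanlab's filter.find_label_issues: flag tokens
#    whose given label disagrees with the model's argmax prediction.
flat_issues = np.where(flat_pred_probs.argmax(axis=1) != flat_labels)[0]

# 3. Unflatten flagged flat indices back to (sentence_index, token_index).
cumulative = np.concatenate([[0], np.cumsum([len(s) for s in labels])])
issues = []
for i in flat_issues:
    s = int(np.searchsorted(cumulative, i, side="right") - 1)
    issues.append((s, int(i - cumulative[s])))

print(issues)  # [(0, 2)]
```

The output is the (sentence, token) tuple list described above, ready for targeted review.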
Usage
Token label issue filtering is used when the goal is to identify the specific mislabeled tokens in a sequence labeling dataset. Typical applications include:
- Targeted correction: Providing annotators with the exact tokens that need relabeling, rather than entire sentences.
- Error analysis: Understanding which token positions or entity boundaries are most error-prone.
- Input to visualization: Feeding identified issues into the display function for human-readable output.
Theoretical Basis
The flatten/unflatten pattern operates as follows:
Flatten Phase. Given N sentences with token counts T_1, T_2, ..., T_N, concatenate all token labels into a single array of length T_total = sum(T_i), and similarly for predicted probabilities:
flat_labels = concat(labels[0], labels[1], ..., labels[N-1]) # shape: (T_total,)
flat_pred_probs = concat(pred_probs[0], pred_probs[1], ..., pred_probs[N-1]) # shape: (T_total, K)
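Because sentence lengths vary, the per-sentence arrays are ragged; the flatten step is a plain concatenation. A minimal sketch with illustrative toy values (the uniform probabilities are placeholders):

```python
import numpy as np

# N = 2 sentences with T_1 = 3 and T_2 = 2 tokens, K = 3 classes (toy values).
labels = [np.array([0, 1, 0]), np.array([2, 0])]
pred_probs = [np.full((3, 3), 1 / 3), np.full((2, 3), 1 / 3)]

flat_labels = np.concatenate(labels)      # shape (T_total,) == (5,)
flat_pred_probs = np.vstack(pred_probs)   # shape (T_total, K) == (5, 3)
```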
Filter Phase. Apply filter.find_label_issues(flat_labels, flat_pred_probs) to identify indices of mislabeled tokens in the flattened array.
Unflatten Phase. Map each flagged flat index back to its original (sentence, token) position using cumulative sentence lengths:
cumulative_lengths = [0, T_1, T_1+T_2, ..., T_total]
For each flagged flat_index:
sentence_index = largest i such that cumulative_lengths[i] <= flat_index
token_index = flat_index - cumulative_lengths[sentence_index]
result.append((sentence_index, token_index))
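The unflatten mapping above can be implemented with a binary search over the cumulative lengths (`np.searchsorted`); the sentence lengths here are a hypothetical two-sentence example:

```python
import numpy as np

sentence_lengths = [3, 2]  # T_1, T_2 for a toy 2-sentence dataset
cumulative_lengths = np.concatenate([[0], np.cumsum(sentence_lengths)])  # [0, 3, 5]

def unflatten(flat_indices, cumulative_lengths):
    """Map flat token indices back to (sentence_index, token_index) pairs."""
    result = []
    for flat_index in flat_indices:
        # largest i such that cumulative_lengths[i] <= flat_index
        sentence_index = int(np.searchsorted(cumulative_lengths, flat_index, side="right") - 1)
        token_index = int(flat_index - cumulative_lengths[sentence_index])
        result.append((sentence_index, token_index))
    return result

# Flat indices 2 and 4 map to the last token of each sentence.
unflatten([2, 4], cumulative_lengths)  # → [(0, 2), (1, 1)]
```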
Ranking. Results are ordered by ascending label-quality score, so the tokens most likely to be mislabeled appear first.
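One common quality score is self-confidence, the model's predicted probability for the given label (in cleanlab, `find_label_issues` can return indices ranked this way via its `return_indices_ranked_by` argument). A minimal numpy illustration of the ordering, with toy values:

```python
import numpy as np

# Hypothetical flattened model outputs and flagged token indices.
flat_labels = np.array([0, 1, 0, 2, 0])
flat_pred_probs = np.array([
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.20, 0.10, 0.70],
    [0.05, 0.05, 0.90],
    [0.05, 0.90, 0.05],
])
issue_indices = np.array([2, 4])  # flat indices flagged by the filter step

# Self-confidence = probability the model assigns to the *given* label.
self_confidence = flat_pred_probs[issue_indices, flat_labels[issue_indices]]

# Ascending self-confidence puts the lowest-quality labels first.
ranked = issue_indices[np.argsort(self_confidence)]
print(ranked)  # [4 2]
```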