Implementation: Cleanlab TC Display Issues
| Field | Value |
|---|---|
| API | token_classification.summary.display_issues and token_classification.summary.common_label_issues |
| Source | cleanlab/token_classification/summary.py:L13-22, L139-149 |
| Domains | Machine_Learning, Data_Quality, NLP |
| Last Updated | 2026-02-09 |
Overview
Implementation of token-level issue visualization and error pattern summarization for token classification tasks. Provides two functions: display_issues for rendering highlighted sentences and common_label_issues for aggregating error patterns across the dataset.
Description
This module provides two complementary functions for reviewing token classification label issues:
display_issues: Prints sentences containing problematic tokens, with those tokens highlighted in color. For each flagged token, it optionally shows the given label and the model's predicted label. The function displays the top N most problematic sentences and supports excluding specific (sentence_index, token_index) pairs from the display.
common_label_issues: Aggregates all detected label issues into a frequency table showing how often each type of label error occurs (e.g., "B-PER mislabeled as O" appearing 47 times). Returns a DataFrame sorted by frequency, most common first, for systematic pattern analysis.
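To make the aggregation idea concrete, here is a minimal, hypothetical sketch of how such a frequency table could be built from (sentence_index, token_index) issue tuples. The function name summarize_issue_patterns and the column names are illustrative assumptions, not cleanlab's actual implementation:

```python
from collections import Counter
import pandas as pd

def summarize_issue_patterns(issues, labels, preds, class_names):
    # Count how often each (given label, predicted label) pair occurs
    # among the flagged tokens. `issues` holds (sentence_idx, token_idx)
    # tuples; `labels` and `preds` are per-sentence lists of class ids.
    counts = Counter(
        (class_names[labels[i][j]], class_names[preds[i][j]])
        for i, j in issues
    )
    # Emit rows sorted by frequency, most common pattern first.
    rows = [
        {"given_label": g, "predicted_label": p, "count": c}
        for (g, p), c in counts.most_common()
    ]
    return pd.DataFrame(rows)
```

This mirrors the spirit of common_label_issues: group flagged tokens by error pattern, then rank patterns by how often they occur.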
Usage
These functions are used after detecting token-level label issues with find_label_issues. They are typically called in a Jupyter notebook environment for interactive review.
Code Reference
Source Location
cleanlab/token_classification/summary.py, lines 13-22 (display_issues) and lines 139-149 (common_label_issues).
Signature
def display_issues(
issues: list,
tokens: List[List[str]],
*,
labels: Optional[list] = None,
pred_probs: Optional[list] = None,
exclude: List[Tuple[int, int]] = [],
class_names: Optional[List[str]] = None,
top: int = 20,
) -> None
def common_label_issues(
issues: List[Tuple[int, int]],
tokens: List[List[str]],
*,
labels: Optional[list] = None,
pred_probs: Optional[list] = None,
class_names: Optional[List[str]] = None,
top: int = 10,
exclude: List[Tuple[int, int]] = [],
verbose: bool = True,
) -> pd.DataFrame
Import
from cleanlab.token_classification.summary import display_issues, common_label_issues
I/O Contract
display_issues Inputs
| Parameter | Type | Description |
|---|---|---|
| issues | list | List of (sentence_index, token_index) tuples identifying tokens with label issues, as returned by find_label_issues. |
| tokens | List[List[str]] | List of N lists, each containing the string tokens for the corresponding sentence. |
| labels | Optional[list] | List of N lists of integer class labels. When provided, the given label is shown for each flagged token. |
| pred_probs | Optional[list] | List of N numpy arrays of shape (T_i, K). When provided, the predicted label is shown for each flagged token. |
| exclude | List[Tuple[int, int]] | List of (sentence_index, token_index) tuples to exclude from display. Defaults to an empty list. |
| class_names | Optional[List[str]] | List of human-readable class names. When provided, class names are shown instead of integer indices. |
| top | int | Maximum number of sentences to display. Defaults to 20. |
display_issues Output
| Type | Description |
|---|---|
| None | Prints highlighted sentences to standard output. Does not return a value. |
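Since display_issues only prints and returns nothing, its core behavior is terminal highlighting. The sketch below shows one plausible way to highlight flagged token positions with ANSI color codes; it is an illustrative assumption, not cleanlab's actual rendering code:

```python
RED = "\033[91m"   # ANSI escape for red text
RESET = "\033[0m"  # ANSI escape to restore default color

def highlight_sentence(tokens, flagged_positions):
    # Hypothetical sketch: join the sentence back together, wrapping
    # each flagged token index in red so it stands out in a terminal.
    return " ".join(
        f"{RED}{tok}{RESET}" if j in flagged_positions else tok
        for j, tok in enumerate(tokens)
    )
```

In a Jupyter notebook or ANSI-capable terminal, the flagged tokens render in red; the surrounding text is unchanged.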
common_label_issues Inputs
| Parameter | Type | Description |
|---|---|---|
issues |
List[Tuple[int, int]] |
List of (sentence_index, token_index) tuples identifying tokens with label issues. |
tokens |
List[List[str]] |
List of N lists, each containing the string tokens for the corresponding sentence. |
labels |
Optional[list] |
List of N lists of integer class labels. |
pred_probs |
Optional[list] |
List of N numpy arrays of shape (T_i, K). |
class_names |
Optional[List[str]] |
List of human-readable class names. |
top |
int |
Maximum number of common issue patterns to return. Defaults to 10. |
exclude |
List[Tuple[int, int]] |
List of (sentence_index, token_index) tuples to exclude from analysis. |
verbose |
bool |
If True, prints the results in addition to returning them. Defaults to True. |
common_label_issues Output
| Type | Description |
|---|---|
| pd.DataFrame | DataFrame summarizing the most common label error patterns. Columns include the given label, predicted label, token examples, and frequency count. Sorted by frequency (most common first). |
Usage Examples
import numpy as np
from cleanlab.token_classification.filter import find_label_issues
from cleanlab.token_classification.summary import display_issues, common_label_issues
# Labels and predictions for a NER dataset
labels = [
[0, 1, 2, 0],
[0, 0, 1, 0, 0],
[1, 2, 0],
]
pred_probs = [
np.array([
[0.9, 0.05, 0.05],
[0.1, 0.8, 0.1],
[0.1, 0.1, 0.8],
[0.85, 0.1, 0.05],
]),
np.array([
[0.95, 0.03, 0.02],
[0.88, 0.07, 0.05],
[0.3, 0.4, 0.3],
[0.9, 0.05, 0.05],
[0.92, 0.04, 0.04],
]),
np.array([
[0.15, 0.75, 0.1],
[0.1, 0.2, 0.7],
[0.8, 0.1, 0.1],
]),
]
tokens = [
["John", "lives", "in", "Paris"],
["The", "weather", "is", "nice", "today"],
["Alice", "Smith", "left"],
]
class_names = ["O", "B-PER", "I-PER"]
# Find issues
issues = find_label_issues(labels, pred_probs)
# Display highlighted sentences with flagged tokens
display_issues(
issues,
tokens,
labels=labels,
pred_probs=pred_probs,
class_names=class_names,
top=10,
)
# Get summary of common error patterns
common_issues_df = common_label_issues(
issues,
tokens,
labels=labels,
pred_probs=pred_probs,
class_names=class_names,
top=5,
)
# Returns DataFrame with columns like: given_label, predicted_label, count
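Once you have the summary DataFrame, ordinary pandas operations let you focus review effort on the most damaging patterns. The column names below are illustrative assumptions; check the columns of the DataFrame your cleanlab version actually returns before filtering:

```python
import pandas as pd

# Stand-in for the DataFrame returned by common_label_issues;
# the column names here are assumed, not guaranteed by cleanlab.
summary_df = pd.DataFrame({
    "given_label": ["B-PER", "O", "I-PER"],
    "predicted_label": ["O", "B-PER", "O"],
    "count": [47, 12, 3],
})

# Keep only error patterns frequent enough to be worth fixing
# systematically (threshold is a judgment call for your dataset).
frequent = summary_df[summary_df["count"] >= 10]
```

Rare patterns (here, the single I-PER/O row) are often individual annotation mistakes; frequent ones usually point at a systematic labeling-guideline problem.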