Implementation: Cleanlab TC Display Issues
| Field | Value |
|---|---|
| API | token_classification.summary.display_issues and token_classification.summary.common_label_issues |
| Source | cleanlab/token_classification/summary.py:L13-22, L139-149 |
| Domains | Machine_Learning, Data_Quality, NLP |
| Last Updated | 2026-02-09 |
Overview
Implementation of token-level issue visualization and error pattern summarization for token classification tasks. Provides two functions: display_issues for rendering highlighted sentences and common_label_issues for aggregating error patterns across the dataset.
Description
This module provides two complementary functions for reviewing token classification label issues:
display_issues: Prints sentences containing problematic tokens, with those tokens highlighted in color. For each flagged token, it optionally shows the given label and the model's predicted label. The function displays the top N most problematic sentences and supports excluding specific (sentence_index, token_index) pairs from the display.
common_label_issues: Aggregates all detected label issues into a frequency table showing how often each type of label error occurs (e.g., "B-PER mislabeled as O" appearing 47 times). Returns a DataFrame sorted by frequency, most common first, for systematic pattern analysis.
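To make the aggregation idea concrete, here is a minimal, hypothetical sketch of how such a frequency table could be built from (sentence_index, token_index) issue tuples. The function name summarize_issue_patterns and the column names are illustrative assumptions, not cleanlab's actual implementation:

```python
from collections import Counter
import pandas as pd

def summarize_issue_patterns(issues, labels, preds, class_names):
    # Count how often each (given label, predicted label) pair occurs
    # among the flagged tokens. `issues` holds (sentence_idx, token_idx)
    # tuples; `labels` and `preds` are per-sentence lists of class ids.
    counts = Counter(
        (class_names[labels[i][j]], class_names[preds[i][j]])
        for i, j in issues
    )
    # Emit rows sorted by frequency, most common pattern first.
    rows = [
        {"given_label": g, "predicted_label": p, "count": c}
        for (g, p), c in counts.most_common()
    ]
    return pd.DataFrame(rows)
```

This mirrors the spirit of common_label_issues: group flagged tokens by error pattern, then rank patterns by how often they occur.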
Usage
These functions are used after detecting token-level label issues with find_label_issues. They are typically called in a Jupyter notebook environment for interactive review.
Code Reference
Source Location
cleanlab/token_classification/summary.py, lines 13-22 (display_issues) and lines 139-149 (common_label_issues).
Signature
def display_issues(
issues: list,
tokens: List[List[str]],
*,
labels: Optional[list] = None,
pred_probs: Optional[list] = None,
exclude: List[Tuple[int, int]] = [],
class_names: Optional[List[str]] = None,
top: int = 20,
) -> None
def common_label_issues(
issues: List[Tuple[int, int]],
tokens: List[List[str]],
*,
labels: Optional[list] = None,
pred_probs: Optional[list] = None,
class_names: Optional[List[str]] = None,
top: int = 10,
exclude: List[Tuple[int, int]] = [],
verbose: bool = True,
) -> pd.DataFrame
Import
from cleanlab.token_classification.summary import display_issues, common_label_issues
I/O Contract
display_issues Inputs
| Parameter | Type | Description |
|---|---|---|
| issues | list | List of (sentence_index, token_index) tuples identifying tokens with label issues, as returned by find_label_issues. |
| tokens | List[List[str]] | List of N lists, each containing the string tokens for the corresponding sentence. |
| labels | Optional[list] | List of N lists of integer class labels. When provided, the given label is shown for each flagged token. |
| pred_probs | Optional[list] | List of N numpy arrays of shape (T_i, K). When provided, the predicted label is shown for each flagged token. |
| exclude | List[Tuple[int, int]] | List of (sentence_index, token_index) tuples to exclude from display. Defaults to an empty list. |
| class_names | Optional[List[str]] | List of human-readable class names. When provided, class names are shown instead of integer indices. |
| top | int | Maximum number of sentences to display. Defaults to 20. |
display_issues Output
| Type | Description |
|---|---|
| None | Prints highlighted sentences to standard output. Does not return a value. |
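Since display_issues only prints and returns nothing, its core behavior is terminal highlighting. The sketch below shows one plausible way to highlight flagged token positions with ANSI color codes; it is an illustrative assumption, not cleanlab's actual rendering code:

```python
RED = "\033[91m"   # ANSI escape for red text
RESET = "\033[0m"  # ANSI escape to restore default color

def highlight_sentence(tokens, flagged_positions):
    # Hypothetical sketch: join the sentence back together, wrapping
    # each flagged token index in red so it stands out in a terminal.
    return " ".join(
        f"{RED}{tok}{RESET}" if j in flagged_positions else tok
        for j, tok in enumerate(tokens)
    )
```

In a Jupyter notebook or ANSI-capable terminal, the flagged tokens render in red; the surrounding text is unchanged.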
common_label_issues Inputs
| Parameter | Type | Description |
|---|---|---|
issues |
List[Tuple[int, int]] |
List of (sentence_index, token_index) tuples identifying tokens with label issues. |
tokens |
List[List[str]] |
List of N lists, each containing the string tokens for the corresponding sentence. |
labels |
Optional[list] |
List of N lists of integer class labels. |
pred_probs |
Optional[list] |
List of N numpy arrays of shape (T_i, K). |
class_names |
Optional[List[str]] |
List of human-readable class names. |
top |
int |
Maximum number of common issue patterns to return. Defaults to 10. |
exclude |
List[Tuple[int, int]] |
List of (sentence_index, token_index) tuples to exclude from analysis. |
verbose |
bool |
If True, prints the results in addition to returning them. Defaults to True. |
common_label_issues Output
| Type | Description |
|---|---|
| pd.DataFrame | DataFrame summarizing the most common label error patterns. Columns include the given label, predicted label, token examples, and frequency count. Sorted by frequency (most common first). |
Usage Examples
import numpy as np
from cleanlab.token_classification.filter import find_label_issues
from cleanlab.token_classification.summary import display_issues, common_label_issues
# Labels and predictions for a NER dataset
labels = [
[0, 1, 2, 0],
[0, 0, 1, 0, 0],
[1, 2, 0],
]
pred_probs = [
np.array([
[0.9, 0.05, 0.05],
[0.1, 0.8, 0.1],
[0.1, 0.1, 0.8],
[0.85, 0.1, 0.05],
]),
np.array([
[0.95, 0.03, 0.02],
[0.88, 0.07, 0.05],
[0.3, 0.4, 0.3],
[0.9, 0.05, 0.05],
[0.92, 0.04, 0.04],
]),
np.array([
[0.15, 0.75, 0.1],
[0.1, 0.2, 0.7],
[0.8, 0.1, 0.1],
]),
]
tokens = [
["John", "lives", "in", "Paris"],
["The", "weather", "is", "nice", "today"],
["Alice", "Smith", "left"],
]
class_names = ["O", "B-PER", "I-PER"]
# Find issues
issues = find_label_issues(labels, pred_probs)
# Display highlighted sentences with flagged tokens
display_issues(
issues,
tokens,
labels=labels,
pred_probs=pred_probs,
class_names=class_names,
top=10,
)
# Get summary of common error patterns
common_issues_df = common_label_issues(
issues,
tokens,
labels=labels,
pred_probs=pred_probs,
class_names=class_names,
top=5,
)
# Returns DataFrame with columns like: given_label, predicted_label, count
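Once you have the summary DataFrame, ordinary pandas operations let you focus review effort on the most damaging patterns. The column names below are illustrative assumptions; check the columns of the DataFrame your cleanlab version actually returns before filtering:

```python
import pandas as pd

# Stand-in for the DataFrame returned by common_label_issues;
# the column names here are assumed, not guaranteed by cleanlab.
summary_df = pd.DataFrame({
    "given_label": ["B-PER", "O", "I-PER"],
    "predicted_label": ["O", "B-PER", "O"],
    "count": [47, 12, 3],
})

# Keep only error patterns frequent enough to be worth fixing
# systematically (threshold is a judgment call for your dataset).
frequent = summary_df[summary_df["count"] >= 10]
```

Rare patterns (here, the single I-PER/O row) are often individual annotation mistakes; frequent ones usually point at a systematic labeling-guideline problem.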