Principle: Cleanlab Issue Summarization
| Metadata | |
|---|---|
| Sources | Cleanlab, Cleanlab Docs |
| Domains | Data_Quality, Dataset_Auditing, Data_Aggregation |
| Last Updated | 2026-02-09 12:00 GMT |
Overview
Method for obtaining aggregate statistics about each type of detected issue across the entire dataset.
Description
Issue summarization provides a high-level overview of the dataset's quality by aggregating per-example issue flags into per-issue-type counts and severity scores. It answers questions like "how many examples have label issues?" and "how severe is the outlier problem?" in a single compact DataFrame.
This is distinct from the detailed per-example results returned by get_issues(). While get_issues() returns one row per example, get_issue_summary() returns one row per issue type, providing:
- `issue_type` (str): The name of the issue type (e.g., "label", "outlier", "duplicate").
- `score` (float): An overall severity score for this issue type across the dataset. Typically the mean of per-example scores, though some issue types use global statistics (e.g., a p-value for the non-IID test).
- `num_issues` (int): The count of examples in the dataset flagged as having this issue type.
The summary enables rapid triage: dataset curators can immediately see which issue types are most prevalent and most severe, then drill down into specific examples using get_issues().
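To make the shape of this result concrete, here is a minimal sketch of such a summary table in pandas, using the three columns described above; the issue types and numeric values are invented for illustration and do not come from Cleanlab itself:

```python
import pandas as pd

# Hypothetical per-issue-type summary: one row per issue type,
# with the columns described above (values are made up).
summary = pd.DataFrame({
    "issue_type": ["label", "outlier", "duplicate"],
    "score": [0.84, 0.61, 0.97],
    "num_issues": [12, 3, 0],
})

# Rapid triage: which issue types were actually detected?
detected = summary[summary["num_issues"] > 0]
print(detected[["issue_type", "num_issues"]])
```

From a table like this, a curator would drill into the "label" and "outlier" rows with the per-example results, and skip "duplicate" entirely.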
Usage
Use issue summarization after calling find_issues() to get a quick overview of how many issues of each type were detected and their overall severity. This is typically used to decide which issue types warrant further investigation.
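One simple way to decide what to investigate first is to rank the summary rows by prevalence. A sketch over a hand-built summary table (not Cleanlab's API; the values are invented):

```python
import pandas as pd

# Invented summary values for illustration.
summary = pd.DataFrame({
    "issue_type": ["label", "outlier", "near_duplicate"],
    "score": [0.72, 0.91, 0.88],
    "num_issues": [40, 2, 7],
})

# Investigate the most prevalent issue types first.
priority = summary.sort_values("num_issues", ascending=False)
print(priority["issue_type"].tolist())
```

Ranking by `num_issues` is only one possible triage policy; because scores are not comparable across issue types (see Theoretical Basis), a cross-type ranking by `score` alone would be misleading.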
Theoretical Basis
Issue summarization applies standard aggregation principles to the per-example audit results:
- Count aggregation: For each issue type, count the number of examples where `is_{type}_issue` is `True`. This gives the `num_issues` value.
- Score aggregation: For each issue type, compute an overall severity score. This is typically the mean of per-example `{type}_score` values across all examples, providing a dataset-level quality metric. For certain issue types (e.g., `non_iid`), the score is a global statistic such as a p-value rather than an average.
- Compact presentation: Results are presented as a compact table with one row per issue type, enabling at-a-glance comparison of the relative prevalence and severity of different data quality problems.
- Score non-comparability: Summary scores are directly comparable across different datasets for the same issue type, but not across different issue types, since each issue type uses a fundamentally different scoring methodology.
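The count and score aggregations above can be sketched over hypothetical per-example audit results. The column naming follows the `is_{type}_issue` / `{type}_score` convention described above; the data is invented for illustration:

```python
import pandas as pd

# Hypothetical per-example audit results for one issue type ("label").
issues = pd.DataFrame({
    "is_label_issue": [True, False, True, False, False],
    "label_score": [0.1, 0.9, 0.2, 0.8, 0.7],
})

# Count aggregation: number of examples flagged with this issue type.
num_issues = int(issues["is_label_issue"].sum())

# Score aggregation: mean of per-example scores as the dataset-level metric.
score = issues["label_score"].mean()

# One row of the resulting summary table.
summary_row = {"issue_type": "label", "score": score, "num_issues": num_issues}
print(summary_row)
```

For an issue type like `non_iid`, the score aggregation step would instead be replaced by a global statistic computed over the whole dataset rather than this per-example mean.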
Pseudocode
def get_issue_summary(data_issues, issue_name=None):
    # DataFrame with columns: issue_type, score, num_issues
    summary = data_issues.issue_summary
    if issue_name is not None:
        # Restrict the summary to a single requested issue type
        summary = summary[summary["issue_type"] == issue_name]
    return summary
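The pseudocode can be exercised end to end with a minimal stand-in for the data_issues object. The stand-in class and its contents are invented for illustration; only the `issue_summary` attribute and the filtering behavior mirror the pseudocode above:

```python
import pandas as pd

class FakeDataIssues:
    """Minimal stand-in exposing the .issue_summary attribute used above."""
    def __init__(self):
        self.issue_summary = pd.DataFrame({
            "issue_type": ["label", "outlier"],
            "score": [0.8, 0.6],
            "num_issues": [5, 1],
        })

def get_issue_summary(data_issues, issue_name=None):
    summary = data_issues.issue_summary
    if issue_name is not None:
        summary = summary[summary["issue_type"] == issue_name]
    return summary

full = get_issue_summary(FakeDataIssues())            # all issue types
label_only = get_issue_summary(FakeDataIssues(), issue_name="label")
```

Passing `issue_name` simply filters the precomputed table; no recomputation of scores or counts occurs.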