Implementation:Cleanlab Cleanlab Datalab Get Issue Summary
| Field | Value |
|---|---|
| Sources | Cleanlab |
| Domains | Data_Quality, Dataset_Auditing, Data_Aggregation |
| Last Updated | 2026-02-09 12:00 GMT |
Overview
Datalab.get_issue_summary returns aggregate statistics for each type of issue detected across the entire dataset, as a compact pandas DataFrame.
Description
The Datalab.get_issue_summary method retrieves the issue summary from the internal DataIssues container. The summary DataFrame contains one row per issue type that was checked during the audit, with columns for the issue type name, an overall severity score, and the count of flagged examples.
When called with a specific issue_name, only the row for that issue type is returned. When called with None, the full summary across all issue types is returned.
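The selection behavior described above can be modeled in plain Python. This is only a sketch of the semantics, not cleanlab's implementation: `SUMMARY_ROWS` and `select_summary` are hypothetical names, the values are made up, and the real method returns a pandas DataFrame rather than a list. How the real method treats an unknown issue name is not modeled here.

```python
from typing import Optional

# Hypothetical summary rows (illustrative values, not real audit results).
SUMMARY_ROWS = [
    {"issue_type": "label", "score": 0.95, "num_issues": 12},
    {"issue_type": "outlier", "score": 0.87, "num_issues": 5},
]


def select_summary(rows: list, issue_name: Optional[str] = None) -> list:
    """Return every row when issue_name is None, else only the matching rows."""
    if issue_name is None:
        return list(rows)
    return [row for row in rows if row["issue_type"] == issue_name]


all_rows = select_summary(SUMMARY_ROWS)             # full summary
label_rows = select_summary(SUMMARY_ROWS, "label")  # one issue type only
print(len(all_rows), len(label_rows))
```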
The overall severity score for each issue type is typically the mean of per-example scores across the dataset, but some issue types use alternative global statistics. For example, the non_iid issue type uses a p-value from a statistical hypothesis test. These summary scores are comparable across different datasets for the same issue type, but are not comparable across different issue types.
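For issue types that use the mean, the relationship between per-example results and a summary row can be sketched as follows. This is a simplified model under stated assumptions, not cleanlab's code: `per_example` and `summarize` are hypothetical, the scores are invented, and issue types such as non_iid that use a different global statistic are not covered.

```python
from statistics import mean

# Hypothetical per-example results for one issue type (illustrative values only).
per_example = [
    {"is_issue": False, "score": 0.5},
    {"is_issue": True, "score": 0.25},
    {"is_issue": False, "score": 1.0},
    {"is_issue": False, "score": 0.75},
]


def summarize(issue_type: str, rows: list) -> dict:
    """Collapse per-example rows into one summary row: mean score and flagged count."""
    return {
        "issue_type": issue_type,
        "score": mean(row["score"] for row in rows),
        "num_issues": sum(row["is_issue"] for row in rows),
    }


summary_row = summarize("outlier", per_example)
print(summary_row)
```

Because the mean is taken over every example, a handful of severely flagged examples in a large dataset moves the summary score only slightly, which is one reason to read `score` together with `num_issues`.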
Usage
Call this method on a Datalab instance after find_issues() has completed. Use the returned summary to quickly assess which issue types are most prevalent and severe in your dataset.
Code Reference
Source Location
- Repository: cleanlab/cleanlab
- File: cleanlab/datalab/datalab.py
- Lines: 516
Signature
def get_issue_summary(self, issue_name: Optional[str] = None) -> pd.DataFrame
Import
from cleanlab import Datalab
# get_issue_summary is a method of the Datalab instance
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| issue_name | Optional[str] | No (default: None) | The name of the issue type to summarize. If None, returns the summary for all issue types that were previously checked in the audit. |
Outputs
| Name | Type | Description |
|---|---|---|
| return | pd.DataFrame | A DataFrame with one row per issue type and three columns: issue_type (str), the name of the issue type; score (float), the overall severity score for that issue type (lower means more severe); and num_issues (int), the number of examples in the dataset flagged for that issue type. |
Usage Examples
Get Full Summary
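from cleanlab import Datalab
# my_data, pred_probs, and features are placeholders for your own dataset,
# model-predicted class probabilities, and feature embeddings.
lab = Datalab(data=my_data, label_name="label")
lab.find_issues(pred_probs=pred_probs, features=features)
# Get the full issue summary
summary = lab.get_issue_summary()
print(summary)
# Illustrative output (actual rows and values depend on your data):
#        issue_type  score  num_issues
# 0           label   0.95          12
# 1         outlier   0.87           5
# 2  near_duplicate   0.99           2
# 3         non_iid   0.72           0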
Get Summary for a Specific Issue Type
# Get summary for label issues only
label_summary = lab.get_issue_summary("label")
print(label_summary)
#   issue_type  score  num_issues
# 0      label   0.95          12
Identify the Most Prevalent Issue Type
# Find the issue type with the most flagged examples
summary = lab.get_issue_summary()
worst_issue = summary.sort_values("num_issues", ascending=False).iloc[0]
print(f"Most prevalent issue: {worst_issue['issue_type']} "
      f"with {worst_issue['num_issues']} flagged examples")