Implementation:Cleanlab Cleanlab Datalab Get Issue Summary

From Leeroopedia


Field Value
Sources Cleanlab
Domains Data_Quality, Dataset_Auditing, Data_Aggregation
Last Updated 2026-02-09 12:00 GMT

Overview

Datalab.get_issue_summary returns aggregate statistics for each type of issue detected across the entire dataset, as a compact pandas DataFrame.

Description

The Datalab.get_issue_summary method retrieves the issue summary from the internal DataIssues container. The summary DataFrame contains one row per issue type that was checked during the audit, with columns for the issue type name, an overall severity score, and the count of flagged examples.

When called with a specific issue_name, only the row for that issue type is returned. When issue_name is None (the default), the full summary across all issue types is returned.
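The selection behavior described above can be sketched with plain pandas. This is only an illustration of the documented contract, not cleanlab's internal code: the helper name select_summary, the hand-built issue_summary table, and the choice to raise ValueError for an unknown name are all assumptions made for this sketch.

```python
from typing import Optional

import pandas as pd

# Hand-built stand-in for a Datalab issue summary (values invented).
issue_summary = pd.DataFrame(
    {
        "issue_type": ["label", "outlier", "near_duplicate"],
        "score": [0.95, 0.87, 0.99],
        "num_issues": [12, 5, 2],
    }
)


def select_summary(summary: pd.DataFrame, issue_name: Optional[str] = None) -> pd.DataFrame:
    """Return the whole summary, or only the row for `issue_name`."""
    if issue_name is None:
        return summary
    mask = summary["issue_type"] == issue_name
    if not mask.any():
        # Assumption: this sketch treats an unknown issue name as an error.
        raise ValueError(f"Issue type {issue_name!r} not found in the summary.")
    # Reset the index so a single-row result starts at 0.
    return summary[mask].reset_index(drop=True)


print(select_summary(issue_summary, "outlier"))
```

Resetting the index in the filtered case keeps a single-row result indexed from 0, which matches the shape of the single-issue output shown in the usage examples.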

The overall severity score for each issue type is typically the mean of per-example scores across the dataset, but some issue types use alternative global statistics. For example, the non_iid issue type uses a p-value from a statistical hypothesis test. These summary scores are comparable across different datasets for the same issue type, but are not comparable across different issue types.
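For the common "mean of per-example scores" case, the relationship between per-example results and a summary row can be sketched as follows. The column names is_label_issue and label_score follow the naming used in Datalab's per-example issues table; the values and the helper summarize_issue are invented for illustration, and this is not how cleanlab computes the summary internally.

```python
import pandas as pd

# Hypothetical per-example results for one issue type (values invented).
per_example = pd.DataFrame(
    {
        "is_label_issue": [False, True, False, False],
        "label_score": [0.9, 0.1, 0.8, 0.6],
    }
)


def summarize_issue(issues: pd.DataFrame, issue_name: str) -> pd.DataFrame:
    """Build a one-row summary: mean score and count of flagged examples."""
    return pd.DataFrame(
        {
            "issue_type": [issue_name],
            "score": [issues[f"{issue_name}_score"].mean()],
            "num_issues": [int(issues[f"is_{issue_name}_issue"].sum())],
        }
    )


summary_row = summarize_issue(per_example, "label")
print(summary_row)
```

Issue types that use a different global statistic, such as the p-value for non_iid, would not fit this pattern, which is one reason scores should not be compared across issue types.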

Usage

Call this method on a Datalab instance after find_issues() has completed. Use the returned summary to quickly assess which issue types are most prevalent and severe in your dataset.

Code Reference

Source Location

Repository
cleanlab/cleanlab
File
cleanlab/datalab/datalab.py
Lines
516

Signature

def get_issue_summary(self, issue_name: Optional[str] = None) -> pd.DataFrame

Import

from cleanlab import Datalab
# get_issue_summary is a method of the Datalab instance

I/O Contract

Inputs

Name Type Required Description
issue_name Optional[str] No (default: None) The name of the issue type to summarize. If None, returns the summary for all issue types that were previously checked in the audit.

Outputs

Name Type Description
return pd.DataFrame A DataFrame with columns: issue_type (str) naming the issue type, score (float) providing the overall severity score for this issue type (lower means more severe), and num_issues (int) counting how many examples in the dataset are flagged for this issue type. One row per issue type.

Usage Examples

Get Full Summary

from cleanlab import Datalab

# my_data, pred_probs, and features are assumed to be defined already
lab = Datalab(data=my_data, label_name="label")
lab.find_issues(pred_probs=pred_probs, features=features)

# Get the full issue summary (the output below is illustrative)
summary = lab.get_issue_summary()
print(summary)
#        issue_type  score  num_issues
# 0           label   0.95          12
# 1         outlier   0.87           5
# 2  near_duplicate   0.99           2
# 3         non_iid   0.72           0

Get Summary for a Specific Issue Type

# Get summary for label issues only
label_summary = lab.get_issue_summary("label")
print(label_summary)
#   issue_type  score  num_issues
# 0      label  0.95          12

Identify the Most Prevalent Issue Type

# Find the issue type with the most flagged examples
summary = lab.get_issue_summary()
worst_issue = summary.sort_values("num_issues", ascending=False).iloc[0]
print(f"Most prevalent issue: {worst_issue['issue_type']} "
      f"with {worst_issue['num_issues']} flagged examples")
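Raw counts are easier to interpret relative to dataset size. The sketch below ranks issue types by the fraction of examples flagged, using a hand-built table in place of the real summary. The helper rank_by_prevalence, the fraction_flagged column, and the n_examples value are assumptions for this example and are not part of the Datalab API.

```python
import pandas as pd

# Hand-built stand-in for lab.get_issue_summary() output (values invented).
summary = pd.DataFrame(
    {
        "issue_type": ["label", "outlier", "near_duplicate", "non_iid"],
        "score": [0.95, 0.87, 0.99, 0.72],
        "num_issues": [12, 5, 2, 0],
    }
)
n_examples = 200  # assumed dataset size, supplied by the caller


def rank_by_prevalence(summary: pd.DataFrame, n_examples: int) -> pd.DataFrame:
    """Keep issue types with at least one flagged example, most prevalent first."""
    ranked = summary[summary["num_issues"] > 0].copy()
    ranked["fraction_flagged"] = ranked["num_issues"] / n_examples
    return ranked.sort_values("num_issues", ascending=False).reset_index(drop=True)


ranked = rank_by_prevalence(summary, n_examples)
print(ranked[["issue_type", "num_issues", "fraction_flagged"]])
```

The ranking deliberately uses num_issues rather than score, since summary scores are not comparable across different issue types.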
