
Principle:Cleanlab Issue Reporting

From Leeroopedia


Metadata
Sources Cleanlab, Cleanlab Docs
Domains Data_Quality, Dataset_Auditing, Reporting
Last Updated 2026-02-09 12:00 GMT

Overview

Mechanism for generating human-readable summaries of detected dataset quality issues, with configurable verbosity and detail levels.

Description

Issue reporting transforms the raw numerical results from automated issue detection into an interpretable text report. The report is printed to stdout and is designed for dataset curators to quickly understand what quality problems exist in their data.

The report includes:

  • Issue type counts: How many examples in the dataset are flagged for each type of issue.
  • Severity ranking: Issue types are sorted by severity so the most critical problems appear first.
  • Top-N problematic examples: For each issue type, the report displays the most severely affected examples with their scores.
  • Issue descriptions: Optional plain-language explanations of what each issue type means and how to interpret it.
  • Summary scores: Optional overall severity scores per issue type.

The level of detail shown is controlled by the verbosity parameter (0-4), allowing users to get a quick overview or a deep dive depending on their needs.
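As a rough sketch of how a verbosity parameter can gate report sections (plain Python, not Cleanlab's actual implementation; the threshold mapping below is an assumption for illustration):

```python
# Illustrative sketch: each report section is gated by a minimum
# verbosity level. The thresholds here are assumptions, not
# Cleanlab's actual verbosity mapping.
def build_report(issue_counts, descriptions, severity_scores, verbosity=1):
    lines = []
    # All verbosity levels: issue counts, most severe first
    # (lower severity score = more severe, as in Cleanlab's convention).
    for name, count in sorted(issue_counts.items(),
                              key=lambda kv: severity_scores.get(kv[0], 1.0)):
        lines.append(f"{name}: {count} issues found")
        # Higher verbosity: plain-language description of the issue type.
        if verbosity >= 2 and name in descriptions:
            lines.append(f"  about: {descriptions[name]}")
        # Higher still: overall severity score per issue type.
        if verbosity >= 3:
            lines.append(f"  severity score: {severity_scores.get(name, 1.0):.2f}")
    return "\n".join(lines)
```

At `verbosity=1` only the ranked counts appear; raising it adds descriptions and scores without changing the ordering.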

Usage

Use issue reporting after calling find_issues() to get a human-readable overview of all detected issues before diving into specific examples. This is typically the second step in a dataset audit workflow, providing an at-a-glance summary that guides subsequent investigation.
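The report-then-investigate pattern can be shown with a toy audit result (a hypothetical data structure standing in for what issue detection returns; not Cleanlab's actual objects):

```python
# Hypothetical per-example audit results keyed by issue type; in
# Cleanlab these would come from a prior find_issues() call.
audit = {
    "label":   {"is_issue": [True, False, True, False], "score": [0.02, 0.9, 0.1, 0.8]},
    "outlier": {"is_issue": [False, False, False, True], "score": [0.7, 0.8, 0.6, 0.05]},
}

# Step 1: at-a-glance summary, the role the report plays.
for issue_type, cols in audit.items():
    print(f"{issue_type}: {sum(cols['is_issue'])} issues found")

# Step 2: use the summary to guide investigation, e.g. pull the
# worst examples of one issue type (lower score = more severe).
worst = sorted(range(len(audit["label"]["score"])),
               key=lambda i: audit["label"]["score"][i])[:2]
print("label issues: inspect examples", worst)
```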

Theoretical Basis

Issue reporting applies standard report generation techniques to the audit results:

  1. Aggregation: Per-example issue flags are aggregated into counts and summaries per issue type, providing a dataset-level view of data quality.
  2. Severity sorting: Issue types are ranked by their overall severity scores (or number of flagged examples), so the most impactful problems are surfaced first.
  3. Top-N display: For each issue type, the top-N most problematic examples are shown with their severity scores. This gives curators concrete examples to inspect without overwhelming them with the full dataset.
  4. Verbosity levels: A configurable verbosity parameter controls the amount of detail shown. Lower verbosity shows only counts and top examples; higher verbosity adds descriptions, scores, and additional context.
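The first three steps above can be sketched in plain Python (a toy implementation over an assumed input layout, not Cleanlab's code):

```python
def summarize(per_example_issues, top_n=2):
    """Aggregate per-example flags/scores into a ranked summary.

    per_example_issues: {issue_type: list of (example_id, is_issue, score)}
    Lower score = more severe (Cleanlab's convention). Layout is assumed.
    """
    report = []
    # 1. Aggregation: count flagged examples per issue type.
    counts = {t: sum(flag for _, flag, _ in rows)
              for t, rows in per_example_issues.items()}
    # 2. Severity sorting: rank issue types by number of flagged examples.
    for issue_type in sorted(counts, key=counts.get, reverse=True):
        rows = per_example_issues[issue_type]
        # 3. Top-N display: the N lowest-scoring (worst) examples.
        worst = sorted(rows, key=lambda r: r[2])[:top_n]
        report.append((issue_type, counts[issue_type],
                       [(ex_id, round(score, 2)) for ex_id, _, score in worst]))
    return report
```

Each entry pairs an issue type with its count and a handful of concrete examples to inspect, which is exactly the shape a curator needs to act on.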

Pseudocode

def report(data_issues, num_examples, verbosity, include_description):
    # verbosity controls which of the sections below are printed
    summary = data_issues.issue_summary
    summary = sort_by_severity(summary)  # most severe issue types first

    for issue_type in summary:
        print(issue_type.name, issue_type.num_issues, "issues found")

        if include_description:
            print(issue_type.description)

        # Show the num_examples most problematic examples with their scores
        top_examples = get_top_issues(data_issues, issue_type, num_examples)
        print(top_examples)
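A minimal runnable version of the pseudocode above (the data structures are assumptions for illustration; Cleanlab's internal issue objects differ):

```python
from dataclasses import dataclass, field

@dataclass
class IssueTypeSummary:
    name: str
    num_issues: int
    severity: float          # lower = more severe
    description: str = ""
    # (example_id, score) pairs, assumed pre-sorted worst-first
    worst_examples: list = field(default_factory=list)

def report(issue_summaries, num_examples=3, include_description=False):
    out = []
    # Most severe issue types first (lowest severity score).
    for s in sorted(issue_summaries, key=lambda s: s.severity):
        out.append(f"{s.name}: {s.num_issues} issues found")
        if include_description and s.description:
            out.append(s.description)
        # Show the top-N most problematic examples with their scores.
        for ex_id, score in s.worst_examples[:num_examples]:
            out.append(f"  example {ex_id}: score {score:.2f}")
    return "\n".join(out)
```

Returning the text (rather than printing directly) makes the report easy to test and to redirect to a log file.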
