Implementation:Cleanlab Cleanlab Datalab Init

From Leeroopedia


Field | Value
Sources | Cleanlab
Domains | Data_Quality, Dataset_Auditing
Last Updated | 2026-02-09 12:00 GMT

Overview

Datalab_Init documents Datalab.__init__, the constructor of the Datalab class. It prepares a dataset for quality auditing by normalizing the data format, mapping labels, and configuring the audit task type.

Description

The Datalab.__init__ method accepts a dataset in any of several supported formats and prepares it for automated issue detection. It parses the task type string into an internal enum and wraps the raw data in a Data object, which normalizes the format and extracts the labels. It then computes a data hash for caching, optionally initializes an image analysis lab (when image_key is given), and constructs the DataIssues container that will hold all subsequent audit results.
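The initialization steps described above can be illustrated with a small standard-library sketch. This is not cleanlab's code: TaskKind, build_label_map, and hash_columns are hypothetical names, and both the label-to-integer direction of the map and the use of SHA-256 are assumptions made for illustration only.

```python
import hashlib
import json
from enum import Enum


class TaskKind(Enum):
    """Hypothetical stand-in for the internal task enum."""

    CLASSIFICATION = "classification"
    REGRESSION = "regression"
    MULTILABEL = "multilabel"

    @classmethod
    def from_str(cls, name: str) -> "TaskKind":
        # Turn the user-facing string into an enum member, failing loudly.
        try:
            return cls(name.lower())
        except ValueError:
            supported = ", ".join(t.value for t in cls)
            raise ValueError(
                f"Unknown task {name!r}; expected one of: {supported}"
            ) from None


def build_label_map(labels):
    """Assign an integer id to each distinct label (sorted for determinism)."""
    return {label: idx for idx, label in enumerate(sorted(set(labels)))}


def hash_columns(columns: dict) -> str:
    """Fingerprint the column contents so a later change can be detected."""
    payload = json.dumps(columns, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


columns = {
    "text": ["good movie", "bad film", "great acting", "terrible plot"],
    "label": ["positive", "negative", "positive", "negative"],
}
task = TaskKind.from_str("classification")
label_map = build_label_map(columns["label"])
data_hash = hash_columns(columns)

print(task.value)  # classification
print(label_map)   # {'negative': 0, 'positive': 1}
```

Parsing the task string once, up front, means a misspelled task fails at construction time rather than partway through an audit.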

Usage

Import Datalab from cleanlab and call the constructor with your dataset and label column name. This is always the first step in the Datalab audit workflow.

Code Reference

Source Location

Repository
cleanlab/cleanlab
File
cleanlab/datalab/datalab.py
Lines
100-107

Signature

class Datalab:
    def __init__(
        self,
        data: "DatasetLike",
        task: str = "classification",
        label_name: Optional[str] = None,
        image_key: Optional[str] = None,
        verbosity: int = 1,
    ) -> None: ...

Import

from cleanlab import Datalab

I/O Contract

Inputs

Name | Type | Required | Description
data | DatasetLike (Dataset, DataFrame, dict, list, str path) | Yes | The dataset to audit. Accepts a HuggingFace Dataset, a pandas DataFrame, a dict of arrays, a list of dicts, or a string that is either a path to a local file (CSV, JSON, TXT) or a HuggingFace Hub identifier.
task | str | No (default: "classification") | The ML task type. Supported values: "classification", "regression", "multilabel".
label_name | Optional[str] | No (default: None) | The name of the label column in the dataset. Provide it whenever the dataset has labels so that label-dependent checks can run.
image_key | Optional[str] | No (default: None) | Key of the column containing PIL image objects. When specified, additional CleanVision image-specific issue types are checked. Only supported when data is a HuggingFace Dataset.
verbosity | int | No (default: 1) | Controls how much information is printed during auditing. Valid values are 0 through 4.
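The two in-memory layouts listed for data (a list of dicts and a dict of arrays) carry the same information, and the label_name and verbosity rows imply simple checks that can be made before calling the constructor. The sketch below shows both with hypothetical helpers, rows_to_columns and preflight. Datalab performs its own conversion and validation, so this is only a pre-check under the assumptions stated in the table.

```python
from typing import Any, Dict, List, Optional


def rows_to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Convert a list of row dicts into a dict of equal-length columns."""
    if not rows:
        return {}
    keys = list(rows[0])
    for i, row in enumerate(rows):
        # Every row must expose exactly the same keys as the first one.
        if set(row) != set(keys):
            raise ValueError(
                f"Row {i} has keys {sorted(row)}, expected {sorted(keys)}"
            )
    return {key: [row[key] for row in rows] for key in keys}


def preflight(
    columns: Dict[str, List[Any]],
    label_name: Optional[str] = None,
    verbosity: int = 1,
) -> None:
    """Hypothetical pre-check of arguments, based on the inputs table."""
    if label_name is not None and label_name not in columns:
        raise ValueError(f"label_name {label_name!r} is not a column")
    if verbosity not in range(0, 5):  # the table lists 0 through 4 as valid
        raise ValueError("verbosity must be between 0 and 4")


rows = [
    {"feature": 1.0, "target": 1.1},
    {"feature": 2.0, "target": 2.2},
]
columns = rows_to_columns(rows)
preflight(columns, label_name="target", verbosity=1)

print(columns)  # {'feature': [1.0, 2.0], 'target': [1.1, 2.2]}
```

Either rows or columns could then be passed as data, with label_name="target".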

Outputs

Name | Type | Description
return | Datalab instance | An initialized Datalab object containing the internal Data object, DataIssues container, task configuration, label map, and verbosity settings. Ready for find_issues() to be called.

Usage Examples

Basic Classification Dataset

import datasets
from cleanlab import Datalab

# Load a HuggingFace dataset
data = datasets.load_dataset("glue", "sst2", split="train")
datalab = Datalab(data, label_name="label")

From a pandas DataFrame

import pandas as pd
from cleanlab import Datalab

df = pd.DataFrame({
    "text": ["good movie", "bad film", "great acting", "terrible plot"],
    "label": ["positive", "negative", "positive", "negative"],
})
datalab = Datalab(data=df, label_name="label")

From a Dictionary

import numpy as np
from cleanlab import Datalab

X = np.array([[0, 1], [1, 1], [2, 2], [2, 0]])
y = np.array([0, 1, 1, 0])
datalab = Datalab(data={"X": X, "y": y}, label_name="y")

Regression Task

from cleanlab import Datalab

data = {"feature": [1.0, 2.0, 3.0, 4.0], "target": [1.1, 2.2, 2.9, 4.1]}
datalab = Datalab(data=data, task="regression", label_name="target")
