Implementation:Cleanlab Cleanlab Datalab Init

Field	Value
Sources	Cleanlab
Domains	Data_Quality, Dataset_Auditing
Last Updated	2026-02-09 12:00 GMT

Overview

Datalab_Init is the constructor of the Datalab class. It prepares a dataset for quality auditing by normalizing the data format, mapping labels, and configuring the audit task type.

Description

The Datalab.__init__ method accepts a dataset in any of several supported formats and prepares it for automated issue detection. In order, it parses the task type string into an internal enum, wraps the raw data in a Data object that normalizes the format and extracts labels, computes a data hash for caching, initializes an image analysis lab if image_key is given, and constructs the DataIssues container that will hold all subsequent audit results.
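The first steps of that sequence can be illustrated with a minimal standard-library sketch. It is not cleanlab's implementation: the names Task, build_label_map, and hash_columns are hypothetical stand-ins, and the hashing scheme and the direction of the label map are assumptions made only for illustration.

```python
import hashlib
from enum import Enum


class Task(Enum):
    """Hypothetical stand-in for Datalab's internal task enum."""

    CLASSIFICATION = "classification"
    REGRESSION = "regression"
    MULTILABEL = "multilabel"

    @classmethod
    def from_str(cls, name: str) -> "Task":
        # Parse the user-facing task string, failing early on unknown values.
        try:
            return cls(name.lower())
        except ValueError:
            valid = ", ".join(t.value for t in cls)
            raise ValueError(f"Unknown task {name!r}; expected one of: {valid}") from None


def build_label_map(labels):
    """Map each distinct label to an integer index (assumed direction)."""
    return {label: idx for idx, label in enumerate(sorted(set(labels)))}


def hash_columns(columns):
    """Illustrative content hash over column names and values."""
    digest = hashlib.sha256()
    for name in sorted(columns):
        digest.update(repr((name, list(columns[name]))).encode("utf-8"))
    return digest.hexdigest()


task = Task.from_str("classification")
columns = {"text": ["good movie", "bad film"], "label": ["positive", "negative"]}
label_map = build_label_map(columns["label"])
data_hash = hash_columns(columns)
```

Parsing the task string up front means an unsupported task is rejected before any data is processed, and a deterministic hash lets later steps detect whether the underlying data has changed.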

Usage

Import Datalab from cleanlab and call the constructor with your dataset and label column name. This is always the first step in the Datalab audit workflow.

Code Reference

Source Location

Repository: cleanlab/cleanlab
File: cleanlab/datalab/datalab.py
Lines: 100--107

Signature

class Datalab:
    def __init__(
        self,
        data: "DatasetLike",
        task: str = "classification",
        label_name: Optional[str] = None,
        image_key: Optional[str] = None,
        verbosity: int = 1,
    ) -> None

Import

from cleanlab import Datalab

I/O Contract

Inputs

Name	Type	Required	Description
`data`	`DatasetLike` (Dataset, DataFrame, dict, list, str path)	Yes	The dataset to audit. Accepts HuggingFace Dataset, pandas DataFrame, dict of arrays, list of dicts, or a string path to a local file (CSV, JSON, TXT) or HuggingFace Hub identifier.
`task`	`str`	No (default: `"classification"`)	The ML task type. Supported values: `"classification"`, `"regression"`, `"multilabel"`.
`label_name`	`Optional[str]`	No (default: `None`)	The name of the label column in the dataset. Must be provided if the dataset has labels that should be audited.
`image_key`	`Optional[str]`	No	Key pointing to the column containing PIL image objects. When specified, additional CleanVision image-specific issue types are checked. Only supported for HuggingFace Dataset format.
`verbosity`	`int`	No (default: `1`)	Controls how much information is printed during auditing. Valid values are 0 through 4.
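As a rough illustration of what normalizing the `data` argument involves, the sketch below converts two of the accepted in-memory formats (a dict of columns and a list of row dicts) into one column-oriented layout and checks that `label_name` refers to an existing column. The helpers normalize_to_columns and check_label_name are hypothetical and are not part of cleanlab; the library's own Data object handles more formats and may validate differently.

```python
from typing import Any, Dict, List, Optional, Union

ColumnDict = Dict[str, List[Any]]


def normalize_to_columns(data: Union[ColumnDict, List[Dict[str, Any]]]) -> ColumnDict:
    """Convert a dict of columns or a list of row dicts into a dict of lists."""
    if isinstance(data, dict):
        columns = {name: list(values) for name, values in data.items()}
    elif isinstance(data, list):
        if not data:
            raise ValueError("Cannot normalize an empty list of records")
        keys = list(data[0].keys())
        for row in data:
            # Every record must expose the same set of keys.
            if set(row.keys()) != set(keys):
                raise ValueError("All records must have the same keys")
        columns = {key: [row[key] for row in data] for key in keys}
    else:
        raise TypeError(f"Unsupported data type: {type(data).__name__}")

    # All columns must describe the same number of examples.
    if len({len(values) for values in columns.values()}) > 1:
        raise ValueError("All columns must have the same length")
    return columns


def check_label_name(columns: ColumnDict, label_name: Optional[str]) -> None:
    """Raise if a label column was requested but is missing."""
    if label_name is not None and label_name not in columns:
        raise ValueError(f"Label column {label_name!r} not found in dataset")


rows = [{"text": "good movie", "label": 1}, {"text": "bad film", "label": 0}]
columns = normalize_to_columns(rows)
check_label_name(columns, "label")
```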

Outputs

Name	Type	Description
return	`Datalab` instance	An initialized Datalab object containing the internal `Data` object, `DataIssues` container, task configuration, label map, and verbosity settings. Ready for `find_issues()` to be called.

Usage Examples

Basic Classification Dataset

import datasets
from cleanlab import Datalab

# Load a HuggingFace dataset
data = datasets.load_dataset("glue", "sst2", split="train")
datalab = Datalab(data, label_name="label")

From a pandas DataFrame

import pandas as pd
from cleanlab import Datalab

df = pd.DataFrame({
    "text": ["good movie", "bad film", "great acting", "terrible plot"],
    "label": ["positive", "negative", "positive", "negative"],
})
datalab = Datalab(data=df, label_name="label")

From a Dictionary

import numpy as np
from cleanlab import Datalab

# Dict keys are column names; each value holds one entry per example
X = np.array([[0, 1], [1, 1], [2, 2], [2, 0]])
y = np.array([0, 1, 1, 0])
datalab = Datalab(data={"X": X, "y": y}, label_name="y")

Regression Task

from cleanlab import Datalab

# Continuous targets: select the regression task explicitly
data = {"feature": [1.0, 2.0, 3.0, 4.0], "target": [1.1, 2.2, 2.9, 4.1]}
datalab = Datalab(data=data, task="regression", label_name="target")
