Implementation:Cleanlab Cleanlab Datalab Init
| Field | Value |
|---|---|
| Sources | Cleanlab |
| Domains | Data_Quality, Dataset_Auditing |
| Last Updated | 2026-02-09 12:00 GMT |
Overview
Datalab_Init is the constructor of the Datalab class that initializes a dataset for comprehensive quality auditing by normalizing data formats, mapping labels, and configuring the audit task type.
Description
The Datalab.__init__ method accepts a dataset in any of several supported formats and prepares it for automated issue detection. It parses the task type string into an internal enum, wraps the raw data in a Data object that normalizes format and extracts labels, computes a data hash for caching, optionally initializes an image analysis lab, and constructs the DataIssues container that will hold all subsequent audit results.
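The sequence described above can be sketched in plain Python. This is an illustrative stand-in only, not cleanlab's actual implementation: `SketchLab`, `Task`, and `parse_task` are hypothetical names, the data is assumed to already be a dict of columns, the SHA-256 hashing scheme is an assumption, and the image-lab step is omitted.

```python
import hashlib
from enum import Enum
from typing import Any, Dict, List, Optional


class Task(Enum):
    """Hypothetical stand-in for the internal task enum."""
    CLASSIFICATION = "classification"
    REGRESSION = "regression"
    MULTILABEL = "multilabel"


def parse_task(task: str) -> Task:
    """Turn a task string into an enum member, rejecting unknown values."""
    try:
        return Task(task)
    except ValueError:
        valid = ", ".join(t.value for t in Task)
        raise ValueError(f"Unknown task {task!r}; expected one of: {valid}") from None


class SketchLab:
    """Toy object that mirrors the initialization order, nothing more."""

    def __init__(
        self,
        data: Dict[str, List[Any]],
        task: str = "classification",
        label_name: Optional[str] = None,
        verbosity: int = 1,
    ) -> None:
        # 1. Parse the task string into an enum.
        self.task = parse_task(task)
        # 2. Normalize the data (here: copy each column into a list).
        self.data = {name: list(col) for name, col in data.items()}
        # 3. Extract labels; for classification, map class names to integers.
        self.label_name = label_name
        self.labels: Optional[List[Any]] = None
        self.label_map: Dict[int, Any] = {}
        if label_name is not None:
            raw = self.data[label_name]
            if self.task is Task.CLASSIFICATION:
                classes = sorted(set(raw))
                to_index = {c: i for i, c in enumerate(classes)}
                self.labels = [to_index[v] for v in raw]
                self.label_map = {i: c for c, i in to_index.items()}
            else:
                self.labels = list(raw)
        # 4. Hash the data so later results can be tied to this exact dataset
        #    (assumed scheme: SHA-256 over the repr of the sorted columns).
        payload = repr(sorted(self.data.items())).encode("utf-8")
        self.data_hash = hashlib.sha256(payload).hexdigest()
        # 5. Create an empty container for audit results.
        self.issues: Dict[str, Any] = {}
        self.verbosity = verbosity
```

In this sketch an unknown task string fails before any data is touched, so a typo in `task` surfaces immediately instead of after an expensive load.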
Usage
Import Datalab from cleanlab and call the constructor with your dataset and label column name. This is always the first step in the Datalab audit workflow.
Code Reference
Source Location
- Repository: cleanlab/cleanlab
- File: cleanlab/datalab/datalab.py
- Lines: 100-107
Signature
class Datalab:
    def __init__(
        self,
        data: "DatasetLike",
        task: str = "classification",
        label_name: Optional[str] = None,
        image_key: Optional[str] = None,
        verbosity: int = 1,
    ) -> None
Import
from cleanlab import Datalab
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| data | DatasetLike (Dataset, DataFrame, dict, list, str path) | Yes | The dataset to audit. Accepts HuggingFace Dataset, pandas DataFrame, dict of arrays, list of dicts, or a string path to a local file (CSV, JSON, TXT) or HuggingFace Hub identifier. |
| task | str | No (default: "classification") | The ML task type. Supported values: "classification", "regression", "multilabel". |
| label_name | Optional[str] | No | The name of the label column in the dataset. Required if the dataset has labels. |
| image_key | Optional[str] | No | Key pointing to the column containing PIL image objects. When specified, additional CleanVision image-specific issue types are checked. Only supported for HuggingFace Dataset format. |
| verbosity | int | No (default: 1) | Controls how much information is printed during auditing. Valid values are 0 through 4. |
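To illustrate what accepting both a dict of arrays and a list of dicts for `data` involves, here is a minimal sketch of bringing the two shapes to one column-oriented form. The helper `to_columns` is hypothetical and uses only the standard library; it is not how cleanlab performs the conversion, and it does not cover DataFrames, HuggingFace Datasets, or file paths.

```python
from typing import Any, Dict, List, Union

Columns = Dict[str, List[Any]]


def to_columns(data: Union[Dict[str, Any], List[Dict[str, Any]]]) -> Columns:
    """Convert a dict of sequences or a list of row dicts into a dict of lists."""
    if isinstance(data, dict):
        # Dict of arrays: copy every column into a plain list.
        columns = {name: list(values) for name, values in data.items()}
    elif isinstance(data, list):
        # List of dicts: every row must expose the same keys.
        if not data:
            raise ValueError("cannot infer columns from an empty list")
        keys = list(data[0].keys())
        for row in data:
            if set(row) != set(keys):
                raise ValueError("all rows must have the same keys")
        columns = {key: [row[key] for row in data] for key in keys}
    else:
        raise TypeError(f"unsupported data type: {type(data).__name__}")

    # Every column must describe the same number of examples.
    if len({len(col) for col in columns.values()}) > 1:
        raise ValueError("columns differ in length")
    return columns
```

Both `to_columns([{"x": 1, "y": 0}, {"x": 2, "y": 1}])` and `to_columns({"x": [1, 2], "y": [0, 1]})` yield the same column dict, which is the property a format-agnostic constructor relies on.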
Outputs
| Name | Type | Description |
|---|---|---|
| return | Datalab instance | An initialized Datalab object containing the internal Data object, DataIssues container, task configuration, label map, and verbosity settings. Ready for find_issues() to be called. |
Usage Examples
Basic Classification Dataset
import datasets
from cleanlab import Datalab
# Load a HuggingFace dataset
data = datasets.load_dataset("glue", "sst2", split="train")
datalab = Datalab(data, label_name="label")
From a pandas DataFrame
import pandas as pd
from cleanlab import Datalab
df = pd.DataFrame({
    "text": ["good movie", "bad film", "great acting", "terrible plot"],
    "label": ["positive", "negative", "positive", "negative"],
})
datalab = Datalab(data=df, label_name="label")
From a Dictionary
import numpy as np
from cleanlab import Datalab
X = np.array([[0, 1], [1, 1], [2, 2], [2, 0]])
y = np.array([0, 1, 1, 0])
datalab = Datalab(data={"X": X, "y": y}, label_name="y")
Regression Task
from cleanlab import Datalab
data = {"feature": [1.0, 2.0, 3.0, 4.0], "target": [1.1, 2.2, 2.9, 4.1]}
datalab = Datalab(data=data, task="regression", label_name="target")