Environment:Cleanlab Cleanlab Datalab Dependencies
| Knowledge Sources | |
|---|---|
| Domains | Data_Centric_AI, Dataset_Auditing |
| Last Updated | 2026-02-09 19:30 GMT |
Overview
Optional dependency environment that extends the core cleanlab install with the HuggingFace `datasets` package, which Datalab requires for automated dataset auditing.
Description
The Datalab module provides automated multi-issue detection across a dataset (label errors, outliers, duplicates, class imbalance, non-IID violations, null values, underperforming groups, data valuation). It requires the HuggingFace `datasets` library (>= 2.7.0), mainly for its internal data storage class and for some type hints. When `datasets` is not installed, `cleanlab.Datalab` is bound to a `DatalabUnavailable` stub instead of the real class; the stub raises `ImportError` with installation instructions on any access attempt.
Usage
Use this environment when running the Datalab automated dataset audit workflow. Install via the `[datalab]` extras group. This is the prerequisite for the Datalab_Init, Datalab_Find_Issues, Datalab_Report, Datalab_Get_Issues, and Datalab_Get_Issue_Summary implementations.
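The workflow those implementations cover can be sketched roughly as follows. This is a minimal illustration, assuming a classification dataset with a label column and out-of-sample predicted probabilities from some model; the helper name `audit_dataset` and its arguments are hypothetical and not part of cleanlab.

```python
def audit_dataset(data, label_name, pred_probs):
    """Run a basic Datalab audit and return (issues, summary).

    `audit_dataset` is a hypothetical helper used only for illustration.
    """
    # Imported inside the function so that a missing install only fails
    # when an audit is actually run.
    from cleanlab import Datalab

    lab = Datalab(data=data, label_name=label_name)  # Datalab_Init
    lab.find_issues(pred_probs=pred_probs)           # Datalab_Find_Issues
    lab.report()                                     # Datalab_Report (prints a summary)
    issues = lab.get_issues()                        # Datalab_Get_Issues
    summary = lab.get_issue_summary()                # Datalab_Get_Issue_Summary
    return issues, summary
```

Without this environment installed, the `Datalab(...)` call is the point where the `ImportError` described above would be expected to surface.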
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux, macOS, or Windows | Same as core cleanlab |
| Hardware | CPU | No GPU required |
| Python | >= 3.10 | Same as core cleanlab |
| Disk | Moderate | HuggingFace datasets may cache data to disk |
Dependencies
System Packages
No additional system-level packages required beyond core cleanlab.
Python Packages
- All core cleanlab dependencies (see Environment:Cleanlab_Cleanlab_Python_Core_Environment)
- `datasets` >= 2.7.0
Credentials
No credentials required. HuggingFace datasets can load local or public datasets without authentication.
Quick Install
pip install 'cleanlab[datalab]'
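After installing, one way to confirm that the `datasets` requirement is met is to query the installed package metadata. The sketch below uses only the standard library; the helper names `version_tuple` and `datasets_status` are hypothetical, and the version parsing is deliberately simple (it ignores pre-release suffixes), so treat it as a quick sanity check rather than a full version comparison.

```python
from importlib import metadata

MIN_DATASETS = (2, 7, 0)  # minimum version from the [datalab] extra


def version_tuple(version):
    """Turn '2.14.5' or '4.0.0.dev0' into a 3-tuple of leading integers."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(piece))
    while len(parts) < 3:
        parts.append(0)  # pad short versions such as '2.7'
    return tuple(parts[:3])


def datasets_status():
    """Return a short message describing the installed `datasets` version."""
    try:
        installed = metadata.version("datasets")
    except metadata.PackageNotFoundError:
        return "datasets is not installed; run: pip install 'cleanlab[datalab]'"
    if version_tuple(installed) < MIN_DATASETS:
        return f"datasets {installed} is older than 2.7.0; upgrade it"
    return f"datasets {installed} satisfies the Datalab requirement"


print(datasets_status())
```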
Code Evidence
Datalab import gating from `cleanlab/__init__.py:32-41`:
def _datalab_import_factory():
    try:
        from .datalab.datalab import Datalab as _Datalab
        return _Datalab
    except ImportError:
        return DatalabUnavailable(
            "Datalab is not available due to missing dependencies. "
            "To install Datalab, run `pip install 'cleanlab[datalab]'`."
        )
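The excerpt does not show how `DatalabUnavailable` itself is defined. A stub of this kind could look roughly like the sketch below, which assumes that both attribute access and calls should raise `ImportError`; the class name `UnavailableStub` is hypothetical and the real cleanlab class may differ in detail.

```python
class UnavailableStub:
    """Stand-in object that raises ImportError whenever it is used."""

    def __init__(self, message):
        self.message = message

    def __getattr__(self, name):
        # Only reached for attributes not found normally, e.g. stub.find_issues
        raise ImportError(f"{self.message} (raised when accessing '{name}')")

    def __call__(self, *args, **kwargs):
        # Covers attempts to instantiate, e.g. Datalab(data=...)
        raise ImportError(self.message)


# Simulate what the import factory returns when `datasets` is missing.
Datalab = UnavailableStub(
    "Datalab is not available due to missing dependencies. "
    "To install Datalab, run `pip install 'cleanlab[datalab]'`."
)

try:
    Datalab(data={"label": [0, 1]}, label_name="label")
except ImportError as error:
    print(error)
```

The point of this design is that `import cleanlab` keeps working without the optional dependency, and the failure is deferred until Datalab is actually used.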
Datasets package import check from `cleanlab/datalab/internal/data.py:8-15`:
try:
    import datasets
except ImportError as error:
    raise ImportError(
        "Cannot import datasets package. "
        "Please install it and try again, or just install cleanlab with "
        "all optional dependencies via: `pip install 'cleanlab[all]'`"
    ) from error
Datasets 4.0.0+ compatibility handling from `cleanlab/datalab/internal/data.py:22-29`:
# Import Column types for compatibility with datasets 4.0.0+
try:
    from datasets.arrow_dataset import Column
    from datasets.iterable_dataset import IterableColumn
except ImportError:
    # For backwards compatibility with older datasets versions
    Column = None
    IterableColumn = None
Optional dependency definition from `setup.py:24-28`:
DATALAB_REQUIRE = [
    # Mainly for Datalab's data storage class.
    # Still some type hints that require datasets
    "datasets>=2.7.0",
]
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `Datalab is not available due to missing dependencies` | `datasets` package not installed | `pip install 'cleanlab[datalab]'` |
| `IssueManager is not available due to missing dependencies for Datalab` | `datasets` package not installed | `pip install 'cleanlab[datalab]'` |
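| `Cannot import datasets package` | Importing Datalab internals (`cleanlab/datalab/internal/data.py`) without `datasets` installed | `pip install 'cleanlab[datalab]'`, or `pip install 'cleanlab[all]'` as the message suggests |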
Compatibility Notes
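- datasets >= 4.0.0: Introduced the `Column` and `IterableColumn` types. Cleanlab tries to import them and, where the import fails on older versions, sets both names to `None`, so both old and new versions are handled.
- datasets < 2.7.0: Not supported. This is below the minimum pinned in the `[datalab]` extra and may cause subtle type or API errors.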