Environment:Cleanlab Cleanlab Python Core Environment
| Knowledge Sources | |
|---|---|
| Domains | Data_Centric_AI, Machine_Learning |
| Last Updated | 2026-02-09 19:30 GMT |
Overview
Python 3.10+ environment with NumPy, scikit-learn, pandas, tqdm, and termcolor as core dependencies for the cleanlab data-centric AI library.
Description
This environment provides the base runtime for all cleanlab functionality. It is a pure CPU-based Python environment with no GPU requirements. The core dependencies handle numerical computation (NumPy), machine learning models (scikit-learn), data manipulation (pandas), progress display (tqdm), and colored terminal output (termcolor). Optional packages extend cleanlab with Datalab dataset auditing (requires the `datasets` package) and image quality checks (requires CleanVision).
Usage
Use this environment for any cleanlab workflow: classification label issue detection, dataset health analysis, CleanLearning robust training, object detection quality scoring, token classification, multiannotator consensus, and regression label quality. This is the mandatory prerequisite for all Implementation pages in the cleanlab wiki.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux, macOS, or Windows | All platforms supported; multiprocessing is fastest on Linux |
| Hardware | CPU | No GPU required for core functionality |
| Python | >= 3.10 | Supports 3.10, 3.11, 3.12, 3.13, 3.14 |
| Disk | Minimal | Depends on dataset size |
Dependencies
System Packages
No system-level packages required beyond a standard Python installation.
Python Packages (Core)
- `numpy` >= 1.22
- `scikit-learn` >= 1.1
- `tqdm` >= 4.53.0
- `pandas` >= 1.4.0
- `termcolor` >= 2.4.0
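One way to confirm that an installed environment meets these minimums is to compare installed distribution versions against the list above. The following is a minimal sketch using only the standard library. The helper names `parse_version`, `meets_minimum`, and `check_core_dependencies` are illustrative and not part of cleanlab, and the version parsing is deliberately simple: it reads only the leading numeric components, so pre-release suffixes are ignored.

```python
from importlib import metadata
import re

# Minimum versions copied from the core dependency list above.
CORE_MINIMUMS = {
    "numpy": "1.22",
    "scikit-learn": "1.1",
    "tqdm": "4.53.0",
    "pandas": "1.4.0",
    "termcolor": "2.4.0",
}


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_core_dependencies(minimums=CORE_MINIMUMS):
    """Map each package name to 'ok', 'too old (<version>)', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if meets_minimum(installed, minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed})"
    return report


if __name__ == "__main__":
    for name, status in check_core_dependencies().items():
        print(f"{name}: {status}")
```

Running the script prints one status line per core package. For strict PEP 440 comparison (pre-releases, epochs), a dedicated version-parsing library would be a better fit than this hand-rolled parser.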
Python Packages (Optional)
- `psutil`: enables detection of physical CPU cores for optimal multiprocessing (falls back to logical cores if absent)
- `matplotlib` >= 3.5.1: required for visualization functions in object detection and segmentation; included in the `all` extras group
- `torch`: required only for the experimental PyTorch models (cifar_cnn, mnist_pytorch, coteaching)
- `scipy`: used internally by neighbor search and outlier detection
Credentials
No credentials or environment variables are required for core cleanlab functionality.
Quick Install
# Core install
pip install cleanlab
# Install with all optional dependencies
pip install 'cleanlab[all]'
Code Evidence
Python version requirement from `pyproject.toml:47`:
requires-python = ">=3.10"
Core dependencies from `pyproject.toml:11-17`:
dependencies = [
    "numpy>=1.22",
    "scikit-learn>=1.1",
    "tqdm>=4.53.0",
    "pandas>=1.4.0",
    "termcolor>=2.4.0",
]
Optional psutil import with fallback from `cleanlab/filter.py:43-50`:
# psutil is a package used to count physical cores for multiprocessing
# This package is not necessary, because we can always fall back to logical cores as the default
try:
    import psutil
    psutil_exists = True
except ImportError as e:
    psutil_exists = False
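To illustrate how a flag like `psutil_exists` can drive the choice of worker count, here is a minimal sketch that prefers physical cores and falls back to logical cores. The function name `default_n_jobs` is hypothetical, and this is not cleanlab's actual implementation; it only follows the fallback behavior described in the comments above.

```python
import os

# Same guarded import pattern as above: psutil is optional.
try:
    import psutil

    psutil_exists = True
except ImportError:
    psutil_exists = False


def default_n_jobs():
    """Prefer the physical core count; fall back to logical cores."""
    n_jobs = None
    if psutil_exists:
        # May return None when the physical core count cannot be determined.
        n_jobs = psutil.cpu_count(logical=False)
    if not n_jobs:
        # os.cpu_count() reports logical cores and may itself return None.
        n_jobs = os.cpu_count() or 1
    return n_jobs


if __name__ == "__main__":
    print(f"default n_jobs: {default_n_jobs()}")
```

Physical cores are usually the better default for CPU-bound work, because hyperthreaded logical cores share execution resources and rarely double throughput.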
Optional tqdm import with warning from `cleanlab/filter.py:33-41`:
try:
    import tqdm.auto as tqdm
    tqdm_exists = True
except ImportError as e:
    tqdm_exists = False
    w = """To see estimated completion times for methods in cleanlab.filter, "pip install tqdm"."""
    warnings.warn(w)
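A flag like `tqdm_exists` lets calling code show a progress bar when tqdm is importable and iterate silently otherwise. The sketch below shows one way to do that; the helper `with_progress` is a hypothetical name introduced for illustration and is not a cleanlab function.

```python
# Guarded import, mirroring the pattern above.
try:
    import tqdm.auto as tqdm

    tqdm_exists = True
except ImportError:
    tqdm_exists = False


def with_progress(iterable, total=None):
    """Wrap the iterable in a progress bar when tqdm is importable."""
    if tqdm_exists:
        return tqdm.tqdm(iterable, total=total)
    # Without tqdm, iteration proceeds with no progress display.
    return iterable


# The loop body is identical whether or not tqdm is installed.
squares = [x * x for x in with_progress(range(5), total=5)]
```

Either way the results are the same; only the progress display differs.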
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `To default n_jobs to the number of physical cores... pip install psutil` | psutil not installed; falls back to logical cores | `pip install psutil` (optional, affects runtime only) |
| `To see estimated completion times... pip install tqdm` | tqdm missing from the environment (it is a core dependency, so this usually indicates a partial or modified install) | `pip install tqdm` (affects progress bars only) |
| `try "pip install matplotlib"` | matplotlib not installed but visualization function called | `pip install matplotlib` or `pip install 'cleanlab[all]'` |
Compatibility Notes
- Python 3.14+: The default multiprocessing start method on Linux changes from `fork` to `forkserver`, so sharing global variables with worker processes via fork is no longer reliable; data is pickled and sent to subprocesses instead (see `cleanlab/filter.py:363`).
- Windows/macOS: Multi-label multiprocessing defaults to `n_jobs=1` because spawn-based multiprocessing is much slower for these cases.
- scikit-learn >= 1.8.0: `confusion_matrix` raises `ValueError` on empty inputs instead of returning a matrix of zeros. Cleanlab handles this case internally (see `cleanlab/count.py:600-604`).
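The platform-dependent behavior in the first two notes can be sketched with the standard library alone. The function names `default_multilabel_n_jobs` and `shares_globals_with_workers` are hypothetical and only approximate the rules described above; they are not taken from cleanlab's source.

```python
import multiprocessing
import platform


def default_multilabel_n_jobs(requested=None, system=None):
    """Pick n_jobs for multi-label work following the notes above."""
    if requested is not None:
        return requested  # an explicit choice always wins
    system = system or platform.system()
    if system != "Linux":
        return 1  # Windows/macOS: avoid slow spawn-based workers
    return multiprocessing.cpu_count()


def shares_globals_with_workers():
    """True only when workers are forked and so inherit module-level globals."""
    # Note: get_start_method() fixes the start method to the platform default
    # if none has been set yet.
    return multiprocessing.get_start_method() == "fork"


if __name__ == "__main__":
    print("multi-label n_jobs:", default_multilabel_n_jobs())
    print("globals shared via fork:", shares_globals_with_workers())
```

When `shares_globals_with_workers()` returns `False`, any data the workers need has to be passed explicitly as pickled arguments, which is the behavior the Python 3.14 note describes.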
Related Pages
- Implementation:Cleanlab_Cleanlab_Estimate_CV_Predicted_Probabilities
- Implementation:Cleanlab_Cleanlab_Compute_Confident_Joint
- Implementation:Cleanlab_Cleanlab_Estimate_Latent
- Implementation:Cleanlab_Cleanlab_Find_Label_Issues
- Implementation:Cleanlab_Cleanlab_Get_Label_Quality_Scores
- Implementation:Cleanlab_Cleanlab_Order_Label_Issues
- Implementation:Cleanlab_Cleanlab_Health_Summary
- Implementation:Cleanlab_Cleanlab_Rank_Classes_By_Label_Quality
- Implementation:Cleanlab_Cleanlab_Datalab_Init
- Implementation:Cleanlab_Cleanlab_Datalab_Find_Issues
- Implementation:Cleanlab_Cleanlab_Datalab_Report
- Implementation:Cleanlab_Cleanlab_Datalab_Get_Issues
- Implementation:Cleanlab_Cleanlab_Datalab_Get_Issue_Summary
- Implementation:Cleanlab_Cleanlab_CleanLearning_Init
- Implementation:Cleanlab_Cleanlab_CleanLearning_Find_Label_Issues
- Implementation:Cleanlab_Cleanlab_CleanLearning_Fit
- Implementation:Cleanlab_Cleanlab_CleanLearning_Predict
- Implementation:Cleanlab_Cleanlab_OD_Get_Label_Quality_Scores
- Implementation:Cleanlab_Cleanlab_OD_Find_Label_Issues
- Implementation:Cleanlab_Cleanlab_OD_Visualize
- Implementation:Cleanlab_Cleanlab_TC_Get_Label_Quality_Scores
- Implementation:Cleanlab_Cleanlab_TC_Find_Label_Issues
- Implementation:Cleanlab_Cleanlab_TC_Display_Issues
- Implementation:Cleanlab_Cleanlab_Get_Label_Quality_Multiannotator
- Implementation:Cleanlab_Cleanlab_Get_Active_Learning_Scores