Environment:Scikit learn Scikit learn OpenMP Thread Configuration
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Parallelism |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
OpenMP and BLAS thread configuration environment for controlling scikit-learn parallel computation via environment variables and threadpoolctl.
Description
Scikit-learn uses multiple levels of parallelism: OpenMP for Cython-level parallelism (e.g. pairwise distances, histogram-based gradient boosting), BLAS libraries (OpenBLAS, MKL, BLIS) for linear algebra operations, and joblib for Python-level parallelism (worker processes by default, or threads). When these layers are nested, for example when each joblib worker calls OpenMP or BLAS code, every layer may start its own threads, and together they can request far more threads than there are CPU cores (thread oversubscription). This environment documents the thread control variables and workarounds scikit-learn uses to manage parallel execution safely.
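The arithmetic behind oversubscription is simple: if each of `n_jobs` joblib workers runs an OpenMP or BLAS pool sized to the whole machine, the process asks for `n_jobs * n_cpus` threads. The sketch below, with hypothetical helper names, computes that worst case and a per-worker budget of `max(1, n_cpus // n_jobs)`. The budget rule is an illustrative assumption, not a scikit-learn API; as far as we know, joblib's default `loky` backend applies a similar limit to the worker processes it starts, so the risk is highest with thread-based backends or manually nested parallelism.

```python
import os
from typing import Optional


def worst_case_threads(n_jobs: int, inner_threads: int) -> int:
    """Threads alive at once if every joblib worker runs a full inner pool."""
    return n_jobs * inner_threads


def inner_thread_budget(n_jobs: int, n_cpus: Optional[int] = None) -> int:
    """OpenMP/BLAS threads per worker so the total stays within n_cpus where possible."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    # Always allow at least one thread, even with more workers than cores.
    return max(1, n_cpus // n_jobs)


if __name__ == "__main__":
    cpus = 8
    # Naive nesting: 8 workers, each with an 8-thread pool -> 64 threads.
    print(worst_case_threads(n_jobs=8, inner_threads=cpus))
    # Budgeted: 8 workers x 1 inner thread -> 8 threads.
    print(worst_case_threads(n_jobs=8, inner_threads=inner_thread_budget(8, cpus)))
```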
Usage
Use this environment configuration when running scikit-learn on multi-core systems, in Docker or other containerized environments (where cgroup CPU quotas may allow fewer CPUs than the host exposes), or when thread oversubscription is degrading performance. It is particularly relevant for the BaseForest_Fit (parallel tree building), Cross_Validate (parallel fold evaluation), and GridSearchCV (parallel parameter search) implementations.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OpenMP | Runtime library (libgomp/libomp) | Linked by scikit-learn's compiled Cython extensions |
| BLAS | OpenBLAS, MKL, or BLIS | One of these must be available via NumPy/SciPy |
| threadpoolctl | >= 3.2.0 | Used internally by sklearn to manage thread pools |
Dependencies
System Packages
- OpenMP runtime library (`libgomp` on Linux, `libomp` on macOS)
- One BLAS library: OpenBLAS, Intel MKL, or BLIS
Python Packages
- `threadpoolctl` >= 3.2.0
- `joblib` >= 1.3.0
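A quick way to check the Python-package minimums above in an existing environment is to read installed versions through `importlib.metadata`. This is a sketch: `check_dependencies` and `version_tuple` are hypothetical helpers, and the parser is deliberately simplified (it keeps only the first three numeric parts, so a pre-release such as `1.3.0rc1` counts as `1.3.0`).

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimums as listed in the Python Packages section above.
MINIMUM_VERSIONS = {"threadpoolctl": "3.2.0", "joblib": "1.3.0"}


def version_tuple(text: str) -> tuple:
    """Leading numeric release parts, padded to three (simplified parser)."""
    nums = [int(x) for x in re.findall(r"\d+", text)[:3]]
    return tuple(nums + [0] * (3 - len(nums)))


def check_dependencies(minimums=MINIMUM_VERSIONS) -> dict:
    """Map each package to True/False (meets minimum) or None (not installed)."""
    report = {}
    for package, minimum in minimums.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            report[package] = None
            continue
        report[package] = version_tuple(installed) >= version_tuple(minimum)
    return report


if __name__ == "__main__":
    print(check_dependencies())
```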
Credentials
The following environment variables control thread behavior (not secrets):
- `OMP_NUM_THREADS`: Number of OpenMP threads (overrides automatic CPU detection)
- `MKL_NUM_THREADS`: Number of threads for Intel MKL BLAS operations
- `OPENBLAS_NUM_THREADS`: Number of threads for OpenBLAS operations
- `BLIS_NUM_THREADS`: Number of threads for BLIS operations
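- `KMP_DUPLICATE_LIB_OK`: Allow multiple OpenMP libraries to be loaded in one process (macOS workaround; scikit-learn defaults it to "True" on import if it is not already set)
- `KMP_INIT_AT_FORK`: Intel OpenMP fork workaround (scikit-learn defaults it to "FALSE" on import if it is not already set)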
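Because the OpenMP and BLAS runtimes generally read these variables when they are first loaded, setting them from Python only works before NumPy, SciPy or scikit-learn is imported. The sketch below defaults all four thread-count variables with `setdefault`, so values the user already exported are kept; `cap_threads` is a hypothetical helper, not part of scikit-learn.

```python
import os
from typing import MutableMapping, Optional

# The thread-count variables listed above, one per runtime.
THREAD_ENV_VARS = (
    "OMP_NUM_THREADS",
    "MKL_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "BLIS_NUM_THREADS",
)


def cap_threads(n_threads: int, env: Optional[MutableMapping[str, str]] = None) -> dict:
    """Default every thread-count variable to n_threads, keeping values already set."""
    if env is None:
        env = os.environ
    for name in THREAD_ENV_VARS:
        env.setdefault(name, str(n_threads))
    return {name: env[name] for name in THREAD_ENV_VARS}


if __name__ == "__main__":
    print(cap_threads(1))
    # Only now import the libraries that load OpenMP/BLAS, e.g.:
    # import numpy as np
    # from sklearn.ensemble import HistGradientBoostingClassifier
```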
Quick Install
```bash
# threadpoolctl is installed automatically with scikit-learn
pip install scikit-learn

# To cap threads for Python processes started from this shell
# (must be set before NumPy/SciPy/scikit-learn are imported):
export OMP_NUM_THREADS=4
export OPENBLAS_NUM_THREADS=4

# Or limit threads at runtime with threadpoolctl in Python:
# from threadpoolctl import threadpool_limits
# with threadpool_limits(limits=4):
#     model.fit(X, y)
```
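The runtime alternative uses threadpoolctl directly. This sketch assumes NumPy is installed (importing it loads the BLAS library) and uses two calls from threadpoolctl's public API: `threadpool_info()` to list the loaded OpenMP and BLAS pools with their sizes, and `threadpool_limits(limits=..., user_api=...)` as a context manager that caps one kind of pool and restores the previous sizes on exit.

```python
import numpy as np
from threadpoolctl import threadpool_info, threadpool_limits

X = np.random.default_rng(0).standard_normal((300, 300))

# List every loaded pool: user_api is "openmp" or "blas".
for pool in threadpool_info():
    print(pool["user_api"], pool["internal_api"], pool["num_threads"])

# Cap only the BLAS pools at one thread; previous sizes are restored on exit.
with threadpool_limits(limits=1, user_api="blas"):
    blas_sizes = [p["num_threads"] for p in threadpool_info() if p["user_api"] == "blas"]
    gram = X @ X.T  # this BLAS call runs single-threaded
print(blas_sizes)
```

Limiting only `user_api="blas"` leaves OpenMP-parallel scikit-learn code untouched; omit `user_api` to cap both kinds of pool.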
Code Evidence
OpenMP environment workarounds from `sklearn/__init__.py:48-60`:
```python
import os

# On OSX, we can get a runtime error due to multiple OpenMP libraries loaded
# simultaneously. This can happen for instance when calling BLAS inside a
# prange. Setting the following environment variable allows multiple OpenMP
# libraries to be loaded.
os.environ.setdefault("KMP_DUPLICATE_LIB_OK", "True")

# Workaround issue discovered in intel-openmp 2019.5:
# https://github.com/ContinuumIO/anaconda-issues/issues/11294
os.environ.setdefault("KMP_INIT_AT_FORK", "FALSE")
```
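Because scikit-learn uses `os.environ.setdefault`, these are defaults rather than overrides: a value already present in the environment when `sklearn` is imported wins. The sketch below shows that behaviour with a plain dict standing in for `os.environ` (an assumption made so the example does not modify the real environment).

```python
# Stand-in for os.environ: the user exported KMP_INIT_AT_FORK before importing sklearn.
env = {"KMP_INIT_AT_FORK": "TRUE"}

env.setdefault("KMP_INIT_AT_FORK", "FALSE")     # no effect: key already present
env.setdefault("KMP_DUPLICATE_LIB_OK", "True")  # applied: key was missing

print(env)  # {'KMP_INIT_AT_FORK': 'TRUE', 'KMP_DUPLICATE_LIB_OK': 'True'}
```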
Unstable OpenBLAS detection from `sklearn/utils/fixes.py:343-373`:
```python
def _in_unstable_openblas_configuration():
    """Return True if in an unstable configuration for OpenBLAS"""
    modules_info = _get_threadpool_controller().info()
    open_blas_used = any(info["internal_api"] == "openblas" for info in modules_info)
    if not open_blas_used:
        return False

    # OpenBLAS 0.3.16 fixed instability for arm64
    openblas_arm64_stable_version = parse_version("0.3.16")
    for info in modules_info:
        if info["internal_api"] != "openblas":
            continue
        openblas_version = info.get("version")
        openblas_architecture = info.get("architecture")
        if openblas_version is None or openblas_architecture is None:
            return True
        if (
            openblas_architecture == "neoversen1"
            and parse_version(openblas_version) < openblas_arm64_stable_version
        ):
            return True
    return False
```
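The rule in `_in_unstable_openblas_configuration` can be exercised without a real OpenBLAS by feeding it records shaped like threadpoolctl's `info()` output. The sketch below re-implements the same checks as a hypothetical `is_unstable_openblas` function; `_version_tuple` is a simplified stand-in for `parse_version` (an assumption: it compares only the first three numeric parts).

```python
import re
from typing import Iterable, Mapping, Optional, Tuple

STABLE_ARM64_OPENBLAS = (0, 3, 16)  # threshold from the excerpt above


def _version_tuple(version: str) -> Tuple[int, ...]:
    """Simplified stand-in for parse_version: keep the leading numeric parts."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def is_unstable_openblas(modules_info: Iterable[Mapping[str, Optional[str]]]) -> bool:
    """Same decision as _in_unstable_openblas_configuration, on given records."""
    for info in modules_info:
        if info.get("internal_api") != "openblas":
            continue  # MKL, BLIS and OpenMP records are irrelevant here
        version = info.get("version")
        arch = info.get("architecture")
        if version is None or arch is None:
            return True  # cannot tell, so assume unstable
        if arch == "neoversen1" and _version_tuple(version) < STABLE_ARM64_OPENBLAS:
            return True
    return False  # no OpenBLAS loaded, or only stable builds


if __name__ == "__main__":
    print(is_unstable_openblas(
        [{"internal_api": "openblas", "version": "0.3.15", "architecture": "neoversen1"}]
    ))
```

In a real environment the records would come from `threadpoolctl.threadpool_info()`, which reports `internal_api`, `version` and, for OpenBLAS, `architecture` for each loaded library.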
Config propagation warning from `sklearn/utils/parallel.py:29-37`:
```python
import warnings

warnings.warn(
    (
        "`sklearn.utils.parallel.Parallel` needs to be used in "
        "conjunction with `sklearn.utils.parallel.delayed` instead of "
        "`joblib.delayed` to correctly propagate the scikit-learn "
        "configuration to the joblib workers."
    ),
    UserWarning,
)
```
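The warning exists because scikit-learn keeps its configuration (`set_config`, `config_context`) in thread-local storage, so a task running in another thread or process does not see the caller's settings unless they are captured when the work is dispatched and re-applied inside the worker. `sklearn.utils.parallel.delayed` marks tasks so that `sklearn.utils.parallel.Parallel` can do this. The sketch below imitates only the idea with the standard library; `get_local_config`, `set_local_config` and `with_config` are hypothetical stand-ins, not scikit-learn code.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for scikit-learn's thread-local configuration store.
_local = threading.local()
_DEFAULT = {"assume_finite": False}


def get_local_config() -> dict:
    return dict(getattr(_local, "config", _DEFAULT))


def set_local_config(config: dict) -> None:
    _local.config = dict(config)


def read_flag(_item=None) -> bool:
    return get_local_config()["assume_finite"]


def with_config(func, config):
    """Wrap func so it runs under `config` in whichever thread executes it."""
    def wrapper(*args, **kwargs):
        previous = get_local_config()
        set_local_config(config)
        try:
            return func(*args, **kwargs)
        finally:
            set_local_config(previous)  # leave the worker thread as we found it
    return wrapper


set_local_config({"assume_finite": True})  # only affects the main thread

with ThreadPoolExecutor(max_workers=2) as pool:
    # Plain dispatch: worker threads never saw the main thread's setting.
    plain = list(pool.map(read_flag, range(2)))
    # Capture in the caller, re-apply in the worker.
    wrapped = list(pool.map(with_config(read_flag, get_local_config()), range(2)))

print(plain, wrapped)  # [False, False] [True, True]
```

In application code the fix is to import both helpers from the same place (`from sklearn.utils.parallel import Parallel, delayed`) rather than mixing in `joblib.delayed`.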
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `OMP: Error #15: Initializing libiomp5 ... but found libiomp5md already initialized` | Multiple copies of the OpenMP runtime loaded in one process (seen on macOS) | `KMP_DUPLICATE_LIB_OK=True`; scikit-learn sets it on import if it is not already defined |
| Performance degradation with `n_jobs > 1` | Thread oversubscription: each joblib worker also runs a full-size OpenMP or BLAS thread pool | Limit the inner pools, e.g. `OMP_NUM_THREADS=1` plus the matching BLAS variable, or `threadpool_limits(limits=1)` |
| Hang or deadlock in parallel code | Fork-safety issue in Intel OpenMP (the workaround targets intel-openmp 2019.5) | `KMP_INIT_AT_FORK=FALSE`; scikit-learn sets it on import if it is not already defined |
| Numerical instability on ARM64 | OpenBLAS < 0.3.16 on Neoverse N1 | Upgrade OpenBLAS to >= 0.3.16 or use MKL |
Compatibility Notes
- macOS: `KMP_DUPLICATE_LIB_OK=True` may be needed because several OpenMP runtimes (for example a system runtime and one shipped by Anaconda) can end up loaded in the same process. Scikit-learn sets this default on import unless the variable is already defined.
- ARM64 (aarch64): OpenBLAS versions before 0.3.16 have known instabilities on Neoverse N1 architecture. Scikit-learn detects this and marks affected tests accordingly.
- Docker/cgroups: When `OMP_NUM_THREADS` is not set, scikit-learn's OpenMP-based code uses the minimum of `omp_get_max_threads()` and the CPU count reported by joblib, which accounts for cgroup CPU quotas.
- pytest-xdist: When scikit-learn's own test suite runs in parallel with pytest-xdist, its pytest configuration limits OpenMP and BLAS threads in each worker to `max(1, cpu_count // worker_count)` to prevent oversubscription.
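The Docker/cgroups and pytest-xdist notes reduce to small formulas. The sketch below encodes them as hypothetical helpers: `effective_openmp_threads` takes the OpenMP maximum, a CPU count and the raw `OMP_NUM_THREADS` value, and `xdist_thread_limit` takes the worker count that pytest-xdist exports as `PYTEST_XDIST_WORKER_COUNT`. The branch for an explicitly set `OMP_NUM_THREADS` (use the OpenMP maximum as is, even above the CPU count) is an assumption based on our reading of scikit-learn's source.

```python
import os
from typing import Optional


def effective_openmp_threads(omp_max_threads: int, n_cpus: int,
                             omp_num_threads_env: Optional[str]) -> int:
    """Default OpenMP thread count for scikit-learn's Cython code (sketch)."""
    if omp_num_threads_env:
        # Assumption: an explicit OMP_NUM_THREADS is honoured as is.
        return omp_max_threads
    # Otherwise never exceed the (cgroup-aware) CPU count.
    return min(omp_max_threads, n_cpus)


def xdist_thread_limit(n_cpus: int, worker_count: Optional[str]) -> int:
    """Per-worker OpenMP/BLAS thread cap under pytest-xdist (sketch)."""
    if worker_count is None:
        return n_cpus  # not running under xdist
    return max(n_cpus // int(worker_count), 1)


if __name__ == "__main__":
    n_cpus = os.cpu_count() or 1
    print(xdist_thread_limit(n_cpus, os.environ.get("PYTEST_XDIST_WORKER_COUNT")))
```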