Environment:Scikit learn Scikit learn OpenMP Thread Configuration

From Leeroopedia


Knowledge Sources
Domains: Infrastructure, Parallelism
Last Updated: 2026-02-08 15:00 GMT

Overview

This environment covers OpenMP and BLAS thread configuration for controlling scikit-learn's parallel computation through environment variables and `threadpoolctl`.

Description

Scikit-learn uses several levels of parallelism: OpenMP for Cython-level parallelism (pairwise distances, tree building), BLAS libraries (OpenBLAS, MKL, BLIS) for linear algebra operations, and joblib for Python-level parallelism across processes or threads. These layers can multiply one another: when every joblib worker also starts a full set of OpenMP or BLAS threads, the process runs more threads than there are cores (thread oversubscription) and slows down. This environment documents the thread control variables and workarounds scikit-learn uses to manage parallel execution safely.
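To check which native thread pools are actually loaded in a process, `threadpoolctl` can list them. The following is a minimal sketch, assuming `threadpoolctl` (and optionally NumPy) is installed. The helper name `describe_pools` is hypothetical, and it returns an empty list when `threadpoolctl` is missing.

```python
def describe_pools():
    """Return (user_api, internal_api, num_threads) for each loaded native pool.

    `describe_pools` is an illustrative helper, not part of scikit-learn.
    """
    try:
        from threadpoolctl import threadpool_info
    except ImportError:
        return []  # threadpoolctl is not installed: nothing to report

    try:
        import numpy  # noqa: F401  (importing NumPy loads its BLAS library)
    except ImportError:
        pass  # without NumPy, only already-loaded libraries are listed

    return [
        (pool["user_api"], pool["internal_api"], pool["num_threads"])
        for pool in threadpool_info()
    ]


if __name__ == "__main__":
    for user_api, internal_api, num_threads in describe_pools():
        print(f"{user_api:8s} {internal_api:10s} threads={num_threads}")
```

Each entry corresponds to one native library, so a process that reports both an `openmp` and a `blas` pool has two layers that can each start their own threads.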

Usage

Use this environment configuration when running scikit-learn on multi-core systems, in Docker/containerized environments (where cgroups may limit visible CPUs), or when encountering performance issues from thread oversubscription. It is particularly relevant for the BaseForest_Fit (parallel tree building), Cross_Validate (parallel fold evaluation), and GridSearchCV (parallel parameter search) implementations.

System Requirements

Category | Requirement | Notes
OpenMP | Runtime library (libgomp/libomp) | Built into compiled Cython extensions
BLAS | OpenBLAS, MKL, or BLIS | One of these must be available via NumPy/SciPy
threadpoolctl | >= 3.2.0 | Used internally by sklearn to manage thread pools

Dependencies

System Packages

  • OpenMP runtime library (`libgomp` on Linux, `libomp` on macOS)
  • One BLAS library: OpenBLAS, Intel MKL, or BLIS

Python Packages

  • `threadpoolctl` >= 3.2.0
  • `joblib` >= 1.3.0

Credentials

The following environment variables control thread behavior (not secrets):

  • `OMP_NUM_THREADS`: Number of OpenMP threads (overrides automatic CPU detection)
  • `MKL_NUM_THREADS`: Number of threads for Intel MKL BLAS operations
  • `OPENBLAS_NUM_THREADS`: Number of threads for OpenBLAS operations
  • `BLIS_NUM_THREADS`: Number of threads for BLIS operations
  • `KMP_DUPLICATE_LIB_OK`: Allow multiple OpenMP libraries to be loaded (macOS workaround; sklearn sets it to "True" on import unless it is already defined)
  • `KMP_INIT_AT_FORK`: Intel OpenMP fork workaround (sklearn sets it to "FALSE" on import unless it is already defined)
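The thread-count variables are read by the native runtimes when those libraries are loaded, so when they are set from Python rather than from the shell, that generally has to happen before NumPy or scikit-learn is imported. A minimal sketch under that assumption follows; the helper `limit_native_threads` and the tuple `THREAD_VARS` are hypothetical names introduced for illustration.

```python
import os

# The four thread-count variables listed above.
THREAD_VARS = (
    "OMP_NUM_THREADS",
    "MKL_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "BLIS_NUM_THREADS",
)


def limit_native_threads(n_threads, env=None):
    """Set every thread-count variable to n_threads and return the mapping.

    Illustrative helper, not a scikit-learn API. `env` defaults to os.environ;
    pass a plain dict to try the function without touching the process.
    """
    if n_threads < 1:
        raise ValueError("n_threads must be >= 1")
    env = os.environ if env is None else env
    for name in THREAD_VARS:
        env[name] = str(n_threads)
    return env


if __name__ == "__main__":
    limit_native_threads(4)
    # Import NumPy / scikit-learn only after this point so the limits apply.
    print({name: os.environ[name] for name in THREAD_VARS})
```

For limits that need to change after the libraries are loaded, the `threadpool_limits` context manager from `threadpoolctl` is the runtime alternative.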

Quick Install

# threadpoolctl is installed automatically with scikit-learn
pip install scikit-learn

# To control threads at runtime:
export OMP_NUM_THREADS=4
export OPENBLAS_NUM_THREADS=4

# Or use threadpoolctl in Python:
# from threadpoolctl import threadpool_limits
# with threadpool_limits(limits=4):
#     model.fit(X, y)

Code Evidence

OpenMP environment workarounds from `sklearn/__init__.py:48-60`:

# On OSX, we can get a runtime error due to multiple OpenMP libraries loaded
# simultaneously. This can happen for instance when calling BLAS inside a
# prange. Setting the following environment variable allows multiple OpenMP
# libraries to be loaded.
os.environ.setdefault("KMP_DUPLICATE_LIB_OK", "True")

# Workaround issue discovered in intel-openmp 2019.5:
# https://github.com/ContinuumIO/anaconda-issues/issues/11294
os.environ.setdefault("KMP_INIT_AT_FORK", "FALSE")
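Because both workarounds use `setdefault`, a value exported before scikit-learn is imported is left untouched. The sketch below reproduces the two calls on a plain dictionary to show that behaviour; `apply_sklearn_defaults` is a hypothetical stand-in, not a scikit-learn function.

```python
def apply_sklearn_defaults(env):
    """Mimic the two setdefault calls from sklearn/__init__.py on a mapping."""
    env.setdefault("KMP_DUPLICATE_LIB_OK", "True")
    env.setdefault("KMP_INIT_AT_FORK", "FALSE")
    return env


# Case 1: nothing set beforehand, so both defaults are applied.
clean = apply_sklearn_defaults({})

# Case 2: the user already chose a value, so setdefault keeps it.
custom = apply_sklearn_defaults({"KMP_INIT_AT_FORK": "TRUE"})

print(clean)
print(custom)
```

In the second case only `KMP_DUPLICATE_LIB_OK` is added; the user's `KMP_INIT_AT_FORK` value survives.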

Unstable OpenBLAS detection from `sklearn/utils/fixes.py:343-373`:

def _in_unstable_openblas_configuration():
    """Return True if in an unstable configuration for OpenBLAS"""
    modules_info = _get_threadpool_controller().info()
    open_blas_used = any(info["internal_api"] == "openblas" for info in modules_info)
    if not open_blas_used:
        return False
    # OpenBLAS 0.3.16 fixed instability for arm64
    openblas_arm64_stable_version = parse_version("0.3.16")
    for info in modules_info:
        if info["internal_api"] != "openblas":
            continue
        openblas_version = info.get("version")
        openblas_architecture = info.get("architecture")
        if openblas_version is None or openblas_architecture is None:
            return True
        if (
            openblas_architecture == "neoversen1"
            and parse_version(openblas_version) < openblas_arm64_stable_version
        ):
            return True
    return False
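The same decision logic can be tried outside scikit-learn by passing the thread pool records in directly. This is a simplified sketch: it takes a list of dictionaries shaped like `threadpoolctl` output as an argument, and it swaps `parse_version` for a small numeric tuple parser (`_version_tuple`, a hypothetical helper) that ignores suffixes such as `.dev`.

```python
import re


def _version_tuple(version):
    """Parse leading numeric components, e.g. '0.3.15' -> (0, 3, 15)."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def in_unstable_openblas_configuration(modules_info):
    """Return True if any OpenBLAS record looks like an unstable setup."""
    stable_arm64_version = (0, 3, 16)
    for info in modules_info:
        if info.get("internal_api") != "openblas":
            continue  # only OpenBLAS records matter
        version = info.get("version")
        architecture = info.get("architecture")
        if version is None or architecture is None:
            return True  # unknown build: treat it as unstable
        if (
            architecture == "neoversen1"
            and _version_tuple(version) < stable_arm64_version
        ):
            return True
    return False


if __name__ == "__main__":
    old_arm = [{"internal_api": "openblas", "version": "0.3.15",
                "architecture": "neoversen1"}]
    print(in_unstable_openblas_configuration(old_arm))
```

Note that a missing version or architecture is treated as unstable, which matches the conservative choice in the original function.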

Config propagation warning from `sklearn/utils/parallel.py:29-37`:

warnings.warn(
    (
        "`sklearn.utils.parallel.Parallel` needs to be used in "
        "conjunction with `sklearn.utils.parallel.delayed` instead of "
        "`joblib.delayed` to correctly propagate the scikit-learn "
        "configuration to the joblib workers."
    ),
    UserWarning,
)
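The pairing that the warning asks for looks roughly like the sketch below, which assumes a scikit-learn version that ships `sklearn.utils.parallel`. The function `read_flag_in_workers` is a hypothetical helper; it returns `None` when scikit-learn cannot be imported. Worker threads are used because scikit-learn's configuration is thread-local, so the workers would not see the setting without propagation.

```python
def read_flag_in_workers(n_tasks=2):
    """Read `assume_finite` inside joblib workers started under a config_context."""
    try:
        from sklearn import config_context, get_config
        from sklearn.utils.parallel import Parallel, delayed
    except ImportError:
        return None  # scikit-learn (with sklearn.utils.parallel) not available

    def read_flag():
        # Runs in a worker; reports the configuration visible there.
        return get_config()["assume_finite"]

    with config_context(assume_finite=True):
        # sklearn's Parallel and delayed are used together, as the warning requires.
        return Parallel(n_jobs=2, prefer="threads")(
            delayed(read_flag)() for _ in range(n_tasks)
        )


if __name__ == "__main__":
    print(read_flag_in_workers())
```

With both helpers taken from `sklearn.utils.parallel`, each worker is expected to report the value set in the surrounding `config_context`.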

Common Errors

Error Message | Cause | Solution
`OMP: Error #15: Initializing libiomp5 ... but found libiomp5md already initialized` | Multiple OpenMP runtimes loaded on macOS | Set `KMP_DUPLICATE_LIB_OK=True` (done automatically by sklearn unless already set)
Performance degradation with `n_jobs > 1` | Thread oversubscription: each joblib worker also spawns OpenMP/BLAS threads | Set `OMP_NUM_THREADS=1` when using `n_jobs > 1`
Hang or deadlock in parallel code | Fork-safety issue with OpenMP | Set `KMP_INIT_AT_FORK=FALSE` (done automatically by sklearn unless already set)
Numerical instability on ARM64 | OpenBLAS < 0.3.16 on Neoverse N1 | Upgrade OpenBLAS to >= 0.3.16 (MKL is not an option here, since it targets x86 only)
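The oversubscription row comes down to simple arithmetic: the total thread count is roughly `n_jobs` times the native threads per worker. The sketch below computes a per-worker limit that keeps that product within the CPU count. `threads_per_worker` is a hypothetical helper, and `OMP_NUM_THREADS=1` from the table is simply the most conservative value it can return.

```python
import os


def threads_per_worker(n_jobs, n_cpus=None):
    """Native threads each worker may use so n_jobs * threads <= n_cpus.

    Illustrative helper; n_cpus defaults to os.cpu_count().
    """
    if n_jobs < 1:
        raise ValueError("n_jobs must be >= 1")
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    return max(n_cpus // n_jobs, 1)


if __name__ == "__main__":
    n_cpus, n_jobs = 8, 4
    unrestricted = n_jobs * n_cpus  # every worker starts one thread per core
    limited = n_jobs * threads_per_worker(n_jobs, n_cpus)
    print(f"unrestricted: {unrestricted} threads, limited: {limited} threads")
```

On 8 cores with `n_jobs=4`, unrestricted workers could start 32 native threads in total, while a limit of 2 per worker keeps the total at 8.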

Compatibility Notes

  • macOS: Loading more than one OpenMP runtime in the same process (for instance when BLAS is called inside a `prange`) can raise a runtime error. Scikit-learn sets `KMP_DUPLICATE_LIB_OK=True` on import unless the variable is already defined.
  • ARM64 (aarch64): OpenBLAS versions before 0.3.16 have known instabilities on Neoverse N1 architecture. Scikit-learn detects this and marks affected tests accordingly.
  • Docker/cgroups: When `OMP_NUM_THREADS` is not set, scikit-learn uses the minimum of `omp_get_max_threads()` and the CPU count (accounting for cgroup quotas).
  • pytest-xdist: When running tests in parallel with xdist, thread limits are automatically adjusted to `cpu_count // worker_count` to prevent oversubscription.
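The last two rules can be approximated with the standard library. This sketch is not scikit-learn's implementation: `available_cpus` and `effective_threads` are hypothetical helpers, the OpenMP maximum is passed in as a plain integer, and the CPU count respects only CPU affinity (it does not read cgroup quotas).

```python
import os


def available_cpus():
    """CPUs this process may run on (affinity-aware where the OS supports it)."""
    if hasattr(os, "sched_getaffinity"):  # not available on every platform
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def effective_threads(omp_max_threads, n_cpus=None, xdist_workers=1):
    """min(OpenMP maximum, CPU count), divided among xdist workers, at least 1."""
    if n_cpus is None:
        n_cpus = available_cpus()
    return max(min(omp_max_threads, n_cpus) // xdist_workers, 1)


if __name__ == "__main__":
    # A container limited to 4 CPUs on a 16-thread OpenMP default.
    print(effective_threads(16, n_cpus=4))
    # The same machine shared by 4 pytest-xdist workers.
    print(effective_threads(16, n_cpus=4, xdist_workers=4))
```

The floor of one thread means that running more xdist workers than CPUs still leaves each worker a usable thread pool.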
