# Heuristic: Scikit-learn Working Memory Tuning
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Memory_Management |
| Last Updated | 2026-02-08 15:00 GMT |
## Overview
Configuration technique for controlling temporary array sizes in chunked computations via `working_memory` to balance speed and memory usage.
## Description
Scikit-learn uses a configurable `working_memory` parameter (default: 1024 MiB) to control the maximum size of temporary arrays allocated during chunked computations. Functions like pairwise distance calculations, scoring, and predictions split large datasets into chunks that fit within this memory budget. When a single row exceeds the working memory limit, the system falls back to processing one row at a time with a warning.
## Usage
Apply this heuristic when encountering memory pressure during pairwise distance computations, large-scale scoring operations, or when running on memory-constrained environments (e.g., small VMs, shared servers). Relevant to Cross_Validate, BaseSearchCV_Fit, and any implementation using `sklearn.metrics.pairwise`.
## The Insight (Rule of Thumb)
- Action: Call `sklearn.set_config(working_memory=X)` or set the `SKLEARN_WORKING_MEMORY=X` environment variable (value in MiB).
- Value: Default is 1024 MiB. Reduce to 256-512 MiB for memory-constrained systems; increase to 2048 MiB or more for large-memory servers.
- Trade-off: Smaller working_memory reduces peak memory but increases overhead from more chunks; larger values reduce chunking overhead but increase peak memory.
- Related setting: `pairwise_dist_chunk_size` (default: 256 rows) controls chunking for accelerated pairwise distances. Use powers of 2 for optimal cache behavior.
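The settings above can be applied globally or within a scoped block. A minimal sketch, assuming scikit-learn is installed; `set_config`, `get_config`, and `config_context` are the public entry points backing the `_global_config` dict shown under Code Evidence:

```python
import sklearn

# Process-wide: shrink the budget for a memory-constrained host (value in MiB).
sklearn.set_config(working_memory=256)
assert sklearn.get_config()["working_memory"] == 256

# Scoped override: the previous value is restored automatically on exit.
with sklearn.config_context(working_memory=2048):
    pass  # run a large pairwise-distance or scoring job here
```

The context-manager form is preferable in shared code, since it cannot leak a changed budget into unrelated computations.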
## Reasoning
Many scikit-learn operations (e.g., computing pairwise distances for KNN, scoring predictions) create temporary matrices proportional to `n_samples * n_features`. For large datasets, these matrices can exceed available RAM. The `working_memory` setting enables automatic chunking: `chunk_n_rows = int(working_memory * (2**20) // row_bytes)`. This formula converts MiB to bytes and divides by the per-row memory cost to determine how many rows can be processed simultaneously. Processing more rows per chunk is faster (better cache utilization) but requires more memory.
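To make the formula concrete, here is a worked example in pure Python (no scikit-learn needed): how many float64 rows with 1000 columns fit in the default 1024 MiB budget?

```python
# Assumption for illustration: a temporary block with 1000 float64 columns.
working_memory_mib = 1024        # scikit-learn's default budget
n_columns = 1000
row_bytes = n_columns * 8        # float64 -> 8 bytes per element

# Same arithmetic as get_chunk_n_rows: MiB -> bytes, then floor-divide.
chunk_n_rows = int(working_memory_mib * 2**20 // row_bytes)
print(chunk_n_rows)  # 134217 rows per chunk
```

Halving `working_memory` halves the chunk size (and doubles the number of chunks), which is the speed/memory trade-off described above.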
## Code Evidence
Global config initialization from `sklearn/_config.py:10-17`:
```python
_global_config = {
    "assume_finite": bool(os.environ.get("SKLEARN_ASSUME_FINITE", False)),
    "working_memory": int(os.environ.get("SKLEARN_WORKING_MEMORY", 1024)),
    "print_changed_only": True,
    "display": "diagram",
    "pairwise_dist_chunk_size": int(
        os.environ.get("SKLEARN_PAIRWISE_DIST_CHUNK_SIZE", 256)
    ),
}
```
Chunk calculation with fallback from `sklearn/utils/_chunking.py:140-178`:
```python
def get_chunk_n_rows(row_bytes, *, max_n_rows=None, working_memory=None):
    if working_memory is None:
        working_memory = get_config()["working_memory"]
    chunk_n_rows = int(working_memory * (2**20) // row_bytes)
    if max_n_rows is not None:
        chunk_n_rows = min(chunk_n_rows, max_n_rows)
    if chunk_n_rows < 1:
        warnings.warn(
            "Could not adhere to working_memory config. "
            "Currently %.0fMiB, %.0fMiB required."
            % (working_memory, np.ceil(row_bytes * 2**-20))
        )
        chunk_n_rows = 1
    return chunk_n_rows
```
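The fallback branch above can be exercised with a small standalone mirror of the same logic (a sketch, not the sklearn function itself, so it runs without any imports or global config):

```python
def chunk_n_rows(row_bytes, working_memory_mib, max_n_rows=None):
    """Mirror of get_chunk_n_rows' arithmetic, minus the warning machinery."""
    n = int(working_memory_mib * 2**20 // row_bytes)
    if max_n_rows is not None:
        n = min(n, max_n_rows)
    # Fallback: even if one row busts the budget, process one row at a time.
    return max(n, 1)

print(chunk_n_rows(row_bytes=8_000, working_memory_mib=1))      # 131
print(chunk_n_rows(row_bytes=2 * 2**20, working_memory_mib=1))  # 1 (fallback)
```

The second call shows why the warning exists: a single 2 MiB row cannot fit in a 1 MiB budget, so the configured limit is knowingly exceeded rather than failing outright.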