# Heuristic: Scikit-learn Working Memory Tuning
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Memory_Management |
| Last Updated | 2026-02-08 15:00 GMT |
## Overview
Configuration technique for controlling temporary array sizes in chunked computations via `working_memory` to balance speed and memory usage.
## Description
Scikit-learn uses a configurable `working_memory` parameter (default: 1024 MiB) to control the maximum size of temporary arrays allocated during chunked computations. Functions like pairwise distance calculations, scoring, and predictions split large datasets into chunks that fit within this memory budget. When a single row exceeds the working memory limit, the system falls back to processing one row at a time with a warning.
## Usage
Apply this heuristic when encountering memory pressure during pairwise distance computations, large-scale scoring operations, or when running on memory-constrained environments (e.g., small VMs, shared servers). Relevant to Cross_Validate, BaseSearchCV_Fit, and any implementation using `sklearn.metrics.pairwise`.
## The Insight (Rule of Thumb)
- Action: Call `sklearn.set_config(working_memory=X)` or set the `SKLEARN_WORKING_MEMORY=X` environment variable (value in MiB).
- Value: Default is 1024 MiB. Reduce to 256-512 MiB for memory-constrained systems; increase to 2048 MiB or more for large-memory servers.
- Trade-off: Smaller working_memory reduces peak memory but increases overhead from more chunks; larger values reduce chunking overhead but increase peak memory.
- Related setting: `pairwise_dist_chunk_size` (default: 256 rows) controls chunking for accelerated pairwise distances. Use powers of 2 for optimal cache behavior.
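The settings above can be applied globally or within a scoped block. A minimal sketch, assuming scikit-learn is installed; `set_config`, `get_config`, and `config_context` are the public entry points backing the `_global_config` dict shown under Code Evidence:

```python
import sklearn

# Process-wide: shrink the budget for a memory-constrained host (value in MiB).
sklearn.set_config(working_memory=256)
assert sklearn.get_config()["working_memory"] == 256

# Scoped override: the previous value is restored automatically on exit.
with sklearn.config_context(working_memory=2048):
    pass  # run a large pairwise-distance or scoring job here
```

The context-manager form is preferable in shared code, since it cannot leak a changed budget into unrelated computations.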
## Reasoning
Many scikit-learn operations (e.g., computing pairwise distances for KNN, scoring predictions) create temporary matrices proportional to `n_samples * n_features`. For large datasets, these matrices can exceed available RAM. The `working_memory` setting enables automatic chunking: `chunk_n_rows = int(working_memory * (2**20) // row_bytes)`. This formula converts MiB to bytes and divides by the per-row memory cost to determine how many rows can be processed simultaneously. Processing more rows per chunk is faster (better cache utilization) but requires more memory.
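To make the formula concrete, here is a worked example in pure Python (no scikit-learn needed): how many float64 rows with 1000 columns fit in the default 1024 MiB budget?

```python
# Assumption for illustration: a temporary block with 1000 float64 columns.
working_memory_mib = 1024        # scikit-learn's default budget
n_columns = 1000
row_bytes = n_columns * 8        # float64 -> 8 bytes per element

# Same arithmetic as get_chunk_n_rows: MiB -> bytes, then floor-divide.
chunk_n_rows = int(working_memory_mib * 2**20 // row_bytes)
print(chunk_n_rows)  # 134217 rows per chunk
```

Halving `working_memory` halves the chunk size (and doubles the number of chunks), which is the speed/memory trade-off described above.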
## Code Evidence
Global config initialization from `sklearn/_config.py:10-17`:
```python
_global_config = {
    "assume_finite": bool(os.environ.get("SKLEARN_ASSUME_FINITE", False)),
    "working_memory": int(os.environ.get("SKLEARN_WORKING_MEMORY", 1024)),
    "print_changed_only": True,
    "display": "diagram",
    "pairwise_dist_chunk_size": int(
        os.environ.get("SKLEARN_PAIRWISE_DIST_CHUNK_SIZE", 256)
    ),
}
```
Chunk calculation with fallback from `sklearn/utils/_chunking.py:140-178`:
```python
def get_chunk_n_rows(row_bytes, *, max_n_rows=None, working_memory=None):
    if working_memory is None:
        working_memory = get_config()["working_memory"]
    chunk_n_rows = int(working_memory * (2**20) // row_bytes)
    if max_n_rows is not None:
        chunk_n_rows = min(chunk_n_rows, max_n_rows)
    if chunk_n_rows < 1:
        warnings.warn(
            "Could not adhere to working_memory config. "
            "Currently %.0fMiB, %.0fMiB required."
            % (working_memory, np.ceil(row_bytes * 2**-20))
        )
        chunk_n_rows = 1
    return chunk_n_rows
```
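The fallback branch above can be exercised with a small standalone mirror of the same logic (a sketch, not the sklearn function itself, so it runs without any imports or global config):

```python
def chunk_n_rows(row_bytes, working_memory_mib, max_n_rows=None):
    """Mirror of get_chunk_n_rows' arithmetic, minus the warning machinery."""
    n = int(working_memory_mib * 2**20 // row_bytes)
    if max_n_rows is not None:
        n = min(n, max_n_rows)
    # Fallback: even if one row busts the budget, process one row at a time.
    return max(n, 1)

print(chunk_n_rows(row_bytes=8_000, working_memory_mib=1))      # 131
print(chunk_n_rows(row_bytes=2 * 2**20, working_memory_mib=1))  # 1 (fallback)
```

The second call shows why the warning exists: a single 2 MiB row cannot fit in a 1 MiB budget, so the configured limit is knowingly exceeded rather than failing outright.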