Heuristic:Interpretml Interpret Memory Budget Heuristic
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Machine_Learning |
| Last Updated | 2026-02-07 12:00 GMT |
Overview
Memory budgeting technique that caps cardinality at 2^20 (1,048,576) bins per tensor dimension, limiting each single-dimension tensor to 1/16 GB at 64 bytes per cell, together with memory estimation formulas for EBM training and interaction detection.
Description
EBM training involves multiple memory-intensive data structures: score tensors, gradient/hessian arrays, and dataset copies across outer bags. The codebase uses a hard cardinality cap of 2^20 (1,048,576) bins per feature dimension. With 64 bytes per tensor cell, this limits each single-dimension tensor to 1/16 gigabyte. The code also pre-computes total memory requirements before training starts, accounting for boosting sample arrays, interaction detection arrays, tensor copies (current, best, and extracted), and shared memory dataset copies across processes.
Usage
Apply this heuristic when:
- Training EBMs on datasets with high-cardinality categorical features
- Encountering out-of-memory errors during EBM training or interaction detection
- Estimating memory requirements before starting a training job
- Choosing between `max_bins` and `max_interaction_bins` settings
The Insight (Rule of Thumb)
- Action: The maximum cardinality per feature dimension is capped at 2^20 = 1,048,576.
- Value: With 64 bytes/cell, one dimension = 64 MB maximum.
- Trade-off: Higher cardinality gives finer discretization but quadratic memory growth for interaction terms.
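The rule of thumb above is plain arithmetic and can be checked directly (an illustrative calculation using the constants quoted above, not a call into interpret's API):

```python
# Rule-of-thumb check: one max-cardinality dimension at 64 bytes/cell.
MAX_BINS = 1 << 20      # 2^20 = 1,048,576 bins per feature dimension
BYTES_PER_CELL = 64     # bytes of bookkeeping per tensor cell

max_dim_bytes = MAX_BINS * BYTES_PER_CELL
print(max_dim_bytes)             # 67108864 bytes
print(max_dim_bytes / 1024**2)   # 64.0 (MB per single-dimension tensor)
print(max_dim_bytes / 1024**3)   # 0.0625 (i.e. 1/16 GB)
```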
Memory Estimation Formulas
- Boosting sample bytes: `(3 if hessian else 2) * n_scores * n_samples * outer_bags * 4` (float32)
- Interaction detection bytes: `(2 if hessian else 1) * n_scores * n_samples * outer_bags * 4` (float32)
- Tensor bytes: `sum(bin_counts) * n_scores * outer_bags * 3 * 8` (float64, 3 copies)
- Total boosting: `sample_bytes + 3 * dataset_bytes + tensor_bytes`
- Total interaction: `interaction_sample_bytes + 3 * pair_dataset_bytes`
The factor of 3 for tensors accounts for: current update tensor + best update tensor (in C++) + extracted tensor (in Python).
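The formulas above can be collected into a small estimator. This is a sketch under stated assumptions: `estimate_ebm_memory` and all of its parameter names are hypothetical helpers for illustration, not part of interpret's API, and the example input sizes are made up.

```python
def estimate_ebm_memory(
    n_samples,           # number of training samples
    n_scores,            # 1 for binary/regression, n_classes for multiclass
    outer_bags,          # each bag holds its own copy of the per-sample arrays
    bin_counts,          # bins per feature for the additive tensors
    dataset_bytes,       # size in bytes of one dataset copy (assumed known)
    pair_dataset_bytes,  # size in bytes of one pair-only dataset copy
    hessian=True,
):
    """Hypothetical helper mirroring the formulas above (not interpret's API)."""
    # Boosting sample arrays: (3 if hessian else 2) float32 arrays per bag
    sample_bytes = (3 if hessian else 2) * n_scores * n_samples * outer_bags * 4
    # Interaction detection arrays: (2 if hessian else 1) float32 arrays per bag
    interaction_sample_bytes = (
        (2 if hessian else 1) * n_scores * n_samples * outer_bags * 4
    )
    # Tensor copies (float64): current + best (C++) + extracted (Python) = 3x
    tensor_bytes = sum(bin_counts) * n_scores * outer_bags * 3 * 8
    return {
        "boosting_total": sample_bytes + 3 * dataset_bytes + tensor_bytes,
        "interaction_total": interaction_sample_bytes + 3 * pair_dataset_bytes,
    }

# Example: binary task, 100k samples, 8 outer bags, 20 features of 256 bins each
est = estimate_ebm_memory(
    n_samples=100_000, n_scores=1, outer_bags=8,
    bin_counts=[256] * 20,
    dataset_bytes=10_000_000, pair_dataset_bytes=5_000_000,
)
print(est["boosting_total"])     # 40583040 bytes (~39 MB)
print(est["interaction_total"])  # 21400000 bytes (~20 MB)
```

Note how the per-sample arrays, not the tensors, dominate the budget in this example; the tensor term only takes over when `sum(bin_counts)` approaches the 2^20-per-dimension cap.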
Reasoning
The memory formula is carefully designed:
- n_scores: 1 for binary/regression, n_classes for multiclass
- outer_bags: Each bag needs its own copy of training/validation data
- 3x tensor multiplier: C++ maintains current and best tensors; Python extracts one more before teardown
- Shared memory: When available, one shared copy is mapped into all processes; otherwise parent and children each hold copies
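The shared-memory point above can be illustrated with Python's standard library: one buffer is created by the parent and attached by name elsewhere with no extra copy. This is a generic sketch of the mechanism, not interpret's internal code.

```python
from multiprocessing import shared_memory

import numpy as np

# Parent creates one shared buffer and populates it once.
data = np.arange(1_000_000, dtype=np.float32)
shm = shared_memory.SharedMemory(create=True, size=data.nbytes)
shared = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
shared[:] = data

# A worker would attach by name and map the same memory, zero-copy.
view = shared_memory.SharedMemory(name=shm.name)
mapped = np.ndarray(data.shape, dtype=data.dtype, buffer=view.buf)
assert mapped[123] == 123.0

# Release the numpy views before closing, then unlink once at the end.
del mapped
view.close()
del shared
shm.close()
shm.unlink()
```

Without this mechanism, each worker process would hold its own `data.nbytes`-sized copy, which is why the fallback case in the formulas charges multiple dataset copies.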
The 2^20 cardinality cap was chosen because:
- A single interaction term between two max-cardinality features would require 2^40 cells (about 64 TB at 64 bytes/cell), which is impractical
- At 64 bytes/cell, 2^20 cells amount to 64 MB, which is manageable per feature
- Real-world features rarely need more than 1M unique bins for accurate discretization
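The quadratic blow-up that motivates the cap is easy to see numerically: a pairwise tensor scales with the product of the two dimensions' cardinalities. Illustrative arithmetic only; the bin counts below are examples, not interpret defaults.

```python
# Pairwise interaction tensors grow with the product of bin counts.
BYTES_PER_CELL = 64
CAP = 1 << 20  # 2^20 = 1,048,576 bins, the per-dimension cap

modest_pair = 1024 * 1024 * BYTES_PER_CELL  # two 1,024-bin features
max_pair = CAP * CAP * BYTES_PER_CELL       # two max-cardinality features

print(modest_pair // 2**20)  # 64 -> already 64 MB for two 1,024-bin features
print(max_pair // 2**40)     # 64 -> 64 TB, far beyond any practical budget
```

This is why interaction terms are typically given far fewer bins than main effects: even modest per-feature cardinalities multiply into large pairwise tensors.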
Related Pages
- Implementation:Interpretml_Interpret_Boost
- Implementation:Interpretml_Interpret_Measure_Interactions
- Implementation:Interpretml_Interpret_Construct_Bins
- Principle:Interpretml_Interpret_Bagged_Gradient_Boosting
- Principle:Interpretml_Interpret_Feature_Binning_And_Discretization
- Principle:Interpretml_Interpret_Interaction_Detection