Heuristic:Interpretml Interpret Memory Budget Heuristic
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Machine_Learning |
| Last Updated | 2026-02-07 12:00 GMT |
Overview
Memory budgeting technique that caps cardinality at 2^20 (1,048,576) bins per tensor dimension, limiting each single-dimension tensor to 1/16 GB at 64 bytes per cell, together with memory estimation formulas for EBM training and interaction detection.
Description
EBM training involves multiple memory-intensive data structures: score tensors, gradient/hessian arrays, and dataset copies across outer bags. The codebase uses a hard cardinality cap of 2^20 (1,048,576) bins per feature dimension. With 64 bytes per tensor cell, this limits each single-dimension tensor to 1/16 gigabyte. The code also pre-computes total memory requirements before training starts, accounting for boosting sample arrays, interaction detection arrays, tensor copies (current, best, and extracted), and shared memory dataset copies across processes.
Usage
Apply this heuristic when:
- Training EBMs on datasets with high-cardinality categorical features
- Encountering out-of-memory errors during EBM training or interaction detection
- Estimating memory requirements before starting a training job
- Choosing between `max_bins` and `max_interaction_bins` settings
The Insight (Rule of Thumb)
- Action: The maximum cardinality per feature dimension is capped at 2^20 = 1,048,576.
- Value: With 64 bytes/cell, one dimension = 64 MB maximum.
- Trade-off: Higher cardinality gives finer discretization but quadratic memory growth for interaction terms.
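The rule of thumb above is plain arithmetic and can be checked directly (an illustrative calculation using the constants quoted above, not a call into interpret's API):

```python
# Rule-of-thumb check: one max-cardinality dimension at 64 bytes/cell.
MAX_BINS = 1 << 20      # 2^20 = 1,048,576 bins per feature dimension
BYTES_PER_CELL = 64     # bytes of bookkeeping per tensor cell

max_dim_bytes = MAX_BINS * BYTES_PER_CELL
print(max_dim_bytes)             # 67108864 bytes
print(max_dim_bytes / 1024**2)   # 64.0 (MB per single-dimension tensor)
print(max_dim_bytes / 1024**3)   # 0.0625 (i.e. 1/16 GB)
```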
Memory Estimation Formulas
- Boosting sample bytes: `(3 if hessian else 2) * n_scores * n_samples * outer_bags * 4` (float32)
- Interaction detection bytes: `(2 if hessian else 1) * n_scores * n_samples * outer_bags * 4` (float32)
- Tensor bytes: `sum(bin_counts) * n_scores * outer_bags * 3 * 8` (float64, 3 copies)
- Total boosting: `sample_bytes + 3 * dataset_bytes + tensor_bytes`
- Total interaction: `interaction_sample_bytes + 3 * pair_dataset_bytes`
The factor of 3 for tensors accounts for: current update tensor + best update tensor (in C++) + extracted tensor (in Python).
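The formulas above can be collected into a small estimator. This is a sketch under stated assumptions: `estimate_ebm_memory` and all of its parameter names are hypothetical helpers for illustration, not part of interpret's API, and the example input sizes are made up.

```python
def estimate_ebm_memory(
    n_samples,           # number of training samples
    n_scores,            # 1 for binary/regression, n_classes for multiclass
    outer_bags,          # each bag holds its own copy of the per-sample arrays
    bin_counts,          # bins per feature for the additive tensors
    dataset_bytes,       # size in bytes of one dataset copy (assumed known)
    pair_dataset_bytes,  # size in bytes of one pair-only dataset copy
    hessian=True,
):
    """Hypothetical helper mirroring the formulas above (not interpret's API)."""
    # Boosting sample arrays: (3 if hessian else 2) float32 arrays per bag
    sample_bytes = (3 if hessian else 2) * n_scores * n_samples * outer_bags * 4
    # Interaction detection arrays: (2 if hessian else 1) float32 arrays per bag
    interaction_sample_bytes = (
        (2 if hessian else 1) * n_scores * n_samples * outer_bags * 4
    )
    # Tensor copies (float64): current + best (C++) + extracted (Python) = 3x
    tensor_bytes = sum(bin_counts) * n_scores * outer_bags * 3 * 8
    return {
        "boosting_total": sample_bytes + 3 * dataset_bytes + tensor_bytes,
        "interaction_total": interaction_sample_bytes + 3 * pair_dataset_bytes,
    }

# Example: binary task, 100k samples, 8 outer bags, 20 features of 256 bins each
est = estimate_ebm_memory(
    n_samples=100_000, n_scores=1, outer_bags=8,
    bin_counts=[256] * 20,
    dataset_bytes=10_000_000, pair_dataset_bytes=5_000_000,
)
print(est["boosting_total"])     # 40583040 bytes (~39 MB)
print(est["interaction_total"])  # 21400000 bytes (~20 MB)
```

Note how the per-sample arrays, not the tensors, dominate the budget in this example; the tensor term only takes over when `sum(bin_counts)` approaches the 2^20-per-dimension cap.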
Reasoning
The memory formula is carefully designed:
- n_scores: 1 for binary/regression, n_classes for multiclass
- outer_bags: Each bag needs its own copy of training/validation data
- 3x tensor multiplier: C++ maintains current and best tensors; Python extracts one more before teardown
- Shared memory: When available, one shared copy is mapped into all processes; otherwise parent and children each hold copies
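The shared-memory point above can be illustrated with Python's standard library: one buffer is created by the parent and attached by name elsewhere with no extra copy. This is a generic sketch of the mechanism, not interpret's internal code.

```python
from multiprocessing import shared_memory

import numpy as np

# Parent creates one shared buffer and populates it once.
data = np.arange(1_000_000, dtype=np.float32)
shm = shared_memory.SharedMemory(create=True, size=data.nbytes)
shared = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
shared[:] = data

# A worker would attach by name and map the same memory, zero-copy.
view = shared_memory.SharedMemory(name=shm.name)
mapped = np.ndarray(data.shape, dtype=data.dtype, buffer=view.buf)
assert mapped[123] == 123.0

# Release the numpy views before closing, then unlink once at the end.
del mapped
view.close()
del shared
shm.close()
shm.unlink()
```

Without this mechanism, each worker process would hold its own `data.nbytes`-sized copy, which is why the fallback case in the formulas charges multiple dataset copies.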
The 2^20 cardinality cap was chosen because:
- A single interaction term between two max-cardinality features would require 2^40 cells (about 64 TB at 64 bytes/cell), which is impractical
- At 64 bytes/cell, 2^20 cells amount to 64 MB, which is manageable per feature
- Real-world features rarely need more than 1M unique bins for accurate discretization
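The quadratic blow-up that motivates the cap is easy to see numerically: a pairwise tensor scales with the product of the two dimensions' cardinalities. Illustrative arithmetic only; the bin counts below are examples, not interpret defaults.

```python
# Pairwise interaction tensors grow with the product of bin counts.
BYTES_PER_CELL = 64
CAP = 1 << 20  # 2^20 = 1,048,576 bins, the per-dimension cap

modest_pair = 1024 * 1024 * BYTES_PER_CELL  # two 1,024-bin features
max_pair = CAP * CAP * BYTES_PER_CELL       # two max-cardinality features

print(modest_pair // 2**20)  # 64 -> already 64 MB for two 1,024-bin features
print(max_pair // 2**40)     # 64 -> 64 TB, far beyond any practical budget
```

This is why interaction terms are typically given far fewer bins than main effects: even modest per-feature cardinalities multiply into large pairwise tensors.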
Related Pages
- Implementation:Interpretml_Interpret_Boost
- Implementation:Interpretml_Interpret_Measure_Interactions
- Implementation:Interpretml_Interpret_Construct_Bins
- Principle:Interpretml_Interpret_Bagged_Gradient_Boosting
- Principle:Interpretml_Interpret_Feature_Binning_And_Discretization
- Principle:Interpretml_Interpret_Interaction_Detection