
Heuristic:Interpretml Interpret Memory Budget Heuristic

From Leeroopedia




Knowledge Sources
Domains Optimization, Machine_Learning
Last Updated 2026-02-07 12:00 GMT

Overview

A memory-budgeting technique that caps cardinality at 2^20 (1,048,576) bins per tensor dimension, limiting each single-dimension tensor to 1/16 GB (64 MB at 64 bytes per cell). It also includes memory-estimation formulas for EBM training and interaction detection.

Description

Training an Explainable Boosting Machine (EBM) involves multiple memory-intensive data structures: score tensors, gradient/hessian arrays, and dataset copies across outer bags. The codebase uses a hard cardinality cap of 2^20 (1,048,576) bins per feature dimension; at 64 bytes per tensor cell, this limits each single-dimension tensor to 1/16 gigabyte (64 MB). The code also pre-computes total memory requirements before training starts, accounting for boosting sample arrays, interaction detection arrays, tensor copies (current, best, and extracted), and shared-memory dataset copies across processes.

Usage

Apply this heuristic when:

  • Training EBMs on datasets with high-cardinality categorical features
  • Encountering out-of-memory errors during EBM training or interaction detection
  • Estimating memory requirements before starting a training job
  • Choosing between `max_bins` and `max_interaction_bins` settings
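When weighing `max_bins` against `max_interaction_bins`, it helps to see how fast pairwise terms consume the budget. The following is a minimal sketch (the function name and the 64 bytes/cell figure are taken from this page; the function itself is illustrative, not the library's API):

```python
def pair_tensor_bytes(max_interaction_bins, bytes_per_cell=64):
    # a pairwise interaction term allocates a (bins x bins) tensor,
    # so memory grows quadratically in max_interaction_bins
    return max_interaction_bins ** 2 * bytes_per_cell

# with 1024 interaction bins, a single pair tensor already hits the
# 64 MiB single-dimension budget (1024**2 == 2**20 cells)
print(pair_tensor_bytes(1024))  # 67108864 bytes = 64 MiB
```

This is why interaction bin counts are typically kept far smaller than main-effect bin counts.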

The Insight (Rule of Thumb)

  • Action: The maximum cardinality per feature dimension is capped at 2^20 = 1,048,576.
  • Value: With 64 bytes/cell, one dimension = 64 MB maximum.
  • Trade-off: Higher cardinality gives finer discretization but quadratic memory growth for interaction terms.
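The arithmetic behind the cap can be checked directly (constants taken from the figures above; variable names are illustrative):

```python
BYTES_PER_CELL = 64
MAX_CARDINALITY = 1 << 20  # 2**20 = 1,048,576 bins per dimension

# maximum size of a single-dimension tensor under the cap
max_dim_bytes = MAX_CARDINALITY * BYTES_PER_CELL
print(max_dim_bytes)          # 67108864 bytes = 64 MiB
print(max_dim_bytes / 2**30)  # 0.0625 GiB = exactly 1/16 GiB
```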

Memory Estimation Formulas

  • Boosting sample bytes: `(3 if hessian else 2) * n_scores * n_samples * outer_bags * 4` (float32)
  • Interaction detection bytes: `(2 if hessian else 1) * n_scores * n_samples * outer_bags * 4` (float32)
  • Tensor bytes: `sum(bin_counts) * n_scores * outer_bags * 3 * 8` (float64, 3 copies)
  • Total boosting: sample_bytes + 3 * dataset_bytes + tensor_bytes
  • Total interaction: interaction_sample_bytes + 3 * pair_dataset_bytes

The factor of 3 for tensors accounts for: current update tensor + best update tensor (in C++) + extracted tensor (in Python).
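The formulas above can be collected into a small estimator. This is a sketch that transcribes the formulas as stated; the function names and the `dataset_bytes` / `pair_dataset_bytes` inputs are assumptions for illustration, not the library's actual API:

```python
def estimate_boosting_bytes(n_samples, n_scores, outer_bags, bin_counts,
                            dataset_bytes, hessian=True):
    # gradient (+ hessian) sample arrays, float32 (4 bytes each)
    sample_bytes = (3 if hessian else 2) * n_scores * n_samples * outer_bags * 4
    # tensor copies: current + best (C++) + extracted (Python) = 3,
    # float64 (8 bytes per cell)
    tensor_bytes = sum(bin_counts) * n_scores * outer_bags * 3 * 8
    # worst case: 3 dataset copies when shared memory is unavailable
    return sample_bytes + 3 * dataset_bytes + tensor_bytes

def estimate_interaction_bytes(n_samples, n_scores, outer_bags,
                               pair_dataset_bytes, hessian=True):
    # interaction detection sample arrays, float32 (4 bytes each)
    sample_bytes = (2 if hessian else 1) * n_scores * n_samples * outer_bags * 4
    return sample_bytes + 3 * pair_dataset_bytes
```

For example, 1,000 samples, one score, 8 outer bags, and two 256-bin features give 96,000 sample bytes plus 98,304 tensor bytes before dataset copies.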

Reasoning

The memory formula is carefully designed:

  • n_scores: 1 for binary/regression, n_classes for multiclass
  • outer_bags: Each bag needs its own copy of training/validation data
  • 3x tensor multiplier: C++ maintains current and best tensors; Python extracts one more before teardown
  • Shared memory: When available, one shared copy is mapped into all processes; otherwise parent and children each hold copies
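The `n_scores` rule above can be sketched as a one-liner (the function and its argument convention are illustrative, not the library's API):

```python
def n_scores(n_classes):
    # binary classification and regression use a single score per sample;
    # multiclass keeps one score per class
    return n_classes if n_classes > 2 else 1
```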

The 2^20 cardinality cap was chosen because:

  • A single interaction term between two max-cardinality features would require 2^40 cells (64 TB at 64 bytes/cell), which is impractical
  • At 64 bytes/cell, 2^20 cells come to 64 MB, which is manageable per feature
  • Real-world features rarely need more than 1M unique bins for accurate discretization
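The first point above can be verified with quick arithmetic (constants from this page; variable names are illustrative):

```python
MAX_CARDINALITY = 1 << 20  # 2**20 bins per dimension
BYTES_PER_CELL = 64

# a pair of max-cardinality features yields 2**40 cells
cells = MAX_CARDINALITY ** 2
print(cells * BYTES_PER_CELL / 2**40)  # 64.0 TiB for one pair tensor
```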
