# Heuristic: Dotnet Machinelearning Sparsity Threshold Optimization
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Memory_Management |
| Last Updated | 2026-02-09 11:00 GMT |
## Overview
Use sparse representation when fewer than 30% of feature values are non-zero, reducing memory usage significantly in high-dimensional datasets.
## Description
FastTree's feature binning uses a sparsity threshold of 0.7 (70%) to decide between dense and sparse array representations: when more than 70% of the values in a feature column fall in the zero bin, the sparse representation is used. Related heuristics in the ensemble compression module (LASSO) cap working memory at 4GB and require at least a 100:1 observations-to-features ratio for a stable regularization path.
## Usage
Use this heuristic when working with high-dimensional sparse data (text features, one-hot encoded categoricals) or when encountering memory pressure during FastTree training. The 30% non-zero threshold helps decide whether to apply sparsification to custom feature representations.
## The Insight (Rule of Thumb)
Sparsity Threshold:
- Action: Switch to sparse representation when <30% of values are non-zero.
- Value: `sparsifyThreshold = 0.7` (70% zeros triggers sparse mode)
- Trade-off: Sparse representation reduces memory by up to 10x for very sparse data, but adds per-element overhead (an index alongside each value) and slower random access. Dense is faster for features with >30% non-zero entries.
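The decision rule above can be sketched as a small predicate (a Python illustration of the rule, not the ML.NET implementation; the function name `use_sparse` is hypothetical):

```python
def use_sparse(num_nonzero: int, length: int,
               sparsify_threshold: float = 0.7) -> bool:
    """Mirror of the FastTree rule: use sparse storage when the fraction
    of non-zero-bin values is below 1 - sparsify_threshold (i.e. <30%)."""
    return num_nonzero < (1 - sparsify_threshold) * length

# 5% non-zero values in a 1000-element column -> sparse
print(use_sparse(50, 1000))   # True
# 40% non-zero values -> stay dense
print(use_sparse(400, 1000))  # False
```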
Memory Cap:
- Action: Cap memory allocation at 4GB for ensemble compression (LASSO).
- Value: `MaxAvailableMemory = 4GB`
- Trade-off: Prevents OOM crashes at the cost of potentially suboptimal compression for very large ensembles.
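One way to read the cap is as a bound on how many observations fit in the working budget. A rough sizing sketch (the helper `max_observations` and the 4-byte value size are assumptions for illustration, not ML.NET code):

```python
MAX_AVAILABLE_MEMORY = 4 * 1024 ** 3  # 4GB, mirroring MaxAvailableMemory

def max_observations(num_features: int, bytes_per_value: int = 4) -> int:
    """Hypothetical sizing helper: how many observations fit in the
    4GB budget if each stored value takes bytes_per_value bytes."""
    return MAX_AVAILABLE_MEMORY // (num_features * bytes_per_value)

# A 10,000-feature ensemble with 4-byte floats fits ~107k observations
print(max_observations(10_000))  # 107374
```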
Observations-to-Features Ratio:
- Action: Maintain at least 100:1 ratio of observations to features.
- Value: `MaxObservationsTOFeaturesRatio = 100`
- Trade-off: Below this ratio, regularization path becomes unstable. Above it, computation cost increases linearly.
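The ratio check amounts to a one-line guard (a Python sketch following the document's "at least 100:1" framing; the function name is hypothetical):

```python
MAX_OBS_TO_FEATURES_RATIO = 100  # mirrors MaxObservationsTOFeaturesRatio

def stable_regularization_path(num_observations: int,
                               num_features: int) -> bool:
    """Treat the LASSO regularization path as stable only when
    observations outnumber features at least 100:1."""
    return num_observations >= MAX_OBS_TO_FEATURES_RATIO * num_features

print(stable_regularization_path(50_000, 100))  # True  (500:1)
print(stable_regularization_path(5_000, 100))   # False (50:1)
```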
## Reasoning
Sparse representation stores only non-zero values and their indices. For a feature with 5% non-zero values, sparse uses roughly 10% of the memory compared to dense. The 70% threshold was chosen empirically to balance memory savings against the per-element overhead of sparse storage (index + value vs. value only).
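The arithmetic behind the "roughly 10%" figure can be checked directly (a sketch assuming equal-size 4-byte values and indices; actual element sizes vary by type):

```python
def sparse_to_dense_memory_ratio(nonzero_fraction: float,
                                 value_bytes: int = 4,
                                 index_bytes: int = 4) -> float:
    """Memory of sparse storage (index + value per non-zero entry)
    relative to dense storage (one value per entry)."""
    sparse_per_entry = nonzero_fraction * (value_bytes + index_bytes)
    dense_per_entry = value_bytes
    return sparse_per_entry / dense_per_entry

print(sparse_to_dense_memory_ratio(0.05))  # 0.1 -> ~10% of dense memory
print(sparse_to_dense_memory_ratio(0.30))  # 0.6 -> nearing break-even at the threshold
```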
The 4GB memory cap for LASSO ensemble compression prevents system OOM on large models while allowing sufficient working memory for the coordinate descent optimizer. The convergence threshold of 1e-4 balances precision against iteration count.
## Code Evidence
Sparsity threshold from `src/Microsoft.ML.FastTree/FastTree.cs:1210`:
```csharp
const double sparsifyThreshold = 0.7; // 70% sparsity threshold
if (!values.IsDense && zeroBin == 0 &&
    valuesValues.Length < (1 - sparsifyThreshold) * values.Length)
{
    // Use sparse representation
}
```
Memory cap from `src/Microsoft.ML.FastTree/Training/EnsembleCompression/LassoBasedEnsembleCompressor.cs:21-29`:
```csharp
private const long MaxAvailableMemory = 4L * 1024 * 1024 * 1024; // 4GB cap
private const int MaxObservationsTOFeaturesRatio = 100;          // 100:1 ratio
private const double Epsilon = 1.0e-6;                           // Numerical tolerance
private const int DefaultNumberOFLambdas = 100;                  // Lambda values
private const double ConvergenceThreshold = 1.0e-4;              // Coordinate descent convergence
private const double MaxRSquared = 0.999;                        // R-squared saturation point
```