# Heuristic: Dotnet Machinelearning Sparsity Threshold Optimization
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Memory_Management |
| Last Updated | 2026-02-09 11:00 GMT |
## Overview
Use sparse representation when fewer than 30% of feature values are non-zero, reducing memory usage significantly in high-dimensional datasets.
## Description
FastTree's feature binning uses a sparsity threshold of 0.7 (70%) to decide between dense and sparse array representations: when more than 70% of the values in a feature column fall in the zero bin, the sparse representation is used. Related heuristics in the ensemble compression module (LASSO) cap working memory at 4GB and require at least a 100:1 observations-to-features ratio for a stable regularization path.
## Usage
Use this heuristic when working with high-dimensional sparse data (text features, one-hot encoded categoricals) or when encountering memory pressure during FastTree training. The 30% non-zero threshold helps decide whether to apply sparsification to custom feature representations.
## The Insight (Rule of Thumb)
Sparsity Threshold:
- Action: Switch to sparse representation when <30% of values are non-zero.
- Value: `sparsifyThreshold = 0.7` (70% zeros triggers sparse mode)
- Trade-off: Sparse representation reduces memory by up to 10x for very sparse data, but adds per-element overhead (an index alongside each value) and slower random access. Dense is faster for features with >30% non-zero entries.
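The decision rule above can be sketched as a small predicate (a Python illustration of the rule, not the ML.NET implementation; the function name `use_sparse` is hypothetical):

```python
def use_sparse(num_nonzero: int, length: int,
               sparsify_threshold: float = 0.7) -> bool:
    """Mirror of the FastTree rule: use sparse storage when the fraction
    of non-zero-bin values is below 1 - sparsify_threshold (i.e. <30%)."""
    return num_nonzero < (1 - sparsify_threshold) * length

# 5% non-zero values in a 1000-element column -> sparse
print(use_sparse(50, 1000))   # True
# 40% non-zero values -> stay dense
print(use_sparse(400, 1000))  # False
```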
Memory Cap:
- Action: Cap memory allocation at 4GB for ensemble compression (LASSO).
- Value: `MaxAvailableMemory = 4GB`
- Trade-off: Prevents OOM crashes at the cost of potentially suboptimal compression for very large ensembles.
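One way to read the cap is as a bound on how many observations fit in the working budget. A rough sizing sketch (the helper `max_observations` and the 4-byte value size are assumptions for illustration, not ML.NET code):

```python
MAX_AVAILABLE_MEMORY = 4 * 1024 ** 3  # 4GB, mirroring MaxAvailableMemory

def max_observations(num_features: int, bytes_per_value: int = 4) -> int:
    """Hypothetical sizing helper: how many observations fit in the
    4GB budget if each stored value takes bytes_per_value bytes."""
    return MAX_AVAILABLE_MEMORY // (num_features * bytes_per_value)

# A 10,000-feature ensemble with 4-byte floats fits ~107k observations
print(max_observations(10_000))  # 107374
```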
Observations-to-Features Ratio:
- Action: Maintain at least 100:1 ratio of observations to features.
- Value: `MaxObservationsTOFeaturesRatio = 100`
- Trade-off: Below this ratio, regularization path becomes unstable. Above it, computation cost increases linearly.
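The ratio check amounts to a one-line guard (a Python sketch following the document's "at least 100:1" framing; the function name is hypothetical):

```python
MAX_OBS_TO_FEATURES_RATIO = 100  # mirrors MaxObservationsTOFeaturesRatio

def stable_regularization_path(num_observations: int,
                               num_features: int) -> bool:
    """Treat the LASSO regularization path as stable only when
    observations outnumber features at least 100:1."""
    return num_observations >= MAX_OBS_TO_FEATURES_RATIO * num_features

print(stable_regularization_path(50_000, 100))  # True  (500:1)
print(stable_regularization_path(5_000, 100))   # False (50:1)
```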
## Reasoning
Sparse representation stores only non-zero values and their indices. For a feature with 5% non-zero values, sparse uses roughly 10% of the memory compared to dense. The 70% threshold was chosen empirically to balance memory savings against the per-element overhead of sparse storage (index + value vs. value only).
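The arithmetic behind the "roughly 10%" figure can be checked directly (a sketch assuming equal-size 4-byte values and indices; actual element sizes vary by type):

```python
def sparse_to_dense_memory_ratio(nonzero_fraction: float,
                                 value_bytes: int = 4,
                                 index_bytes: int = 4) -> float:
    """Memory of sparse storage (index + value per non-zero entry)
    relative to dense storage (one value per entry)."""
    sparse_per_entry = nonzero_fraction * (value_bytes + index_bytes)
    dense_per_entry = value_bytes
    return sparse_per_entry / dense_per_entry

print(sparse_to_dense_memory_ratio(0.05))  # 0.1 -> ~10% of dense memory
print(sparse_to_dense_memory_ratio(0.30))  # 0.6 -> nearing break-even at the threshold
```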
The 4GB memory cap for LASSO ensemble compression prevents system OOM on large models while allowing sufficient working memory for the coordinate descent optimizer. The convergence threshold of 1e-4 balances precision against iteration count.
## Code Evidence
Sparsity threshold from `src/Microsoft.ML.FastTree/FastTree.cs:1210`:
```csharp
const double sparsifyThreshold = 0.7; // 70% sparsity threshold
if (!values.IsDense && zeroBin == 0 &&
    valuesValues.Length < (1 - sparsifyThreshold) * values.Length)
{
    // Use sparse representation
}
```
Memory cap from `src/Microsoft.ML.FastTree/Training/EnsembleCompression/LassoBasedEnsembleCompressor.cs:21-29`:
```csharp
private const long MaxAvailableMemory = 4L * 1024 * 1024 * 1024; // 4GB cap
private const int MaxObservationsTOFeaturesRatio = 100;          // 100:1 ratio
private const double Epsilon = 1.0e-6;                           // Numerical tolerance
private const int DefaultNumberOFLambdas = 100;                  // Lambda values
private const double ConvergenceThreshold = 1.0e-4;              // Coordinate descent convergence
private const double MaxRSquared = 0.999;                        // R-squared saturation point
```