# Heuristic: Dotnet Machinelearning FastTree Default Hyperparameters
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Decision_Trees |
| Last Updated | 2026-02-09 11:00 GMT |
## Overview
Default hyperparameter values for FastTree and GAM trainers, reflecting empirically tuned trade-offs between accuracy, speed, and model complexity.
## Description
ML.NET's FastTree (gradient boosted decision trees) and GAM (Generalized Additive Models) trainers ship with carefully tuned default hyperparameters. FastTree uses a learning rate of 0.2, 100 trees, and 20 leaves per tree. GAM uses a much smaller learning rate of 0.002 with 9,500 iterations and 255 bins per feature. These defaults represent empirical trade-offs learned from extensive experimentation. Understanding these defaults helps users decide when and how to tune them.
## Usage
Use this heuristic when configuring FastTree or GAM trainers and deciding whether to override default hyperparameters. The defaults work well for most medium-sized datasets, but may need adjustment for very large or very small datasets, or when the balance between training speed and accuracy needs to shift.
## The Insight (Rule of Thumb)
**FastTree Defaults:**
- Action: Use defaults unless dataset characteristics suggest otherwise.
- Values: `NumberOfTrees=100`, `NumberOfLeaves=20`, `MinimumExampleCountPerLeaf=10`, `LearningRate=0.2`
- Trade-off: Lower learning rate + more trees = better accuracy but slower training. Higher learning rate + fewer trees = faster but may overfit.
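The learning-rate/tree-count trade-off can be sketched with a toy shrinkage model. Assuming (purely for illustration; this is not FastTree's actual algorithm) that each tree fits the current residual perfectly, shrinkage multiplies the remaining residual by `(1 - learning_rate)` per tree:

```python
# Toy model of gradient boosting's learning-rate / tree-count trade-off.
# Idealized assumption: each tree fits the current residual exactly, so
# shrinkage leaves a factor of (1 - learning_rate) per tree. Real FastTree
# behavior differs, but the scaling intuition carries over.

def residual_after(learning_rate: float, num_trees: int, initial: float = 1.0) -> float:
    """Remaining residual after boosting with per-tree shrinkage."""
    return initial * (1.0 - learning_rate) ** num_trees

# FastTree's defaults: LearningRate=0.2, NumberOfTrees=100.
default = residual_after(0.2, 100)

# Halving the learning rate needs roughly twice the trees for a
# comparable residual: lower rate + more trees trades speed for accuracy.
slower = residual_after(0.1, 200)

print(f"lr=0.2, trees=100 -> residual {default:.3e}")
print(f"lr=0.1, trees=200 -> residual {slower:.3e}")
```

Under this idealization, cutting the learning rate in half while doubling the number of trees lands in roughly the same place, at twice the training cost.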
**GAM Defaults:**
- Action: GAM requires a much smaller learning rate than FastTree.
- Values: `NumberOfIterations=9500`, `MaximumBinCountPerFeature=255`, `LearningRate=0.002`
- Trade-off: GAM learning rate is 100x smaller than FastTree because additive models are more sensitive to step size. 255 bins matches byte representation for memory efficiency.
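The memory argument for 255 bins can be made concrete: any bin index in `[0, 254]` fits in a single unsigned byte. The sketch below uses simple equal-width binning as an assumption; ML.NET's actual binning strategy may differ.

```python
import array

# Why MaximumBinCountPerFeature=255 is memory-friendly: bin indices
# 0..254 fit in one unsigned byte, so histogram-based training can
# store the binned dataset as uint8 (one byte per example-feature cell).
# Equal-width binning here is illustrative, not ML.NET's exact scheme.

def bin_index(value: float, lo: float, hi: float, num_bins: int = 255) -> int:
    """Map a feature value to an equal-width bin index in [0, num_bins - 1]."""
    if value <= lo:
        return 0
    if value >= hi:
        return num_bins - 1
    return int((value - lo) / (hi - lo) * num_bins)

indices = [bin_index(v, 0.0, 10.0) for v in (0.0, 3.7, 9.99, 10.0)]
assert all(0 <= i <= 254 for i in indices)

# uint8 storage: typecode "B" is an unsigned char, 1 byte per entry.
binned = array.array("B", indices)
print(binned.itemsize)  # 1
```

With 256 or more bins, every index would need at least two bytes, doubling the memory footprint of the binned feature matrix.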
**Ranking Sigmoid:**
- Action: Use lookup table for sigmoid computation in ranking scenarios.
- Values: `SigmoidBins=1000000` (one million), `ExpAsymptote=-50` (exp(x) for x < -50 treated as 0)
- Trade-off: 1M bins provides high precision with O(1) lookup instead of expensive exp() calls. exp(-50) is approximately 2e-22, safely treated as zero.
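A lookup table in this spirit can be sketched as follows. The bin count and asymptote mirror the constants above; the table layout and saturation logic are illustrative assumptions, not ML.NET's exact implementation.

```python
import math

# Precompute sigmoid values over a fixed range so gradient computation
# does an O(1) array index instead of calling exp() each time.
SIGMOID_BINS = 1_000_000
EXP_ASYMPTOTE = -50.0  # exp(x) for x < -50 is ~2e-22: treat as 0

# Table covers x in [EXP_ASYMPTOTE, -EXP_ASYMPTOTE].
_LO, _HI = EXP_ASYMPTOTE, -EXP_ASYMPTOTE
_STEP = (_HI - _LO) / SIGMOID_BINS
_TABLE = [1.0 / (1.0 + math.exp(-(_LO + i * _STEP))) for i in range(SIGMOID_BINS)]

def fast_sigmoid(x: float) -> float:
    """O(1) sigmoid via table lookup, saturating at the asymptotes."""
    if x <= _LO:
        return 0.0  # exp(-50) ~ 2e-22, safely rounded to zero
    if x >= _HI:
        return 1.0
    return _TABLE[int((x - _LO) / _STEP)]

# Table memory: 1,000,000 doubles at 8 bytes each is ~8 MB.
print(fast_sigmoid(0.0))  # ~0.5
```

One million bins over a width-100 range gives a step of 1e-4, so the worst-case lookup error is on the order of 2.5e-5 (the sigmoid's maximum slope is 0.25), which is ample precision for gradient computation.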
## Reasoning
FastTree's learning rate of 0.2 is a standard default for gradient boosting (comparable to XGBoost's default). The 20 leaves per tree limits model complexity while allowing sufficient interaction depth. The 10 minimum examples per leaf prevents overfitting to noise.
GAM's 0.002 learning rate is 100x smaller because GAMs build one-dimensional shape functions that are more sensitive to gradient steps. The 9,500 iterations compensate for the small step size. 255 bins per feature aligns with byte (uint8) representation, maximizing memory efficiency for histogram-based binning.
The sigmoid lookup table with 1M bins trades roughly 8 MB of memory (1,000,000 doubles at 8 bytes each) for O(1) sigmoid evaluation, replacing an expensive exp() call per pairwise comparison during ranking gradient computation.
## Code Evidence
FastTree defaults from `src/Microsoft.ML.FastTree/FastTreeArguments.cs:339-342`:
```csharp
public const int NumberOfTrees = 100;
public const int NumberOfLeaves = 20;
public const int MinimumExampleCountPerLeaf = 10;
public const double LearningRate = 0.2;
```
GAM defaults from `src/Microsoft.ML.FastTree/GamTrainer.cs:716-718`:
```csharp
internal const int NumberOfIterations = 9500;
internal const int MaximumBinCountPerFeature = 255;
internal const double LearningRate = 0.002; // A small value
```
Sigmoid lookup table from `src/Microsoft.ML.FastTree/FastTreeRanking.cs:531-532`:
```csharp
private const double _expAsymptote = -50; // exp( x < expAsymptote ) is assumed to be 0
private const int _sigmoidBins = 1000000; // Number of bins in the lookup table
```