Heuristic: dotnet/machinelearning Numerical Stability Guards
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Numerical_Computing |
| Last Updated | 2026-02-09 11:00 GMT |
Overview
Numerical stability patterns including probability clamping away from 0/1, precomputed constants, and weight scaling guards to prevent overflow/underflow in ML training.
Description
ML.NET's trainers implement several numerical stability guards to prevent common floating-point issues. These include clamping predicted probabilities away from exact 0.0 and 1.0 (preventing log(0) in loss functions), precomputing constants like sqrt(2) to avoid repeated computation, bounding weight scales within safe IEEE 754 exponent ranges, and explicit zero-initialization of native memory blocks.
Usage
Use this heuristic when implementing custom trainers or loss functions, debugging NaN/Infinity errors during training, or porting ML.NET patterns to custom code. These guards prevent subtle numerical issues that only manifest on specific data distributions.
The Insight (Rule of Thumb)
Probability Clamping:
- Action: Clamp predicted probabilities to [epsilon, 1-epsilon].
- Value: `MaxToReturn = 1 - Epsilon`; never return exactly 0.0 or 1.0.
- Trade-off: Introduces negligible bias but prevents catastrophic log(0) = -Infinity in cross-entropy loss.
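A minimal standalone sketch of the clamping pattern (the epsilon value and names here are illustrative, not ML.NET's actual constants):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Illustrative epsilon; ML.NET defines its own Epsilon constant.
constexpr double kEpsilon = 1e-15;
constexpr double kMaxToReturn = 1.0 - kEpsilon;  // never return exactly 1.0

// Clamp a predicted probability into [kEpsilon, 1 - kEpsilon] so that
// both log(p) and log(1 - p) stay finite in the loss computation.
double ClampProbability(double p) {
    return std::min(std::max(p, kEpsilon), kMaxToReturn);
}
```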
Precomputed Constants:
- Action: Precompute frequently used mathematical constants.
- Value: `sqrt2 = 1.41421356237` instead of calling `Math.Sqrt(2)` each time.
- Trade-off: Minor memory for constant storage; avoids repeated floating-point computation in hot paths.
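In standalone form the pattern is just hoisting the call into a constant (the literal matches the ML.NET source quoted under Code Evidence):

```cpp
#include <cassert>
#include <cmath>

// Hoist the constant out of hot loops instead of calling std::sqrt(2.0)
// on every iteration. std::sqrt is correctly rounded under IEEE 754, so
// this literal matches std::sqrt(2.0) to full double precision.
constexpr double kSqrt2 = 1.4142135623730951;
```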
Weight Scale Bounds:
- Action: Bound weight scaling within safe IEEE 754 range.
- Value: `maxWeightScale = 1 << 10` (1024), `minWeightScale = 1 / 1024` (2^-10).
- Trade-off: Prevents exponent overflow; the `float` exponent spans roughly -126 to 127, so capping scales at 2^±10 leaves ample exponent headroom for products and updates.
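A sketch of the guard, assuming a scalar running scale that gets folded back into the weight vector when it drifts out of bounds (the function name and reset-to-1 behavior are illustrative, not copied from ML.NET):

```cpp
#include <cassert>
#include <vector>

constexpr float kMaxWeightScale = 1 << 10;           // 2^10 == 1024
constexpr float kMinWeightScale = 1.0f / (1 << 10);  // 2^-10

// When the running multiplicative scale leaves the safe band, fold it
// into the stored weights and reset it, so that later products of
// weights and features stay inside the float exponent range.
void NormalizeScale(float& scale, std::vector<float>& weights) {
    if (scale > kMaxWeightScale || scale < kMinWeightScale) {
        for (float& w : weights) w *= scale;
        scale = 1.0f;
    }
}
```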
Memory Zero-Initialization:
- Action: Force zero-initialization of native memory blocks with `()` syntax in C++.
- Value: `new int32_t[size]()` instead of `new int32_t[size]`
- Trade-off: Small initialization cost but prevents undefined behavior from uninitialized memory.
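The difference between the two allocation forms, in a standalone sketch:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// new T[n]() value-initializes: every int32_t element starts at zero.
// new T[n]   default-initializes: the elements hold indeterminate values.
int32_t* AllocZeroed(std::size_t n) {
    return new int32_t[n]();  // trailing () forces zero-fill
}
```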
Reasoning
Cross-entropy loss computes `log(p)` where `p` is the predicted probability. If `p` is exactly 0.0, `log(0)` produces `-Infinity`, which poisons all subsequent gradient computations. Similarly, `log(1-p)` when `p=1.0` causes the same issue. Clamping to [epsilon, 1-epsilon] keeps all values in a numerically safe range with negligible impact on model quality.
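The failure mode is easy to reproduce directly (the helper name is illustrative):

```cpp
#include <cassert>
#include <cmath>

// Cross-entropy contribution for a positive example: -log(p).
// At p == 0 this yields +Infinity, which then propagates through
// every later accumulation and gradient step.
double CrossEntropyTerm(double p) {
    return -std::log(p);
}
```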
The weight scale bounds relate to IEEE 754 single precision (the scale constants in the code are `float`s), whose normalized exponent spans roughly 2^-126 to 2^127. By limiting weight scales to the band 2^-10 to 2^10, products of weights and features stay well within the representable range even after many multiplicative updates.
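The single-precision limit can be checked directly with `std::ldexp` (helper name is illustrative):

```cpp
#include <cassert>
#include <cmath>

// 2^127 is the largest power-of-two exponent a float can hold; 2^128
// overflows to Infinity, which is why weight scales are bounded far
// below that limit.
float PowerOfTwo(int exp) {
    return std::ldexp(1.0f, exp);  // computes 1.0f * 2^exp
}
```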
Code Evidence
Probability clamping from `src/Microsoft.ML.Data/Prediction/Calibrator.cs:1951`:
```csharp
private const float MaxToReturn = 1 - Epsilon; // max predicted is 1 - min;
```
Precomputed sqrt(2) from `src/Microsoft.ML.StandardTrainers/Standard/ModelStatistics.cs:342`:
```csharp
const double sqrt2 = 1.41421356237; // Math.Sqrt(2);
```
Weight scale bounds from `src/Microsoft.ML.StandardTrainers/Standard/Online/OnlineLinear.cs:251`:
```csharp
private const float _maxWeightScale = 1 << 10;
// Exponent ranges 127 to -128, tolerate 10 being cut off that.
private const float _minWeightScale = 1 / _maxWeightScale;
```
IEEE 754 bit constants from `src/Microsoft.ML.Core/Utilities/FloatUtils.cs:25-30`:
```csharp
public const int RawExpInf = 0x7FF;  // Raw exponent for infinities and NaN
public const int RawExpZero = 0x3FF; // Raw exponent for "1" (logically zero)
public const int CbitExp = 11;       // Number of exponent bits
public const int CbitMan = 52;       // Number of mantissa bits
public const int ExpDenorm = -1074;  // Minimum exponent for denormalized numbers
```
Zero-initialization from `src/Native/LdaNative/model_block.cpp:115-116`:
```cpp
mem_block_ = new int32_t[mem_block_size_]();             // NOTE: force initialize to zero
alias_mem_block_ = new int32_t[alias_mem_block_size_](); // NOTE: force initialize to zero
```