Implementation:Dotnet Machinelearning FastTree Sumup
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning, Gradient Boosting, Native Interop |
| Last Updated | 2026-02-09 12:00 GMT |
Overview
Template-based histogram accumulation routines for gradient boosted tree feature binning, supporting multiple bit-widths, sparsity modes, and weighting schemes in the FastTree native library.
Description
The FastTree Sumup module implements the core histogram construction step of the gradient boosted decision tree training loop. To evaluate candidate split points during tree building, the trainer first accumulates per-document gradient (target) sums, weight sums, and document counts into discrete bins corresponding to quantized feature values. This module provides optimized C++ template functions that cover the many resulting configurations: feature values encoded at 1, 4, 8, 16, or 32 bits; dense or delta-sparse storage layouts; indexed (subset) or non-indexed (full scan) document iteration; and optional sample weighting.
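The accumulation step described above can be illustrated with a scalar reference version. This is a minimal sketch, not the code in Sumup.h: the `Histogram` struct, the `AccumulateDense` name, and the use of `std::vector` are assumptions made for readability, and an empty `weights` vector stands in for the NULL (unweighted) case.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical container for the three per-bin outputs; not part of Sumup.h.
struct Histogram {
    std::vector<double> sumTargets;
    std::vector<double> sumWeights;
    std::vector<int> count;
    explicit Histogram(int numBins)
        : sumTargets(numBins, 0.0), sumWeights(numBins, 0.0), count(numBins, 0) {}
};

// Accumulate per-bin statistics for the documents listed in `indices`.
// bins[d] is the quantized feature value (bin index) of document d.
// An empty `weights` vector models unweighted training.
inline Histogram AccumulateDense(int numBins,
                                 const std::vector<std::uint8_t>& bins,
                                 const std::vector<float>& outputs,
                                 const std::vector<double>& weights,
                                 const std::vector<int>& indices) {
    Histogram h(numBins);
    for (int doc : indices) {
        const int b = bins[doc];
        assert(b < numBins);  // a bin index outside the histogram is a data error
        h.sumTargets[b] += outputs[doc];
        if (!weights.empty()) h.sumWeights[b] += weights[doc];
        h.count[b] += 1;
    }
    return h;
}
```

The optimized templates specialize this loop per bit-width and storage layout so that the branch on weighting and the bin decoding do not have to be re-evaluated per document.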
Sumup.h (213 lines) contains the primary templates:
- Sumup() -- Accumulates gradient statistics into histogram bins for dense feature representations, using an explicit index array that lists which documents belong to the current leaf.
- Sumup_noindices() -- Optimized variant for the case where all documents in the leaf are contiguous, eliminating the indirection through a permutation array.
- SumupDeltaSparse() -- Handles features stored in delta-sparse encoding, where only non-default bin values are stored alongside their row deltas. This is critical for high-dimensional sparse datasets.
- SumupDeltaSparse_noindices() -- Combined delta-sparse and contiguous-document optimization.
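The delta-sparse variants can be pictured as follows. This is a sketch under an assumed encoding, not the layout used by Sumup.h: here `deltas[i]` is the row distance from the previously stored row (from row 0 for the first entry), `values[i]` is that row's bin index, and every row not listed falls into a hypothetical `defaultBin`. The default bin is filled by subtraction from precomputed totals, which is the role the `sumTargets` and `totalCount` parameters appear to play.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical full-scan (no index array) delta-sparse accumulation.
inline void AccumulateDeltaSparse(const std::vector<int>& deltas,
                                  const std::vector<std::uint8_t>& values,
                                  const std::vector<double>& outputs,
                                  int defaultBin,
                                  double sumTargets,  // precomputed sum over all rows
                                  std::vector<double>& sumTargetsByBin,
                                  std::vector<int>& countByBin) {
    assert(deltas.size() == values.size());
    const int totalCount = static_cast<int>(outputs.size());
    double seenTargets = 0.0;
    int seenCount = 0;
    int row = 0;
    // Visit only the explicitly stored (non-default) rows.
    for (std::size_t i = 0; i < deltas.size(); ++i) {
        row += deltas[i];
        const int b = values[i];
        sumTargetsByBin[b] += outputs[row];
        countByBin[b] += 1;
        seenTargets += outputs[row];
        seenCount += 1;
    }
    // Everything not visited belongs to the default bin.
    sumTargetsByBin[defaultBin] += sumTargets - seenTargets;
    countByBin[defaultBin] += totalCount - seenCount;
}
```

The work is proportional to the number of stored entries rather than the number of rows, which is why this layout pays off on sparse features.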
SumupOneBit.h (104 lines) provides specialized implementations for binary (1-bit) features:
- SumupOneBit() -- Processes features that take only two values (0 or 1), using bit-packed storage and bitwise operations for maximum throughput. The entire feature column is stored as a bitfield, and accumulation proceeds by testing individual bits.
- SumupOneBit_noindices() -- Non-indexed variant for binary features.
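To illustrate the bit-testing approach for binary features, here is a small sketch. The bit order (document `d` stored in byte `d / 8`, bit `d % 8`, least significant bit first) and the names `ReadBit` and `AccumulateOneBit` are assumptions of this example; the packing in SumupOneBit.h may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical bit order: least significant bit of each byte holds the lowest document.
inline int ReadBit(const std::vector<std::uint8_t>& packed, int doc) {
    return (packed[doc >> 3] >> (doc & 7)) & 1;
}

// Accumulate a two-bin histogram (bin 0 and bin 1) over the first numDocs documents.
inline void AccumulateOneBit(const std::vector<std::uint8_t>& packed,
                             const std::vector<float>& outputs,
                             int numDocs,
                             double sumTargetsByBin[2],
                             int countByBin[2]) {
    assert(static_cast<int>(packed.size()) * 8 >= numDocs);
    for (int doc = 0; doc < numDocs; ++doc) {
        const int b = ReadBit(packed, doc);  // 0 or 1 selects the bin directly
        sumTargetsByBin[b] += outputs[doc];
        countByBin[b] += 1;
    }
}
```

Because the feature has only two bins, the tested bit can be used as the bin index without any lookup.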
expand.h (102 lines) provides the C-linkage export layer:
- Defines the exported functions C_Sumup_float, C_Sumup_double, C_SumupDeltaSparse_float, C_SumupDeltaSparse_double, C_SumupSegment_float, C_SumupSegment_double.
- Each export dispatches at runtime on the numBits parameter (1, 4, 8, 16, or 32), selecting the appropriately instantiated template.
- The segment variants accumulate statistics for segment-encoded features where bins are described by variable-length encoded segments rather than per-row values.
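The runtime dispatch on numBits can be sketched as a switch that forwards to a template instantiated for each width. The names `ReadBin`, `CountBins`, and `DispatchCountBins` are hypothetical, the sketch only counts documents per bin, and the low-nibble-first order for 4-bit data and the little-endian host are assumptions rather than facts about expand.h.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical reader: bin index of document `doc` in a packed column.
template <int numBits>
inline std::uint32_t ReadBin(const unsigned char* data, int doc) {
    static_assert(numBits == 4 || numBits == 8 || numBits == 16 || numBits == 32,
                  "unsupported width");
    if (numBits == 4) {
        const unsigned char byte = data[doc >> 1];
        return (doc & 1) ? (byte >> 4) : (byte & 0x0F);  // assumed nibble order
    }
    std::uint32_t value = 0;
    // Copies 1, 2, or 4 bytes into the low bytes of value (little-endian host assumed).
    std::memcpy(&value, data + doc * (numBits / 8), numBits / 8);
    return value;
}

// Width-specialized inner loop; numBits is a compile-time constant here.
template <int numBits>
inline void CountBins(const unsigned char* data, int numDocs, int* countByBin) {
    for (int doc = 0; doc < numDocs; ++doc) {
        countByBin[ReadBin<numBits>(data, doc)] += 1;
    }
}

// Runtime-to-compile-time dispatch. Returns false for widths this sketch
// does not handle (including the 1-bit case, which has its own routines).
inline bool DispatchCountBins(int numBits, const unsigned char* data,
                              int numDocs, int* countByBin) {
    assert(data != nullptr && countByBin != nullptr);
    switch (numBits) {
        case 4:  CountBins<4>(data, numDocs, countByBin);  return true;
        case 8:  CountBins<8>(data, numDocs, countByBin);  return true;
        case 16: CountBins<16>(data, numDocs, countByBin); return true;
        case 32: CountBins<32>(data, numDocs, countByBin); return true;
        default: return false;
    }
}
```

Dispatching once per call keeps the width check out of the per-document loop, so each instantiation can be compiled with the decoding logic fixed.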
Usage
These routines are invoked from the managed C# training loop via P/Invoke during the tree-growing phase of FastTree, LightGBM-style, and TLC gradient boosting implementations in ML.NET. They are called once per feature per leaf per tree level, making them the innermost hot loop of training. The runtime dispatch on numBits allows the managed layer to choose the most compact representation for each feature based on its cardinality.
Code Reference
Source Location
- Repository: Dotnet_Machinelearning
- File: src/Native/FastTreeNative/Sumup.h
- Lines: 1-213
- File: src/Native/FastTreeNative/SumupOneBit.h
- Lines: 1-104
- File: src/Native/FastTreeNative/expand.h
- Lines: 1-102
Signature
// Sumup.h - Dense histogram accumulation with index array
template <typename FloatT, int numBits>
void Sumup(
int numBins,
unsigned char *pData,
FloatT *pSampleOutputs,
double *pSampleOutputWeights,
int numDocs,
int *pIndices,
double *pSumTargetsByBin,
double *pSumWeightsByBin,
int *pCountByBin,
int totalCount
);
// Sumup.h - Dense histogram accumulation without index array
template <typename FloatT, int numBits>
void Sumup_noindices(
int numBins,
unsigned char *pData,
FloatT *pSampleOutputs,
double *pSampleOutputWeights,
int numDocs,
double *pSumTargetsByBin,
double *pSumWeightsByBin,
int *pCountByBin,
int totalCount
);
// Sumup.h - Delta-sparse histogram accumulation
template <typename FloatT, int numBits>
void SumupDeltaSparse(
int numBins,
unsigned char *pData,
int *pDeltas,
FloatT *pSampleOutputs,
double *pSampleOutputWeights,
int numDocs,
int *pIndices,
int totalCount,
double *pSumTargetsByBin,
double *pSumWeightsByBin,
int *pCountByBin,
double sumTargets,
double sumWeights
);
// SumupOneBit.h - Binary feature histogram accumulation
template <typename FloatT>
void SumupOneBit(
unsigned char *pData,
FloatT *pSampleOutputs,
double *pSampleOutputWeights,
int numDocs,
int *pIndices,
double *pSumTargetsByBin,
double *pSumWeightsByBin,
int *pCountByBin,
int totalCount
);
// expand.h - C-linkage exports
extern "C" {
EXPORT_API(void) C_Sumup_float(int numBits, int numBins, ...);
EXPORT_API(void) C_Sumup_double(int numBits, int numBins, ...);
EXPORT_API(void) C_SumupDeltaSparse_float(int numBits, int numBins, ...);
EXPORT_API(void) C_SumupDeltaSparse_double(int numBits, int numBins, ...);
EXPORT_API(void) C_SumupSegment_float(int numBits, int numBins, ...);
EXPORT_API(void) C_SumupSegment_double(int numBits, int numBins, ...);
}
Import
// P/Invoke declarations from managed ML.NET code.
// Pointer parameters require an unsafe context, hence the unsafe modifier.
[DllImport("FastTreeNative")]
private static unsafe extern void C_Sumup_float(
    int numBits, int numBins, byte* pData,
    float* pSampleOutputs, double* pSampleOutputWeights,
    int numDocs, int* pIndices,
    double* pSumTargetsByBin, double* pSumWeightsByBin,
    int* pCountByBin, int totalCount);
[DllImport("FastTreeNative")]
private static unsafe extern void C_Sumup_double(
    int numBits, int numBins, byte* pData,
    double* pSampleOutputs, double* pSampleOutputWeights,
    int numDocs, int* pIndices,
    double* pSumTargetsByBin, double* pSumWeightsByBin,
    int* pCountByBin, int totalCount);
[DllImport("FastTreeNative")]
private static unsafe extern void C_SumupDeltaSparse_float(
    int numBits, int numBins, byte* pData, int* pDeltas,
    float* pSampleOutputs, double* pSampleOutputWeights,
    int numDocs, int* pIndices, int totalCount,
    double* pSumTargetsByBin, double* pSumWeightsByBin,
    int* pCountByBin, double sumTargets, double sumWeights);
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| numBits | int | Yes | Bit-width of feature encoding (1, 4, 8, 16, or 32). Controls template dispatch. |
| numBins | int | Yes | Number of histogram bins for this feature. |
| pData | unsigned char* | Yes | Pointer to the packed feature data column. Encoding depends on numBits. |
| pSampleOutputs | FloatT* | Yes | Gradient (target) values for each document. Float or double precision. |
| pSampleOutputWeights | double* | No | Per-document weights. NULL for unweighted training. |
| numDocs | int | Yes | Number of documents in the current leaf node. |
| pIndices | int* | No | Permutation array mapping local document indices to global positions. NULL for _noindices variants. |
| pDeltas | int* | Sparse only | Delta-encoded row indices for sparse features. |
| totalCount | int | Yes | Total document count (used to derive the default-bin count in sparse mode). |
| sumTargets | double | Sparse only | Pre-computed sum of all targets (for default bin calculation in sparse accumulation). |
| sumWeights | double | Sparse only | Pre-computed sum of all weights (for default bin calculation in sparse accumulation). |
Outputs
| Name | Type | Description |
|---|---|---|
| pSumTargetsByBin | double* | Accumulated gradient sum per histogram bin. |
| pSumWeightsByBin | double* | Accumulated weight sum per histogram bin. |
| pCountByBin | int* | Document count per histogram bin. |
Usage Examples
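// Example 1: dense 8-bit feature histogram accumulation with float outputs.
// Sizes are illustrative placeholders. std::vector replaces the variable-length
// arrays, which are not valid C++ and cannot take an initializer.
#include <vector>
const int totalDocCount = 1000;   // placeholder: documents in the dataset
const int leafDocCount = 200;     // placeholder: documents in the current leaf
const int numBits = 8;
const int numBins = 256;
std::vector<unsigned char> featureData(totalDocCount);  // 8-bit bin indices
std::vector<float> gradients(totalDocCount);            // per-document gradients
std::vector<double> weights(totalDocCount);             // per-document weights
std::vector<int> indices(leafDocCount);                 // document indices in current leaf
std::vector<double> sumTargetsByBin(numBins, 0.0);
std::vector<double> sumWeightsByBin(numBins, 0.0);
std::vector<int> countByBin(numBins, 0);
C_Sumup_float(
    numBits, numBins, featureData.data(),
    gradients.data(), weights.data(),
    leafDocCount, indices.data(),
    sumTargetsByBin.data(), sumWeightsByBin.data(),
    countByBin.data(), totalDocCount
);
// After the call, sumTargetsByBin[b] contains the total gradient
// for all documents in the leaf whose feature falls in bin b.
// The optimal split is found by scanning bins for the best gain.

// Example 2: delta-sparse 4-bit feature with double outputs.
// Uses its own names and histograms so it does not clash with Example 1.
const int nnz = 50;                                     // placeholder: stored non-default entries
const int sparseNumBits = 4;
const int sparseNumBins = 16;
std::vector<unsigned char> sparseData(nnz);             // packed 4-bit bin values
std::vector<int> deltas(nnz);                           // delta-encoded row offsets
std::vector<double> gradientsD(totalDocCount);
std::vector<double> sparseSumTargetsByBin(sparseNumBins, 0.0);
std::vector<double> sparseSumWeightsByBin(sparseNumBins, 0.0);
std::vector<int> sparseCountByBin(sparseNumBins, 0);
double precomputedSumTargets = 42.5;                    // sum over all docs
double precomputedSumWeights = 100.0;
C_SumupDeltaSparse_double(
    sparseNumBits, sparseNumBins, sparseData.data(), deltas.data(),
    gradientsD.data(), nullptr,                         // no weights
    leafDocCount, indices.data(), totalDocCount,
    sparseSumTargetsByBin.data(), sparseSumWeightsByBin.data(),
    sparseCountByBin.data(),
    precomputedSumTargets, precomputedSumWeights
);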
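Once the histograms are filled, the caller scans the bins for the best split. The sketch below shows one way such a scan could look; it is not FastTree's split finder. The `SplitCandidate` struct, the `BestSplit` name, and the gain formula (sum squared over count on each side of the threshold) are assumptions chosen to keep the example short, and the real gain computation may include weights, regularization, and minimum-leaf-size checks.

```cpp
#include <cassert>
#include <vector>

// Hypothetical result type: documents with bin <= threshold go left.
struct SplitCandidate {
    int threshold;  // -1 means no valid split was found
    double gain;
};

inline SplitCandidate BestSplit(const std::vector<double>& sumTargetsByBin,
                                const std::vector<int>& countByBin) {
    assert(sumTargetsByBin.size() == countByBin.size());
    const int numBins = static_cast<int>(countByBin.size());

    // Totals over all bins, so the right side can be derived by subtraction.
    double totalSum = 0.0;
    int totalCount = 0;
    for (int b = 0; b < numBins; ++b) {
        totalSum += sumTargetsByBin[b];
        totalCount += countByBin[b];
    }

    SplitCandidate best{-1, 0.0};
    double leftSum = 0.0;
    int leftCount = 0;
    for (int b = 0; b + 1 < numBins; ++b) {
        leftSum += sumTargetsByBin[b];
        leftCount += countByBin[b];
        const int rightCount = totalCount - leftCount;
        if (leftCount == 0 || rightCount == 0) continue;  // one side would be empty
        const double rightSum = totalSum - leftSum;
        // Assumed gain: sum^2 / count for each child.
        const double gain = leftSum * leftSum / leftCount
                          + rightSum * rightSum / rightCount;
        if (best.threshold < 0 || gain > best.gain) {
            best.threshold = b;
            best.gain = gain;
        }
    }
    return best;
}
```

A single left-to-right pass suffices because the right-hand statistics follow from the totals, so the cost of split search is linear in the number of bins.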