Implementation:Dotnet Machinelearning FastTree Segment
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning, Dynamic Programming, SIMD Optimization |
| Last Updated | 2026-02-09 12:00 GMT |
Overview
Optimal segment path finding using Viterbi-like dynamic programming with SIMD acceleration for variable-length bit encoding of tree node feature bins.
Description
The segment encoding module (segment.cpp, 334 lines) implements a dynamic programming algorithm to find the optimal cost partitioning of feature histogram bins into variable-length segments. In gradient boosted tree training, feature values are quantized into bins, and segment encoding further compresses these bins by grouping adjacent bins that have similar statistics into segments with a shared representative value. This reduces the memory footprint and speeds up subsequent split finding.
The algorithm is structurally analogous to the Viterbi algorithm used in hidden Markov models: it maintains a cost matrix where each entry represents the minimum cost of encoding the first k bins using a particular segment configuration. The transition costs capture the distortion introduced by merging adjacent bins into a single segment. The algorithm traces back through the optimal path to recover the segment boundaries.
Multiple variants are provided, parameterized by the maximum number of segments allowed:
- C_SegmentFindOptimalPath7 -- Up to 7 segments (3-bit encoding)
- C_SegmentFindOptimalPath15 -- Up to 15 segments (4-bit encoding)
- C_SegmentFindOptimalPath21 -- Up to 21 segments (approximately 4.4-bit encoding)
Cost-only variants are also exported for use cases where only the optimal cost is needed without the full path:
- C_SegmentFindOptimalCost15 -- Optimal cost for up to 15 segments
- C_SegmentFindOptimalCost31 -- Optimal cost for up to 31 segments (5-bit encoding)
The inner loops use SIMD intrinsics (SSE2/AVX where available) to parallelize the cost accumulation across multiple segment candidates simultaneously, achieving substantial speedups on modern processors.
Usage
Segment encoding is used during the feature pre-processing phase of FastTree training. After initial histogram bin boundaries are determined, the segment optimizer finds the best way to merge bins into a smaller number of segments. This is particularly useful when the number of unique feature values is large but many adjacent bins have similar gradient statistics, allowing aggressive compression without significant loss of split quality. The managed layer calls these functions to determine both the optimal segment boundaries (path variants) and to evaluate whether segmentation provides sufficient compression benefit (cost variants).
Code Reference
Source Location
- Repository: Dotnet_Machinelearning
- File: src/Native/FastTreeNative/segment.cpp
- Lines: 1-334
Signature
extern "C" {
EXPORT_API(void) C_SegmentFindOptimalPath15(
unsigned long *pBins,
unsigned long long bits,
int offset,
int numBins,
float *pOutputs,
float *pCost,
int *pPath,
int *pBinsOpt
);
EXPORT_API(void) C_SegmentFindOptimalPath21(
unsigned long *pBins,
unsigned long long bits,
int offset,
int numBins,
float *pOutputs,
float *pCost,
int *pPath,
int *pBinsOpt
);
EXPORT_API(void) C_SegmentFindOptimalPath7(
unsigned long *pBins,
unsigned long long bits,
int offset,
int numBins,
float *pOutputs,
float *pCost,
int *pPath,
int *pBinsOpt
);
EXPORT_API(void) C_SegmentFindOptimalCost15(
unsigned long *pBins,
unsigned long long bits,
int offset,
int numBins,
float *pOutputs,
float *pCost
);
EXPORT_API(void) C_SegmentFindOptimalCost31(
unsigned long *pBins,
unsigned long long bits,
int offset,
int numBins,
float *pOutputs,
float *pCost
);
}
Import
// P/Invoke declarations from managed ML.NET code
[DllImport("FastTreeNative")]
private static extern void C_SegmentFindOptimalPath15(
ulong* pBins, ulong bits, int offset, int numBins,
float* pOutputs, float* pCost, int* pPath, int* pBinsOpt);
[DllImport("FastTreeNative")]
private static extern void C_SegmentFindOptimalPath21(
ulong* pBins, ulong bits, int offset, int numBins,
float* pOutputs, float* pCost, int* pPath, int* pBinsOpt);
[DllImport("FastTreeNative")]
private static extern void C_SegmentFindOptimalPath7(
ulong* pBins, ulong bits, int offset, int numBins,
float* pOutputs, float* pCost, int* pPath, int* pBinsOpt);
[DllImport("FastTreeNative")]
private static extern void C_SegmentFindOptimalCost15(
ulong* pBins, ulong bits, int offset, int numBins,
float* pOutputs, float* pCost);
[DllImport("FastTreeNative")]
private static extern void C_SegmentFindOptimalCost31(
ulong* pBins, ulong bits, int offset, int numBins,
float* pOutputs, float* pCost);
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| pBins | unsigned long* | Yes | Array of bin boundary descriptors. Each entry encodes the document count or cumulative statistic for a histogram bin. |
| bits | unsigned long long | Yes | Bitmask controlling which bins are eligible for segment boundaries. Allows constraining the segmentation to respect pre-existing structural constraints. |
| offset | int | Yes | Starting offset into the bin array, allowing segmentation of a sub-range of the full histogram. |
| numBins | int | Yes | Total number of bins to segment. Determines the length of the dynamic programming table. |
| pOutputs | float* | Yes | Per-bin output values (e.g., mean gradient per bin). Used to compute the distortion cost of merging bins into segments. |
| pCost | float* | Yes (in/out) | Cost array. On input, may contain initial costs. On output, contains the optimal cost at each position. |
Outputs
| Name | Type | Description |
|---|---|---|
| pCost | float* | Updated with the minimum encoding cost for the optimal segmentation at each bin position. |
| pPath | int* | Backtrace path array. pPath[i] gives the index of the previous segment boundary for the optimal path ending at bin i. Used to reconstruct the full segmentation. (Path variants only.) |
| pBinsOpt | int* | Optimal segment-to-bin mapping. After backtracing, pBinsOpt[s] gives the starting bin index for segment s. (Path variants only.) |
Usage Examples
// Find optimal 15-segment encoding for a 256-bin feature histogram
int numBins = 256;
unsigned long bins[256]; // bin statistics
float outputs[256]; // per-bin mean gradient
float cost[256]; // cost workspace
int path[256]; // backtrace path
int binsOpt[15]; // resulting segment boundaries
// Initialize bins and outputs from histogram accumulation
// ...
unsigned long long bits = ~0ULL; // all bins eligible as boundaries
int offset = 0;
C_SegmentFindOptimalPath15(
bins, bits, offset, numBins,
outputs, cost, path, binsOpt
);
// binsOpt now contains the starting bin for each of up to 15 segments.
// cost[numBins-1] contains the total distortion cost of the segmentation.
// Cost-only evaluation: check whether 31-segment encoding is beneficial
float cost31[numBins];
C_SegmentFindOptimalCost31(
bins, bits, offset, numBins,
outputs, cost31
);
float totalCost = cost31[numBins - 1];
if (totalCost < compressionThreshold) {
// Segmentation is worthwhile; proceed with path finding
C_SegmentFindOptimalPath15(bins, bits, offset, numBins,
outputs, cost, path, binsOpt);
}