Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Dotnet Machinelearning FastTree Segment

From Leeroopedia


Knowledge Sources
Domains Machine Learning, Dynamic Programming, SIMD Optimization
Last Updated 2026-02-09 12:00 GMT

Overview

Optimal segment path finding using Viterbi-like dynamic programming with SIMD acceleration for variable-length bit encoding of tree node feature bins.

Description

The segment encoding module (segment.cpp, 334 lines) implements a dynamic programming algorithm to find the optimal cost partitioning of feature histogram bins into variable-length segments. In gradient boosted tree training, feature values are quantized into bins, and segment encoding further compresses these bins by grouping adjacent bins that have similar statistics into segments with a shared representative value. This reduces the memory footprint and speeds up subsequent split finding.

The algorithm is structurally analogous to the Viterbi algorithm used in hidden Markov models: it maintains a cost matrix where each entry represents the minimum cost of encoding the first k bins using a particular segment configuration. The transition costs capture the distortion introduced by merging adjacent bins into a single segment. The algorithm traces back through the optimal path to recover the segment boundaries.

Multiple variants are provided, parameterized by the maximum number of segments allowed:

  • C_SegmentFindOptimalPath7 -- Up to 7 segments (3-bit encoding)
  • C_SegmentFindOptimalPath15 -- Up to 15 segments (4-bit encoding)
  • C_SegmentFindOptimalPath21 -- Up to 21 segments (approximately 4.4-bit encoding)

Cost-only variants are also exported for use cases where only the optimal cost is needed without the full path:

  • C_SegmentFindOptimalCost15 -- Optimal cost for up to 15 segments
  • C_SegmentFindOptimalCost31 -- Optimal cost for up to 31 segments (5-bit encoding)

The inner loops use SIMD intrinsics (SSE2/AVX where available) to parallelize the cost accumulation across multiple segment candidates simultaneously, achieving substantial speedups on modern processors.

Usage

Segment encoding is used during the feature pre-processing phase of FastTree training. After initial histogram bin boundaries are determined, the segment optimizer finds the best way to merge bins into a smaller number of segments. This is particularly useful when the number of unique feature values is large but many adjacent bins have similar gradient statistics, allowing aggressive compression without significant loss of split quality. The managed layer calls these functions to determine both the optimal segment boundaries (path variants) and to evaluate whether segmentation provides sufficient compression benefit (cost variants).

Code Reference

Source Location

Signature

extern "C" {

EXPORT_API(void) C_SegmentFindOptimalPath15(
    unsigned long *pBins,
    unsigned long long bits,
    int offset,
    int numBins,
    float *pOutputs,
    float *pCost,
    int *pPath,
    int *pBinsOpt
);

EXPORT_API(void) C_SegmentFindOptimalPath21(
    unsigned long *pBins,
    unsigned long long bits,
    int offset,
    int numBins,
    float *pOutputs,
    float *pCost,
    int *pPath,
    int *pBinsOpt
);

EXPORT_API(void) C_SegmentFindOptimalPath7(
    unsigned long *pBins,
    unsigned long long bits,
    int offset,
    int numBins,
    float *pOutputs,
    float *pCost,
    int *pPath,
    int *pBinsOpt
);

EXPORT_API(void) C_SegmentFindOptimalCost15(
    unsigned long *pBins,
    unsigned long long bits,
    int offset,
    int numBins,
    float *pOutputs,
    float *pCost
);

EXPORT_API(void) C_SegmentFindOptimalCost31(
    unsigned long *pBins,
    unsigned long long bits,
    int offset,
    int numBins,
    float *pOutputs,
    float *pCost
);

}

Import

// P/Invoke declarations from managed ML.NET code
[DllImport("FastTreeNative")]
private static extern void C_SegmentFindOptimalPath15(
    ulong* pBins, ulong bits, int offset, int numBins,
    float* pOutputs, float* pCost, int* pPath, int* pBinsOpt);

[DllImport("FastTreeNative")]
private static extern void C_SegmentFindOptimalPath21(
    ulong* pBins, ulong bits, int offset, int numBins,
    float* pOutputs, float* pCost, int* pPath, int* pBinsOpt);

[DllImport("FastTreeNative")]
private static extern void C_SegmentFindOptimalPath7(
    ulong* pBins, ulong bits, int offset, int numBins,
    float* pOutputs, float* pCost, int* pPath, int* pBinsOpt);

[DllImport("FastTreeNative")]
private static extern void C_SegmentFindOptimalCost15(
    ulong* pBins, ulong bits, int offset, int numBins,
    float* pOutputs, float* pCost);

[DllImport("FastTreeNative")]
private static extern void C_SegmentFindOptimalCost31(
    ulong* pBins, ulong bits, int offset, int numBins,
    float* pOutputs, float* pCost);

I/O Contract

Inputs

Name Type Required Description
pBins unsigned long* Yes Array of bin boundary descriptors. Each entry encodes the document count or cumulative statistic for a histogram bin.
bits unsigned long long Yes Bitmask controlling which bins are eligible for segment boundaries. Allows constraining the segmentation to respect pre-existing structural constraints.
offset int Yes Starting offset into the bin array, allowing segmentation of a sub-range of the full histogram.
numBins int Yes Total number of bins to segment. Determines the length of the dynamic programming table.
pOutputs float* Yes Per-bin output values (e.g., mean gradient per bin). Used to compute the distortion cost of merging bins into segments.
pCost float* Yes (in/out) Cost array. On input, may contain initial costs. On output, contains the optimal cost at each position.

Outputs

Name Type Description
pCost float* Updated with the minimum encoding cost for the optimal segmentation at each bin position.
pPath int* Backtrace path array. pPath[i] gives the index of the previous segment boundary for the optimal path ending at bin i. Used to reconstruct the full segmentation. (Path variants only.)
pBinsOpt int* Optimal segment-to-bin mapping. After backtracing, pBinsOpt[s] gives the starting bin index for segment s. (Path variants only.)

Usage Examples

// Find optimal 15-segment encoding for a 256-bin feature histogram
int numBins = 256;
unsigned long bins[256];           // bin statistics
float outputs[256];                // per-bin mean gradient
float cost[256];                   // cost workspace
int path[256];                     // backtrace path
int binsOpt[15];                   // resulting segment boundaries

// Initialize bins and outputs from histogram accumulation
// ...

unsigned long long bits = ~0ULL;   // all bins eligible as boundaries
int offset = 0;

C_SegmentFindOptimalPath15(
    bins, bits, offset, numBins,
    outputs, cost, path, binsOpt
);

// binsOpt now contains the starting bin for each of up to 15 segments.
// cost[numBins-1] contains the total distortion cost of the segmentation.
// Cost-only evaluation: check whether 31-segment encoding is beneficial
float cost31[numBins];
C_SegmentFindOptimalCost31(
    bins, bits, offset, numBins,
    outputs, cost31
);

float totalCost = cost31[numBins - 1];
if (totalCost < compressionThreshold) {
    // Segmentation is worthwhile; proceed with path finding
    C_SegmentFindOptimalPath15(bins, bits, offset, numBins,
                               outputs, cost, path, binsOpt);
}

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment