Implementation:Dotnet Machinelearning OneDalAlgorithms

From Leeroopedia
Revision as of 14:49, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Dotnet_Machinelearning_OneDalAlgorithms.md)


Knowledge Sources
Domains: Machine Learning, Native Interop, Hardware Acceleration
Last Updated: 2026-02-09 12:00 GMT

Overview

C++ wrapper library exposing Intel oneDAL (oneAPI Data Analytics Library) accelerated implementations of Decision Forest, Logistic Regression, and Ridge Regression algorithms for consumption by the ML.NET managed runtime via P/Invoke.

Description

OneDalAlgorithms.cpp provides a native bridge between ML.NET's managed C# training pipelines and Intel's oneDAL library, which delivers hardware-optimized implementations of core machine learning algorithms. The file exports five primary functions that cover three algorithm families:

  • Decision Forest (classification and regression) -- Trains an ensemble of decision trees using oneDAL's optimized parallel tree construction. Results are extracted via custom RegressorNodeVisitor and ClassifierNodeVisitor helper classes that traverse the trained tree structure and serialize node data (split features, thresholds, leaf values) back to managed memory.
  • Logistic Regression (L-BFGS optimizer) -- Trains a logistic regression model using the Limited-memory Broyden-Fletcher-Goldfarb-Shanno optimization algorithm, which is well-suited for large feature spaces where full Hessian computation is impractical.
  • Ridge Regression (online/streaming) -- Supports incremental training through a two-phase API: ridgeRegressionOnlineCompute accepts data in batches (partial results are accumulated internally), and ridgeRegressionOnlineFinalize merges partial results to produce the final model coefficients.
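
The two-phase ridge flow described above can be sketched in plain C++. This is a minimal illustration, not the oneDAL implementation: RidgePartial, ridgeAccumulate, and ridgeFinalize are hypothetical names, the partial result is assumed to be the running sums X^T X and X^T y (with a constant column appended for the intercept), and the intercept is left unpenalized by assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical accumulator standing in for the opaque partial result.
struct RidgePartial {
    std::size_t dim = 0;        // numColumns + 1 (extra slot for the intercept)
    std::vector<double> xtx;    // running X^T X, dim x dim, row-major
    std::vector<double> xty;    // running X^T y, length dim
};

// Phase 1: fold one row-major batch into the running sums.
void ridgeAccumulate(RidgePartial& p, const float* data, const float* labels,
                     int numRows, int numColumns) {
    const std::size_t d = static_cast<std::size_t>(numColumns) + 1;
    if (p.dim == 0) {           // first batch: allocate the accumulators
        p.dim = d;
        p.xtx.assign(d * d, 0.0);
        p.xty.assign(d, 0.0);
    }
    assert(p.dim == d);         // every batch must have the same column count
    std::vector<double> row(d, 1.0);  // last slot stays 1.0 for the intercept
    for (int r = 0; r < numRows; ++r) {
        for (int c = 0; c < numColumns; ++c)
            row[c] = data[static_cast<std::size_t>(r) * numColumns + c];
        for (std::size_t i = 0; i < d; ++i) {
            p.xty[i] += row[i] * labels[r];
            for (std::size_t j = 0; j < d; ++j)
                p.xtx[i * d + j] += row[i] * row[j];
        }
    }
}

// Phase 2: solve (X^T X + l2 * I) w = X^T y by Gaussian elimination.
// Returns numColumns weights followed by the bias term.
std::vector<double> ridgeFinalize(const RidgePartial& p, double l2) {
    const std::size_t d = p.dim;
    assert(d > 0);
    std::vector<double> a = p.xtx, b = p.xty, w(d, 0.0);
    for (std::size_t i = 0; i + 1 < d; ++i) a[i * d + i] += l2;  // skip intercept
    for (std::size_t col = 0; col < d; ++col) {
        std::size_t pivot = col;  // partial pivoting for numerical stability
        for (std::size_t r = col + 1; r < d; ++r)
            if (std::fabs(a[r * d + col]) > std::fabs(a[pivot * d + col])) pivot = r;
        assert(std::fabs(a[pivot * d + col]) > 1e-12);  // system must be non-singular
        if (pivot != col) {
            for (std::size_t c = 0; c < d; ++c) std::swap(a[col * d + c], a[pivot * d + c]);
            std::swap(b[col], b[pivot]);
        }
        for (std::size_t r = col + 1; r < d; ++r) {
            const double f = a[r * d + col] / a[col * d + col];
            for (std::size_t c = col; c < d; ++c) a[r * d + c] -= f * a[col * d + c];
            b[r] -= f * b[col];
        }
    }
    for (std::size_t i = d; i-- > 0;) {  // back substitution
        double s = b[i];
        for (std::size_t j = i + 1; j < d; ++j) s -= a[i * d + j] * w[j];
        w[i] = s / a[i * d + i];
    }
    return w;
}
```

Because only the sums are kept between calls, memory use depends on the number of columns rather than the number of rows seen so far, which is what makes batch-wise training practical.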

All exported functions are templated to support both float (single precision) and double (double precision) data types, allowing the managed layer to choose the appropriate precision for the workload.
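
One common way to offer both precisions across a C ABI is a templated implementation behind thin per-precision entry points, since a function template cannot itself have C linkage. The sketch below illustrates that pattern only; columnMeansImpl, columnMeansFloat, and columnMeansDouble are hypothetical names, and whether OneDalAlgorithms.cpp is organized exactly this way is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical templated implementation: per-column means of a
// row-major matrix, written once for any floating-point type.
template <typename FPType>
void columnMeansImpl(const FPType* data, int numRows, int numColumns, FPType* out) {
    assert(numRows > 0 && numColumns > 0);
    for (int c = 0; c < numColumns; ++c) out[c] = FPType(0);
    for (int r = 0; r < numRows; ++r)
        for (int c = 0; c < numColumns; ++c)
            out[c] += data[static_cast<std::size_t>(r) * numColumns + c];
    for (int c = 0; c < numColumns; ++c) out[c] /= static_cast<FPType>(numRows);
}

// Thin C-linkage wrappers: one unmangled symbol per precision, so a
// P/Invoke caller can bind to whichever one matches its buffers.
extern "C" void columnMeansFloat(const float* data, int numRows, int numColumns, float* out) {
    columnMeansImpl<float>(data, numRows, numColumns, out);
}

extern "C" void columnMeansDouble(const double* data, int numRows, int numColumns, double* out) {
    columnMeansImpl<double>(data, numRows, numColumns, out);
}
```

The wrappers carry no logic of their own, so the two precisions cannot drift apart in behavior.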

Usage

Use these native functions when training ML.NET models on Intel hardware where oneDAL acceleration is available. The managed trainers (e.g., FastForestRegressionTrainer, FastForestBinaryClassificationTrainer) automatically detect oneDAL availability and dispatch to these native functions when beneficial. This is most effective for:

  • Large datasets where vectorized computation yields significant speedups
  • Decision forest training with many trees and deep splits
  • Ridge regression on streaming data that arrives in batches

Code Reference

Source Location

Signature

// Decision Forest Regression
EXPORT_API(void) decisionForestRegressionCompute(
    int numColumns,
    int numRows,
    float* dataPtr,
    float* labelsPtr,
    int numTrees,
    int maxTreeDepth,
    int minObservationsInLeafNode,
    int maxBins,
    int seed,
    float* featureImportancePtr,
    int* treeNodeCount,
    float** treeNodeSplitValues,
    int** treeNodeFeatureIndices,
    float** treeNodeLeafValues,
    int** treeNodeLeftChildren,
    int** treeNodeRightChildren
);

// Decision Forest Classification
EXPORT_API(void) decisionForestClassificationCompute(
    int numColumns,
    int numRows,
    float* dataPtr,
    float* labelsPtr,
    int numClasses,
    int numTrees,
    int maxTreeDepth,
    int minObservationsInLeafNode,
    int maxBins,
    int seed,
    float* featureImportancePtr,
    int* treeNodeCount,
    float** treeNodeSplitValues,
    int** treeNodeFeatureIndices,
    float** treeNodeLeafValues,
    int** treeNodeLeftChildren,
    int** treeNodeRightChildren
);

// Logistic Regression (L-BFGS)
EXPORT_API(void) logisticRegressionLBFGSCompute(
    int numColumns,
    int numRows,
    float* dataPtr,
    float* labelsPtr,
    int numClasses,
    float l1Regularization,
    float l2Regularization,
    int maxIterations,
    float* weightsPtr,
    float* biasPtr
);

// Ridge Regression Online (batch)
EXPORT_API(void*) ridgeRegressionOnlineCompute(
    int numColumns,
    int numRows,
    float* dataPtr,
    float* labelsPtr,
    float l2Regularization,
    void* partialResult
);

// Ridge Regression Online (finalize)
EXPORT_API(void) ridgeRegressionOnlineFinalize(
    int numColumns,
    void* partialResult,
    float* weightsPtr,
    float* biasPtr
);

Import

// P/Invoke declarations (managed side)
[DllImport("OneDalNative", EntryPoint = "decisionForestRegressionCompute")]
internal static extern void DecisionForestRegressionCompute(
    int numColumns, int numRows,
    IntPtr dataPtr, IntPtr labelsPtr,
    int numTrees, int maxTreeDepth,
    int minObservationsInLeafNode, int maxBins, int seed,
    IntPtr featureImportancePtr,
    IntPtr treeNodeCount, IntPtr treeNodeSplitValues,
    IntPtr treeNodeFeatureIndices, IntPtr treeNodeLeafValues,
    IntPtr treeNodeLeftChildren, IntPtr treeNodeRightChildren);

[DllImport("OneDalNative", EntryPoint = "ridgeRegressionOnlineCompute")]
internal static extern IntPtr RidgeRegressionOnlineCompute(
    int numColumns, int numRows,
    IntPtr dataPtr, IntPtr labelsPtr,
    float l2Regularization, IntPtr partialResult);

[DllImport("OneDalNative", EntryPoint = "ridgeRegressionOnlineFinalize")]
internal static extern void RidgeRegressionOnlineFinalize(
    int numColumns, IntPtr partialResult,
    IntPtr weightsPtr, IntPtr biasPtr);

I/O Contract

Inputs

Name | Type | Required | Description
numColumns | int | Yes | Number of feature columns in the training data
numRows | int | Yes | Number of rows (instances) in the current data batch
dataPtr | float*/double* | Yes | Pointer to row-major feature matrix of size numRows x numColumns
labelsPtr | float*/double* | Yes | Pointer to label array of size numRows
numTrees | int | Yes (forest) | Number of trees to build in the ensemble
maxTreeDepth | int | Yes (forest) | Maximum depth of each decision tree
minObservationsInLeafNode | int | Yes (forest) | Minimum number of samples required at a leaf node
maxBins | int | Yes (forest) | Maximum number of bins for histogram-based splitting
seed | int | Yes (forest) | Random seed for reproducibility
numClasses | int | Yes (classification) | Number of target classes
l1Regularization | float | Yes (logistic) | L1 regularization coefficient for sparsity
l2Regularization | float | Yes (logistic/ridge) | L2 regularization coefficient for weight shrinkage
maxIterations | int | Yes (logistic) | Maximum number of L-BFGS iterations
partialResult | void* | No (ridge) | Pointer to accumulated partial results from previous batches; NULL for first batch
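
The row-major layout expected for dataPtr means that element (row, col) lives at offset row * numColumns + col. As a small sketch of preparing such a buffer on the native side, the helpers below (packRowMajor and elementAt, both hypothetical names not present in the source file) flatten a nested container and read one element back.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Flatten a list of equally sized rows into one contiguous row-major buffer.
std::vector<float> packRowMajor(const std::vector<std::vector<float>>& rows, int numColumns) {
    std::vector<float> flat;
    flat.reserve(rows.size() * static_cast<std::size_t>(numColumns));
    for (const auto& row : rows) {
        assert(row.size() == static_cast<std::size_t>(numColumns));  // no ragged rows
        flat.insert(flat.end(), row.begin(), row.end());
    }
    return flat;
}

// Read element (row, col) from a row-major buffer.
inline float elementAt(const float* dataPtr, int numColumns, int row, int col) {
    return dataPtr[static_cast<std::size_t>(row) * numColumns + col];
}
```

With two rows {1, 2, 3} and {4, 5, 6}, the packed buffer is 1 2 3 4 5 6, and the label array passed as labelsPtr would hold one value per row in the same row order.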

Outputs

Name | Type | Description
featureImportancePtr | float* | Array of feature importance scores (forest algorithms)
treeNodeCount | int* | Number of nodes per tree in the ensemble
treeNodeSplitValues | float** | Split threshold values for each internal node
treeNodeFeatureIndices | int** | Feature index used for splitting at each internal node
treeNodeLeafValues | float** | Predicted values at leaf nodes
treeNodeLeftChildren | int** | Left child node indices for tree traversal
treeNodeRightChildren | int** | Right child node indices for tree traversal
weightsPtr | float* | Trained model weight vector (logistic/ridge regression)
biasPtr | float* | Trained model bias/intercept term (logistic/ridge regression)
return value (ridgeRegressionOnlineCompute) | void* | Opaque pointer to accumulated partial results for subsequent batches
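
To show how the flat per-tree arrays fit together, here is a hedged sketch of scoring one example against them. The page does not state how leaves are marked or which side of a split takes equal values, so this sketch assumes a left child index of -1 marks a leaf and that values less than or equal to the threshold go left; FlatTree, predictTree, and predictForest are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// View over one tree's slice of the output arrays (node 0 is the root).
struct FlatTree {
    const int* featureIndices;
    const float* splitValues;
    const float* leafValues;
    const int* leftChildren;
    const int* rightChildren;
};

// Walk from the root to a leaf. Assumed conventions: leftChildren[node] < 0
// marks a leaf, and x[feature] <= threshold descends to the left child.
float predictTree(const FlatTree& t, const float* x) {
    int node = 0;
    while (t.leftChildren[node] >= 0) {
        const bool goLeft = x[t.featureIndices[node]] <= t.splitValues[node];
        node = goLeft ? t.leftChildren[node] : t.rightChildren[node];
    }
    return t.leafValues[node];
}

// Regression forest output taken as the mean of the per-tree predictions.
float predictForest(const std::vector<FlatTree>& trees, const float* x) {
    assert(!trees.empty());
    float sum = 0.f;
    for (const FlatTree& t : trees) sum += predictTree(t, x);
    return sum / static_cast<float>(trees.size());
}
```

Storing children as indices rather than pointers keeps each tree relocatable, which is what lets the arrays be copied into managed memory without fix-ups.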

Helper Classes

RegressorNodeVisitor

Traverses trained regression decision trees produced by oneDAL and extracts node information (split feature, split value, leaf prediction) into flat arrays suitable for marshalling back to managed code.

ClassifierNodeVisitor

Traverses trained classification decision trees and extracts node information including class probability distributions at leaf nodes.

Both visitors implement the oneDAL TreeNodeVisitor interface and are invoked by the model.traverseDF() method after training completes.
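
The flattening step performed by such a visitor can be sketched as follows. This is an illustration under stated assumptions, not the code in OneDalAlgorithms.cpp: SplitInfo and LeafInfo are simplified stand-ins for the descriptor objects a oneDAL visitor receives, FlatteningVisitor is a hypothetical class, and callbacks are assumed to arrive in depth-first pre-order (node, then left subtree, then right subtree).

```cpp
#include <cassert>
#include <vector>

// Simplified stand-ins for the descriptors passed to visitor callbacks.
struct SplitInfo { int featureIndex; float featureValue; };
struct LeafInfo  { float response; };

// Collects nodes into parallel arrays as pre-order callbacks arrive.
class FlatteningVisitor {
public:
    bool onSplitNode(const SplitInfo& s) {
        const int id = addNode(s.featureIndex, s.featureValue, 0.f);
        pending_.push_back(id);   // this split still needs both children
        return true;              // true = keep traversing
    }
    bool onLeafNode(const LeafInfo& l) {
        addNode(-1, 0.f, l.response);
        return true;
    }

    std::vector<int> featureIndices, leftChildren, rightChildren;
    std::vector<float> splitValues, leafValues;

private:
    // Append a node, then attach it to the innermost split still missing a child.
    int addNode(int feature, float split, float leaf) {
        const int id = static_cast<int>(featureIndices.size());
        featureIndices.push_back(feature);
        splitValues.push_back(split);
        leafValues.push_back(leaf);
        leftChildren.push_back(-1);
        rightChildren.push_back(-1);
        if (!pending_.empty()) {
            const int parent = pending_.back();
            if (leftChildren[parent] < 0) {
                leftChildren[parent] = id;   // first visit below parent: left child
            } else {
                rightChildren[parent] = id;  // second visit: right child, parent done
                pending_.pop_back();
            }
        }
        return id;
    }
    std::vector<int> pending_;  // stack of splits awaiting children
};
```

The stack of pending splits is what recovers the child indices, since pre-order callbacks report node contents but not where each node sits relative to its parent.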

Usage Examples

// Training a decision forest regressor via the managed trainer
// (which internally calls decisionForestRegressionCompute when oneDAL is used)
var forestPipeline = mlContext.Transforms.Concatenate("Features", featureColumns)
    .Append(mlContext.Regression.Trainers.FastForest(
        numberOfTrees: 100,
        minimumExampleCountPerLeaf: 5));

var forestModel = forestPipeline.Fit(trainingData);

// The FastForest trainer detects Intel oneDAL availability
// and dispatches to the native OneDalAlgorithms functions
// for hardware-accelerated training.

// Ridge regression with streaming batches
// (internally uses ridgeRegressionOnlineCompute + ridgeRegressionOnlineFinalize)
// Ols and OlsTrainer come from the Microsoft.ML.Mkl.Components package.
var ridgePipeline = mlContext.Transforms.Concatenate("Features", featureColumns)
    .Append(mlContext.Regression.Trainers.Ols(
        new OlsTrainer.Options { L2Regularization = 0.1f }));

// Each batch calls ridgeRegressionOnlineCompute with the accumulated partial results;
// a final call to ridgeRegressionOnlineFinalize produces the model.
var ridgeModel = ridgePipeline.Fit(trainingData);
