Implementation:Dotnet Machinelearning SymSgdNative

Knowledge Sources	Dotnet_Machinelearning; Parallel SGD: When does averaging help?
Domains	Machine Learning, Native Interop, Parallel Computing, Binary Classification
Last Updated	2026-02-09 12:00 GMT

Overview

Native C++ implementation of Symbolic Stochastic Gradient Descent (SymSGD) for binary classification. It uses frequency-based feature partitioning to enable lock-free parallel training across multiple CPU threads while keeping communication overhead low.

Description

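SymSgdNative.cpp and SymSgdNative.h implement a parallel SGD variant designed for sparse binary classification problems. Its key idea is frequency-based feature partitioning: features are divided into frequent features (those that appear in many training instances) and non-frequent ones. Each parallel learner keeps its own local copy of the weights for frequent features, while all learners share one global model for the non-frequent features. Frequent features are touched by most instances, so sharing them would make concurrent updates collide often; rare features are seldom updated by two learners at once, so they can be shared without locks.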

The implementation consists of several phases:

Feature remapping (ComputeRemapping, RemapInstances): Analyzes the training data to identify the most frequently occurring features and assigns them contiguous indices at the front of the weight vector, which keeps each learner's local model compact. An unordered map (original to remapped index) and a direct-address table (remapped to original index) provide expected O(1) lookups in both directions.
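The remapping step can be illustrated with a minimal C++ sketch. It is not the native ComputeRemapping/RemapInstances code: the names Remapping, ComputeRemappingSketch, RemapIndex and RemapInstancesSketch are hypothetical, ties in frequency are assumed to be broken by the smaller original index, and a non-frequent feature that occupied one of the front slots is assumed to be swapped into a slot vacated by a frequent feature. The unordered map stores only features whose index changes; all other features keep their index.

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical result of the remapping step (not the native data structures).
struct Remapping {
    std::unordered_map<int, int> origToNew;  // original -> remapped, only for moved features
    std::vector<int> slotToOrig;             // direct-address table: slot i -> original frequent feature
};

// Count feature occurrences and move the numFreq most frequent features to
// slots [0, numFreq). Non-frequent features that sat in those slots are
// swapped into the slots vacated by frequent features.
Remapping ComputeRemappingSketch(const std::vector<std::vector<int>>& instIndices,
                                 int numFeatures, int numFreq) {
    assert(numFreq >= 0 && numFreq <= numFeatures);
    std::vector<long long> counts(numFeatures, 0);
    for (const auto& inst : instIndices)
        for (int f : inst) counts[f]++;

    std::vector<int> order(numFeatures);
    for (int f = 0; f < numFeatures; ++f) order[f] = f;
    // Descending count; stable_sort keeps the smaller original index first on ties.
    std::stable_sort(order.begin(), order.end(),
                     [&](int a, int b) { return counts[a] > counts[b]; });

    Remapping r;
    r.slotToOrig.assign(order.begin(), order.begin() + numFreq);
    std::vector<bool> isFreq(numFeatures, false);
    for (int f : r.slotToOrig) isFreq[f] = true;

    std::vector<int> vacated, displaced;
    for (int i = 0; i < numFreq; ++i) {
        int f = r.slotToOrig[i];
        if (f != i) r.origToNew[f] = i;          // frequent feature moves to its slot
        if (f >= numFreq) vacated.push_back(f);  // its old slot becomes free
        if (!isFreq[i]) displaced.push_back(i);  // non-frequent feature must leave slot i
    }
    assert(vacated.size() == displaced.size());
    for (size_t j = 0; j < displaced.size(); ++j)
        r.origToNew[displaced[j]] = vacated[j];
    return r;
}

// Expected O(1) lookup; features absent from the map keep their index.
int RemapIndex(const Remapping& r, int original) {
    auto it = r.origToNew.find(original);
    return it == r.origToNew.end() ? original : it->second;
}

// Rewrite every instance's feature indices into the remapped space.
void RemapInstancesSketch(const Remapping& r, std::vector<std::vector<int>>& instIndices) {
    for (auto& inst : instIndices)
        for (int& f : inst) f = RemapIndex(r, f);
}
```

For example, with five features and the instances {4,1}, {4,2}, {4,1}, {3} and numFreq = 2, feature 4 (three occurrences) moves to slot 0, feature 1 stays in slot 1, and the displaced feature 0 moves to slot 4.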

State initialization (InitializeState): Allocates the SymSGDState struct containing per-learner SymSGD instances, the frequency maps, and global training counters. Each learner gets its own copy of the frequent-feature weights.
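A minimal sketch of the state layout, assuming each learner's private weights are a plain float vector. StateSketch, LearnerSketch and InitializeStateSketch are hypothetical names whose fields only mirror the roles of the SymSGDState members listed under Signature; the native struct is laid out differently. What it illustrates is the memory cost of replication, which grows with numLearners * numFreq rather than with the full feature count.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical per-learner data: a private copy of the frequent-feature weights.
struct LearnerSketch {
    std::vector<float> localModel;  // weights for remapped slots [0, numFreq)
    float localBias = 0.0f;
};

// Hypothetical counterpart of SymSGDState (field names are illustrative).
struct StateSketch {
    std::vector<LearnerSketch> learners;
    long long totalInstancesProcessed = 0;  // global training counters
    int passIteration = 0;
};

// Allocate one learner per thread, each with numFreq zero-initialized weights.
std::unique_ptr<StateSketch> InitializeStateSketch(int numLearners, int numFreq) {
    assert(numLearners > 0 && numFreq >= 0);
    auto state = std::make_unique<StateSketch>();
    state->learners.resize(numLearners);
    for (auto& learner : state->learners)
        learner.localModel.assign(numFreq, 0.0f);
    return state;
}
```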

Training (LearnAll): The main training loop, which distributes instances across learners. Each learner calls LearnLocalModel to perform SGD updates on its local model. After a configurable number of local iterations, the learners synchronize via Reduction, which averages their local frequent-feature weights back into the global model.
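The per-instance update and the averaging step can be sketched as follows, assuming labels in {+1, -1} (as the LearnAll signature comment states) and logistic loss. LocalLearner, LearnLocalSketch and ReductionSketch are hypothetical names; applying L2 decay only to the weights an instance touches and scaling the gradient of positive instances by piw are assumptions of this sketch, not confirmed details of the native code.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical learner: a private copy of the frequent-feature weights and the bias.
struct LocalLearner {
    std::vector<float> local;  // remapped slots [0, numFreq)
    float bias = 0.0f;

    // One logistic-regression SGD step on a sparse instance with label +1/-1.
    // Frequent features (index < numFreq) update the private copy; the rest
    // update the shared global vector directly.
    void LearnLocalSketch(const std::vector<int>& idx, const std::vector<float>& val,
                          float label, float alpha, float l2Const, float piw,
                          std::vector<float>& globModel) {
        const int numFreq = static_cast<int>(local.size());
        float score = bias;
        for (size_t k = 0; k < idx.size(); ++k) {
            int f = idx[k];
            score += (f < numFreq ? local[f] : globModel[f]) * val[k];
        }
        // d/ds log(1 + exp(-y*s)) = -y / (1 + exp(y*s)); positives weighted by piw.
        float grad = -label / (1.0f + std::exp(label * score));
        if (label > 0) grad *= piw;
        for (size_t k = 0; k < idx.size(); ++k) {
            int f = idx[k];
            float& w = (f < numFreq) ? local[f] : globModel[f];
            w -= alpha * (grad * val[k] + l2Const * w);  // L2 only on touched weights
        }
        bias -= alpha * grad;
    }
};

// Average the learners' frequent-feature weights and biases into the global model.
void ReductionSketch(const std::vector<LocalLearner>& learners,
                     std::vector<float>& globModel, float& globBias) {
    assert(!learners.empty());
    const size_t numFreq = learners[0].local.size();
    for (size_t i = 0; i < numFreq; ++i) {
        float sum = 0.0f;
        for (const auto& l : learners) sum += l.local[i];
        globModel[i] = sum / learners.size();
    }
    float biasSum = 0.0f;
    for (const auto& l : learners) biasSum += l.bias;
    globBias = biasSum / learners.size();
}
```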

Hyperparameter tuning (TuneAlpha, TuneNumLocIter): Automatically selects the learning rate (alpha) and the number of local iterations between reductions, based on the loss measured on a sample of the training data and on convergence behavior.
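A learning-rate search of this kind can be sketched as a small grid search. The halving schedule that starts from a maximum alpha (in the spirit of MaxPossibleAlpha) and the callback that runs a trial and returns its loss are assumptions; TuneAlphaSketch is a hypothetical name, not the native TuneAlpha.

```cpp
#include <cassert>
#include <functional>
#include <limits>

// Hypothetical learning-rate search: try maxAlpha, maxAlpha/2, maxAlpha/4, ...
// and keep the alpha whose trial run yields the lowest loss. A diverging trial
// that returns NaN never compares as smaller, so it is never selected.
float TuneAlphaSketch(float maxAlpha, int numCandidates,
                      const std::function<double(float)>& lossAfterTrial) {
    assert(maxAlpha > 0.0f && numCandidates > 0);
    float bestAlpha = maxAlpha;
    double bestLoss = std::numeric_limits<double>::infinity();
    float alpha = maxAlpha;
    for (int c = 0; c < numCandidates; ++c, alpha *= 0.5f) {
        double loss = lossAfterTrial(alpha);  // e.g. one SGD pass on a sample, then Loss
        if (loss < bestLoss) {
            bestLoss = loss;
            bestAlpha = alpha;
        }
    }
    return bestAlpha;
}
```

Passing the trial as a callback keeps the search independent of how a trial is run; in practice the callback would copy the weights, run SGD on a sample with the given alpha, and evaluate the loss.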

Finalization (MapBackWeightVector, DeallocateSequentially): Restores the original feature ordering in the weight vector and releases all native memory.
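Mapping the weights back amounts to applying the inverse permutation. This sketch assumes the remapping is recorded as a map from original index to remapped index for the features that moved (every other feature kept its index); MapBackSketch is a hypothetical name, and the native MapBackWeightVector works from the data held in its own state.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical inverse of the remapping: origToNew lists only the features
// whose index changed; every other feature kept its index.
std::vector<float> MapBackSketch(const std::vector<float>& remappedWeights,
                                 const std::unordered_map<int, int>& origToNew) {
    std::vector<float> original(remappedWeights);  // unmoved features are already in place
    for (const auto& kv : origToNew) {
        assert(kv.first >= 0 && kv.first < static_cast<int>(original.size()));
        original[kv.first] = remappedWeights[kv.second];  // weight of feature kv.first
    }
    return original;
}
```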

Usage

SymSGD is used for binary classification tasks in ML.NET where:

- The feature space is sparse and high-dimensional (e.g., text classification, click prediction)
- Training data is large enough to benefit from parallel execution
- A linear classifier is appropriate for the problem

The managed SymbolicSgdLogisticRegressionBinaryTrainer dispatches to this native code for the core training loop.

Code Reference

Source Location

Repository: Dotnet_Machinelearning
File: src/Native/SymSgdNative/SymSgdNative.cpp
Lines: 1-489
File: src/Native/SymSgdNative/SymSgdNative.h
Lines: 1-148

Signature

// --- Exported functions from SymSgdNative.cpp ---

// Main training loop: processes all instances across parallel learners
EXPORT_API(void) LearnAll(
    int numInstances,
    int* instSizes,          // Number of non-zero features per instance
    int** instIndices,       // Feature indices per instance (sparse)
    float** instValues,      // Feature values per instance (sparse)
    float* instLabels,       // Binary labels (+1 / -1)
    int numThreads,
    int numFreqFeatures,
    int numFeatures,
    float l2Const,           // L2 regularization constant
    float piw,               // Positive instance weight
    float* weightVector,     // Model weights (in/out)
    float* bias,             // Bias term (in/out)
    void** state             // Opaque state handle (in/out)
);

// Restore original feature ordering in weight vector
EXPORT_API(void) MapBackWeightVector(void* state);

// Release all native memory
EXPORT_API(void) DeallocateSequentially(void* state);


// --- SymSGD class from SymSgdNative.h ---

class SymSGD {
public:
    // Perform local SGD update for one instance
    void LearnLocalModel(
        int instSize,
        int* instIndices,
        float* instValues,
        float instLabel,
        float alpha,          // Learning rate
        float l2Const,
        float piw,
        float* globModel      // Global weight vector
    );

    // Copy global weights into local model
    void ResetModel(
        float bias,
        float* globModel,
        float weightScaling
    );

    // Average local model back into global model
    void Reduction(
        float* globModel,
        float* bias,
        float* weightScaling
    );
};

// --- SymSGDState struct ---

struct SymSGDState {
    int NumLearners;
    long long TotalInstancesProcessed;
    SymSGD* Learners;                          // Array of parallel learners
    std::unordered_map<int, int> FreqFeatUnorderedMap;  // Original -> remapped index
    int* FreqFeatDirectMap;                    // Direct-address remapped -> original
    int NumFrequentFeatures;
    int PassIteration;
    float WeightScaling;
};

Import

using System;
using System.Runtime.InteropServices;

// P/Invoke declarations (managed side; these members belong inside a static class)
[DllImport("SymSgdNative")]
internal static extern void LearnAll(
    int numInstances,
    IntPtr instSizes,
    IntPtr instIndices,
    IntPtr instValues,
    IntPtr instLabels,
    int numThreads,
    int numFreqFeatures,
    int numFeatures,
    float l2Const,
    float piw,
    IntPtr weightVector,
    ref float bias,
    ref IntPtr state);

[DllImport("SymSgdNative")]
internal static extern void MapBackWeightVector(IntPtr state);

[DllImport("SymSgdNative")]
internal static extern void DeallocateSequentially(IntPtr state);

I/O Contract

Inputs

Name	Type	Required	Description
numInstances	int	Yes	Number of training instances in this batch
instSizes	int*	Yes	Array of length numInstances; each entry is the number of non-zero features for that instance
instIndices	int**	Yes	Array of int arrays; instIndices[i] contains the feature indices for instance i
instValues	float**	Yes	Array of float arrays; instValues[i] contains the corresponding feature values
instLabels	float*	Yes	Array of binary labels (+1.0 or -1.0) of length numInstances
numThreads	int	Yes	Number of parallel learner threads
numFreqFeatures	int	Yes	Number of features classified as frequent (will be replicated per learner)
numFeatures	int	Yes	Total number of features in the feature space
l2Const	float	Yes	L2 regularization constant controlling weight decay
piw	float	Yes	Positive instance weight for handling class imbalance
weightVector	float*	Yes	Model weight vector of length numFeatures (input: current weights; output: updated weights)
bias	float*	Yes	Pointer to model bias term (input/output)
state	void**	Yes	Pointer to the opaque state handle; the handle (*state) is NULL on the first call and is initialized by LearnAll

Outputs

Name	Type	Description
weightVector	float*	Updated model weights after training (written in-place)
bias	float*	Updated bias term after training (written in-place)
state	void**	Opaque state handle for subsequent calls; contains learner instances, frequency maps, and training counters
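The state contract (a NULL handle on the first call, reuse on later calls, explicit release at the end) is the usual opaque-handle pattern. A minimal sketch, in which HandleState, TrainBatchSketch and FreeStateSketch are hypothetical stand-ins for SymSGDState, LearnAll and DeallocateSequentially:

```cpp
#include <cassert>

// Hypothetical state owned by the native side and exposed only as void*.
struct HandleState {
    int numLearners = 0;
    long long totalInstancesProcessed = 0;
};

// First call: *state == nullptr, so the state is allocated and published
// through the out-parameter. Later calls reuse the same object.
void TrainBatchSketch(int numInstances, int numThreads, void** state) {
    assert(state != nullptr);
    if (*state == nullptr) {
        auto* fresh = new HandleState();
        fresh->numLearners = numThreads;
        *state = fresh;
    }
    auto* s = static_cast<HandleState*>(*state);
    s->totalInstancesProcessed += numInstances;
}

// Counterpart of DeallocateSequentially: the handle must not be used afterwards.
void FreeStateSketch(void* state) {
    delete static_cast<HandleState*>(state);
}
```

On the managed side, the ref IntPtr state parameter plays the role of void** here: the caller starts with IntPtr.Zero and keeps whatever value the native code writes back.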

Internal Helper Functions

Function	Description
ComputeRemapping	Scans all training instances to count feature frequencies; identifies the top numFreqFeatures features and builds the remapping tables (unordered map and direct-address table)
RemapInstances	Rewrites instance feature indices from original space to remapped space where frequent features are contiguous at indices [0, numFreqFeatures)
MaxPossibleAlpha	Computes the maximum learning rate that maintains convergence guarantees based on feature norms and instance count
TuneAlpha	Searches for the optimal learning rate by evaluating loss on a sample of instances at different alpha values
TuneNumLocIter	Determines how many local SGD iterations each learner should perform before synchronizing, balancing convergence speed against communication overhead
InitializeState	Allocates and initializes the SymSGDState structure, including per-learner SymSGD instances with their local weight copies
Loss	Computes the logistic loss with L2 regularization over a set of instances; used for hyperparameter tuning and convergence monitoring
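A regularized logistic loss over sparse instances can be sketched as below. The exact normalization is an assumption: this version averages the per-instance loss, weights positive instances by piw, and adds (l2Const / 2) * ||w||^2; SparseInstance and LossSketch are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct SparseInstance {
    std::vector<int> indices;   // feature indices
    std::vector<float> values;  // feature values
    float label;                // +1 or -1
};

// Hypothetical objective: mean logistic loss (positives weighted by piw)
// plus (l2Const / 2) * ||w||^2.
double LossSketch(const std::vector<SparseInstance>& data, const std::vector<float>& w,
                  float bias, float l2Const, float piw) {
    assert(!data.empty());
    double total = 0.0;
    for (const auto& inst : data) {
        double score = bias;
        for (size_t k = 0; k < inst.indices.size(); ++k)
            score += static_cast<double>(w[inst.indices[k]]) * inst.values[k];
        double margin = inst.label * score;
        // log(1 + exp(-margin)), rearranged to avoid overflow for large |margin|.
        double l = margin > 0 ? std::log1p(std::exp(-margin))
                              : -margin + std::log1p(std::exp(margin));
        total += (inst.label > 0 ? piw : 1.0f) * l;
    }
    double sq = 0.0;
    for (float wi : w) sq += static_cast<double>(wi) * wi;
    return total / data.size() + 0.5 * l2Const * sq;
}
```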

Training Flow

The SymSGD training proceeds through the following stages:

Initialization: On the first call to LearnAll (*state == NULL), ComputeRemapping analyzes feature frequencies, RemapInstances rewrites the sparse data, and InitializeState allocates the learner instances.

Hyperparameter tuning: TuneAlpha and TuneNumLocIter select the learning rate and local iteration count based on a sample of the training data.

Parallel training: Each of the numThreads learners receives a partition of the training instances. In each synchronization round:
- The learner calls ResetModel to copy the current global weights for frequent features into its local model
- The learner processes its assigned instances via LearnLocalModel, updating its local copy of the frequent-feature weights and the shared global weights for non-frequent features
- After the configured number of local iterations, Reduction averages the local frequent-feature weights back into the global model
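One synchronization round can be sketched with std::thread. This is not the native LearnAll: RoundInstance and RunRoundSketch are hypothetical, learners take a strided slice of the data, and L2 and piw are omitted for brevity. The sketch also deliberately deviates in one respect: instead of writing non-frequent weights into the shared global vector during the round (which would be a data race in standard C++), each learner buffers those updates and they are applied after join().

```cpp
#include <cassert>
#include <cmath>
#include <thread>
#include <unordered_map>
#include <vector>

struct RoundInstance {
    std::vector<int> idx;    // remapped feature indices
    std::vector<float> val;  // feature values
    float label;             // +1 or -1
};

// One hypothetical synchronization round.
void RunRoundSketch(const std::vector<RoundInstance>& data, int numFreq, int numLearners,
                    float alpha, std::vector<float>& glob, float& bias) {
    assert(numLearners > 0 && numFreq <= static_cast<int>(glob.size()));
    // ResetModel: every learner starts from the global frequent weights and bias.
    std::vector<std::vector<float>> local(numLearners,
        std::vector<float>(glob.begin(), glob.begin() + numFreq));
    std::vector<float> localBias(numLearners, bias);
    // Per-learner buffers for non-frequent updates (avoids data races on glob).
    std::vector<std::unordered_map<int, float>> delta(numLearners);

    std::vector<std::thread> threads;
    for (int t = 0; t < numLearners; ++t) {
        threads.emplace_back([&, t] {
            // Learner t takes every numLearners-th instance (a strided partition).
            for (size_t i = t; i < data.size(); i += numLearners) {
                const RoundInstance& x = data[i];
                float score = localBias[t];
                for (size_t k = 0; k < x.idx.size(); ++k) {
                    int f = x.idx[k];
                    score += (f < numFreq ? local[t][f] : glob[f] + delta[t][f]) * x.val[k];
                }
                // Logistic-loss gradient with respect to the score.
                float g = -x.label / (1.0f + std::exp(x.label * score));
                for (size_t k = 0; k < x.idx.size(); ++k) {
                    int f = x.idx[k];
                    if (f < numFreq) local[t][f] -= alpha * g * x.val[k];
                    else delta[t][f] -= alpha * g * x.val[k];
                }
                localBias[t] -= alpha * g;
            }
        });
    }
    for (auto& th : threads) th.join();

    // Apply the buffered non-frequent updates, then average (cf. Reduction).
    for (const auto& d : delta)
        for (const auto& kv : d) glob[kv.first] += kv.second;
    for (int f = 0; f < numFreq; ++f) {
        float sum = 0.0f;
        for (int t = 0; t < numLearners; ++t) sum += local[t][f];
        glob[f] = sum / numLearners;
    }
    float biasSum = 0.0f;
    for (float b : localBias) biasSum += b;
    bias = biasSum / numLearners;
}
```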

Weight restoration: After all passes complete, MapBackWeightVector reverses the feature remapping so that the weight vector aligns with the original feature indices.

Cleanup: DeallocateSequentially frees all native memory associated with the state.

Usage Examples

// Binary classification with SymSGD
// (the trainer ships in the Microsoft.ML.Mkl.Components package)
using Microsoft.ML;
using Microsoft.ML.Trainers;

var mlContext = new MLContext(seed: 0);

// SentimentData is a user-defined input class with a ReviewText string
// column and a bool Sentiment label column.
var dataView = mlContext.Data.LoadFromTextFile<SentimentData>(
    path: "reviews.tsv", hasHeader: true);
var split = mlContext.Data.TrainTestSplit(dataView, testFraction: 0.2);

// Thread count and L2 regularization are set through the Options class.
var options = new SymbolicSgdLogisticRegressionBinaryTrainer.Options
{
    LabelColumnName = "Sentiment",
    FeatureColumnName = "Features",
    NumberOfThreads = 4,
    NumberOfIterations = 50,
    L2Regularization = 1e-4f
};

var pipeline = mlContext.Transforms.Text
    .FeaturizeText("Features", "ReviewText")
    .Append(mlContext.BinaryClassification.Trainers
        .SymbolicSgdLogisticRegression(options));

// The trainer internally calls LearnAll for each training pass;
// feature partitioning is automatic, based on frequency analysis.
var model = pipeline.Fit(split.TrainSet);

// Predict on held-out data
var predictions = model.Transform(split.TestSet);

// SymSGD is particularly effective for sparse text classification,
// where a small fraction of features (common words) appears in many
// instances. With 100,000+ features from text featurization, only the
// comparatively few frequent features are replicated across the
// parallel learners, which keeps the frequency-based partitioning cheap.
