Implementation:Dotnet Machinelearning SymSgdNative
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning, Native Interop, Parallel Computing, Binary Classification |
| Last Updated | 2026-02-09 12:00 GMT |
Overview
Native C++ implementation of Symbolic Stochastic Gradient Descent (SymSGD) for binary classification. It partitions features by frequency so that multiple CPU threads can train in parallel without locks while keeping communication overhead low.
Description
SymSgdNative.cpp and SymSgdNative.h implement a parallel SGD variant specifically designed for sparse binary classification problems. The key innovation is frequency-based feature partitioning: features are divided into frequent (appearing in many training instances) and non-frequent groups. Each parallel learner maintains a local copy of the model weights for frequent features but shares a global model for non-frequent features.
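To make the local/global split concrete, the following is a minimal sketch of how a learner might score one sparse instance once frequent features occupy the low indices. The names `LocalLearner` and `Score` are hypothetical and are not taken from SymSgdNative.cpp; the sketch only assumes that indices below the frequent-feature count are served from the learner's private copy and all others from the shared vector.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical learner: holds a private copy of the frequent-feature weights,
// i.e. the weights at remapped indices [0, numFreq).
struct LocalLearner {
    std::vector<float> localFreqWeights;
};

// Sparse dot product for one instance. Frequent features read the learner's
// local copy; every other feature reads the shared global weight vector.
float Score(const LocalLearner& learner,
            const std::vector<int>& indices,
            const std::vector<float>& values,
            const std::vector<float>& globalWeights) {
    const std::size_t numFreq = learner.localFreqWeights.size();
    float dot = 0.0f;
    for (std::size_t k = 0; k < indices.size(); ++k) {
        const std::size_t j = static_cast<std::size_t>(indices[k]);
        const float w = (j < numFreq) ? learner.localFreqWeights[j]
                                      : globalWeights[j];
        dot += w * values[k];
    }
    return dot;
}
```

Because the frequent block is contiguous, a single comparison against `numFreq` decides which copy to read, which is the practical reason the remapping step places frequent features first.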
The implementation consists of several phases:
- Feature remapping (ComputeRemapping, RemapInstances): Analyzes the training data to identify the most frequently occurring features. These features are assigned contiguous indices at the front of the weight vector, enabling efficient local model allocation. An unordered map (original to remapped index, average-case O(1)) and a direct-address table (remapped to original index, O(1)) translate between the two index spaces.
- State initialization (InitializeState): Allocates the SymSGDState struct containing per-learner SymSGD instances, the frequency maps, and global training counters. Each learner gets its own copy of the frequent-feature weights.
- Training (LearnAll): The main training loop that distributes instances across learners. Each learner calls LearnLocalModel to perform SGD updates on its local model. After a configurable number of local iterations, the learners synchronize via Reduction, which averages their local frequent-feature weights back into the global model.
- Hyperparameter tuning (TuneAlpha, TuneNumLocIter): Automatic selection of the learning rate (alpha) and the number of local iterations between reductions, based on the loss function and convergence behavior.
- Finalization (MapBackWeightVector, DeallocateSequentially): Restores the original feature ordering in the weight vector and releases all native memory.
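The remapping and finalization phases above can be sketched as a permutation of feature indices and its inverse. This is an illustration under stated assumptions, not the code in SymSgdNative.cpp: `Remapping`, `BuildRemapping`, `RemapInPlace`, and `MapBack` are hypothetical names, the sketch stores a full permutation rather than only the displaced entries, and ties in frequency are broken by the lower original index.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <unordered_map>
#include <vector>

// Hypothetical pair of lookup tables between the two index spaces.
struct Remapping {
    std::unordered_map<int, int> origToNew; // original -> remapped
    std::vector<int> newToOrig;             // remapped -> original (direct-address)
};

// Count how often each feature occurs, then move the numFreq most frequent
// features to remapped indices [0, numFreq). Remaining features follow in
// ascending original order.
Remapping BuildRemapping(const std::vector<std::vector<int>>& instIndices,
                         int numFeatures, int numFreq) {
    std::vector<int> counts(numFeatures, 0);
    for (const auto& inst : instIndices)
        for (int j : inst) ++counts[j];

    std::vector<int> order(numFeatures);
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(),
                     [&](int a, int b) { return counts[a] > counts[b]; });

    Remapping r;
    r.newToOrig.assign(order.begin(), order.begin() + numFreq);
    std::vector<bool> isFreq(numFeatures, false);
    for (int j : r.newToOrig) isFreq[j] = true;
    for (int j = 0; j < numFeatures; ++j)
        if (!isFreq[j]) r.newToOrig.push_back(j);
    for (int n = 0; n < numFeatures; ++n) r.origToNew[r.newToOrig[n]] = n;
    return r;
}

// Rewrite every instance's feature indices into the remapped space.
void RemapInPlace(std::vector<std::vector<int>>& instIndices, const Remapping& r) {
    for (auto& inst : instIndices)
        for (int& j : inst) j = r.origToNew.at(j);
}

// Inverse step: return weights reordered to the original feature indices.
std::vector<float> MapBack(const std::vector<float>& remapped, const Remapping& r) {
    std::vector<float> original(remapped.size());
    for (std::size_t n = 0; n < remapped.size(); ++n)
        original[r.newToOrig[n]] = remapped[n];
    return original;
}
```

Keeping both directions of the mapping means the training loop only ever sees remapped indices, while the caller receives weights in the original order after the final map-back.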
Usage
SymSGD is used for binary classification tasks in ML.NET where:
- The feature space is sparse and high-dimensional (e.g., text classification, click prediction)
- Training data is large enough to benefit from parallel execution
- A linear classifier is appropriate for the problem
The managed SymbolicSgdLogisticRegressionBinaryTrainer dispatches to this native code for the core training loop.
Code Reference
Source Location
- Repository: Dotnet_Machinelearning
- File: src/Native/SymSgdNative/SymSgdNative.cpp
- Lines: 1-489
- File: src/Native/SymSgdNative/SymSgdNative.h
- Lines: 1-148
Signature
// --- Exported functions from SymSgdNative.cpp ---
// Main training loop: processes all instances across parallel learners
EXPORT_API(void) LearnAll(
    int numInstances,
    int* instSizes,       // Number of non-zero features per instance
    int** instIndices,    // Feature indices per instance (sparse)
    float** instValues,   // Feature values per instance (sparse)
    float* instLabels,    // Binary labels (+1 / -1)
    int numThreads,
    int numFreqFeatures,
    int numFeatures,
    float l2Const,        // L2 regularization constant
    float piw,            // Positive instance weight
    float* weightVector,  // Model weights (in/out)
    float* bias,          // Bias term (in/out)
    void** state          // Opaque state handle (in/out)
);
// Restore original feature ordering in weight vector
EXPORT_API(void) MapBackWeightVector(void* state);
// Release all native memory
EXPORT_API(void) DeallocateSequentially(void* state);
// --- SymSGD class from SymSgdNative.h ---
class SymSGD {
public:
    // Perform local SGD update for one instance
    void LearnLocalModel(
        int instSize,
        int* instIndices,
        float* instValues,
        float instLabel,
        float alpha,       // Learning rate
        float l2Const,
        float piw,
        float* globModel   // Global weight vector
    );
    // Copy global weights into local model
    void ResetModel(
        float bias,
        float* globModel,
        float weightScaling
    );
    // Average local model back into global model
    void Reduction(
        float* globModel,
        float* bias,
        float* weightScaling
    );
};
// --- SymSGDState struct ---
struct SymSGDState {
    int NumLearners;
    long long TotalInstancesProcessed;
    SymSGD* Learners;                                  // Array of parallel learners
    std::unordered_map<int, int> FreqFeatUnorderedMap; // Original -> remapped index
    int* FreqFeatDirectMap;                            // Direct-address remapped -> original
    int NumFrequentFeatures;
    int PassIteration;
    float WeightScaling;
};
Import
// P/Invoke declarations (managed side)
[DllImport("SymSgdNative")]
internal static extern void LearnAll(
    int numInstances,
    IntPtr instSizes,
    IntPtr instIndices,
    IntPtr instValues,
    IntPtr instLabels,
    int numThreads,
    int numFreqFeatures,
    int numFeatures,
    float l2Const,
    float piw,
    IntPtr weightVector,
    ref float bias,
    ref IntPtr state);
[DllImport("SymSgdNative")]
internal static extern void MapBackWeightVector(IntPtr state);
[DllImport("SymSgdNative")]
internal static extern void DeallocateSequentially(IntPtr state);
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| numInstances | int | Yes | Number of training instances in this batch |
| instSizes | int* | Yes | Array of length numInstances; each entry is the number of non-zero features for that instance |
| instIndices | int** | Yes | Array of int arrays; instIndices[i] contains the feature indices for instance i |
| instValues | float** | Yes | Array of float arrays; instValues[i] contains the corresponding feature values |
| instLabels | float* | Yes | Array of binary labels (+1.0 or -1.0) of length numInstances |
| numThreads | int | Yes | Number of parallel learner threads |
| numFreqFeatures | int | Yes | Number of features classified as frequent (will be replicated per learner) |
| numFeatures | int | Yes | Total number of features in the feature space |
| l2Const | float | Yes | L2 regularization constant controlling weight decay |
| piw | float | Yes | Positive instance weight for handling class imbalance |
| weightVector | float* | Yes | Model weight vector of length numFeatures (input: current weights; output: updated weights) |
| bias | float* | Yes | Pointer to model bias term (input/output) |
| state | void** | Yes | Pointer to opaque state handle; NULL on first call, initialized by LearnAll |
Outputs
| Name | Type | Description |
|---|---|---|
| weightVector | float* | Updated model weights after training (written in-place) |
| bias | float* | Updated bias term after training (written in-place) |
| state | void** | Opaque state handle for subsequent calls; contains learner instances, frequency maps, and training counters |
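The jagged-array inputs (instSizes, instIndices, instValues) can be awkward to assemble by hand. As a rough illustration, a caller on the C++ side might keep the backing storage in vectors and derive the pointer tables from them. `SparseBatch` is a hypothetical helper, not part of SymSgdNative; the only assumption is the array layout given in the Inputs table.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical holder that owns the sparse rows and exposes the raw pointer
// tables in the layout described for instSizes / instIndices / instValues.
struct SparseBatch {
    std::vector<std::vector<int>> indices;
    std::vector<std::vector<float>> values;
    std::vector<int> sizes;        // would be passed as instSizes
    std::vector<int*> indexPtrs;   // would be passed as instIndices
    std::vector<float*> valuePtrs; // would be passed as instValues

    // Append one sparse instance (parallel index/value lists).
    void Add(std::vector<int> idx, std::vector<float> val) {
        indices.push_back(std::move(idx));
        values.push_back(std::move(val));
    }

    // Rebuild the pointer tables. Call after the last Add and before handing
    // the pointers to native code; adding more rows afterwards invalidates
    // the tables until Finalize is called again.
    void Finalize() {
        sizes.clear();
        indexPtrs.clear();
        valuePtrs.clear();
        for (std::size_t i = 0; i < indices.size(); ++i) {
            sizes.push_back(static_cast<int>(indices[i].size()));
            indexPtrs.push_back(indices[i].data());
            valuePtrs.push_back(values[i].data());
        }
    }
};
```

After `Finalize`, `sizes.data()`, `indexPtrs.data()`, and `valuePtrs.data()` have the types `int*`, `int**`, and `float**` respectively, and remain valid for as long as the `SparseBatch` is alive and unmodified.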
Internal Helper Functions
| Function | Description |
|---|---|
| ComputeRemapping | Scans all training instances to count feature frequencies; identifies the top numFreqFeatures features and builds the remapping tables (unordered map and direct-address table) |
| RemapInstances | Rewrites instance feature indices from original space to remapped space where frequent features are contiguous at indices [0, numFreqFeatures) |
| MaxPossibleAlpha | Computes the maximum learning rate that maintains convergence guarantees based on feature norms and instance count |
| TuneAlpha | Searches for the optimal learning rate by evaluating loss on a sample of instances at different alpha values |
| TuneNumLocIter | Determines how many local SGD iterations each learner should perform before synchronizing, balancing convergence speed against communication overhead |
| InitializeState | Allocates and initializes the SymSGDState structure, including per-learner SymSGD instances with their local weight copies |
| Loss | Computes the logistic loss with L2 regularization over a set of instances; used for hyperparameter tuning and convergence monitoring |
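For reference, the quantity described for the Loss helper can be sketched as a mean logistic loss plus an L2 penalty. This is an assumption-laden illustration: `Instance` and `RegularizedLogLoss` are hypothetical names, and the way the positive instance weight enters (scaling the loss of positive-label instances) and the 0.5 factor on the penalty are guesses, not details confirmed by the source.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sparse instance with a +1 / -1 label.
struct Instance {
    std::vector<int> indices;
    std::vector<float> values;
    float label;
};

// Mean logistic loss log(1 + exp(-y * margin)) plus 0.5 * l2Const * ||w||^2.
// Assumption: positive instances have their loss multiplied by piw.
double RegularizedLogLoss(const std::vector<Instance>& data,
                          const std::vector<float>& w, float bias,
                          float l2Const, float piw) {
    double total = 0.0;
    for (const Instance& inst : data) {
        double margin = bias;
        for (std::size_t k = 0; k < inst.indices.size(); ++k)
            margin += static_cast<double>(w[inst.indices[k]]) * inst.values[k];
        // Numerically stable evaluation of log(1 + exp(z)) with z = -y * margin.
        const double z = -static_cast<double>(inst.label) * margin;
        const double loss = (z > 0.0) ? z + std::log1p(std::exp(-z))
                                      : std::log1p(std::exp(z));
        total += (inst.label > 0.0f ? piw : 1.0f) * loss;
    }
    double sq = 0.0;
    for (float wj : w) sq += static_cast<double>(wj) * wj;
    const double mean = data.empty() ? 0.0 : total / static_cast<double>(data.size());
    return mean + 0.5 * l2Const * sq;
}
```

A tuning routine in the spirit of TuneAlpha could evaluate a function like this on a sample of instances for several candidate learning rates and keep the one with the lowest value.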
Training Flow
The SymSGD training proceeds through the following stages:
- Initialization: On the first call to LearnAll (state == NULL), ComputeRemapping analyzes feature frequencies, RemapInstances rewrites the sparse data, and InitializeState allocates learner instances.
- Hyperparameter tuning: TuneAlpha and TuneNumLocIter select the learning rate and local iteration count based on a sample of the training data.
- Parallel training: Each of the numThreads learners receives a partition of the training instances. Each synchronization round proceeds as follows:
  - The learner calls ResetModel to copy the current global weights for frequent features into its local model
  - The learner processes its assigned instances via LearnLocalModel, writing updates for frequent features to its local weights and updates for non-frequent features directly to the global weights
  - After the configured number of local iterations, Reduction averages the local frequent-feature weights back into the global model
- Weight restoration: After all passes complete, MapBackWeightVector reverses the feature remapping so that the weight vector aligns with the original feature indices.
- Cleanup: DeallocateSequentially frees all native memory associated with the state.
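The reset, local-update, and reduction cycle described in the parallel training stage can be sketched as follows. This is a simplified, single-threaded simulation and not the native implementation: it omits the bias term, L2 regularization, positive instance weight, and weight scaling, runs the learners one after another instead of on separate threads, and uses hypothetical free-standing types whose method names merely mirror those in the Signature section.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sparse instance with a +1 / -1 label.
struct Instance {
    std::vector<int> indices;
    std::vector<float> values;
    float label;
};

struct Learner {
    std::vector<float> local; // private copy of weights [0, numFreq)

    // Copy the frequent block of the global model into the local model.
    void ResetModel(const std::vector<float>& glob, std::size_t numFreq) {
        local.assign(glob.begin(),
                     glob.begin() + static_cast<std::ptrdiff_t>(numFreq));
    }

    // One logistic-regression SGD step: frequent features update the local
    // copy, non-frequent features update the global vector directly.
    void LearnLocalModel(const Instance& inst, float alpha, std::vector<float>& glob) {
        const std::size_t numFreq = local.size();
        float margin = 0.0f;
        for (std::size_t k = 0; k < inst.indices.size(); ++k) {
            const std::size_t j = static_cast<std::size_t>(inst.indices[k]);
            margin += (j < numFreq ? local[j] : glob[j]) * inst.values[k];
        }
        // Gradient step for log(1 + exp(-y * margin)).
        const float step = alpha * inst.label / (1.0f + std::exp(inst.label * margin));
        for (std::size_t k = 0; k < inst.indices.size(); ++k) {
            const std::size_t j = static_cast<std::size_t>(inst.indices[k]);
            (j < numFreq ? local[j] : glob[j]) += step * inst.values[k];
        }
    }
};

// Average every learner's local frequent weights back into the global model.
void Reduction(const std::vector<Learner>& learners, std::vector<float>& glob) {
    if (learners.empty()) return;
    const std::size_t numFreq = learners[0].local.size();
    for (std::size_t j = 0; j < numFreq; ++j) {
        float sum = 0.0f;
        for (const Learner& l : learners) sum += l.local[j];
        glob[j] = sum / static_cast<float>(learners.size());
    }
}

// One synchronization round over pre-partitioned data (sequential here).
void TrainRound(std::vector<Learner>& learners,
                const std::vector<std::vector<Instance>>& partitions,
                std::size_t numFreq, float alpha, std::vector<float>& glob) {
    for (std::size_t i = 0; i < learners.size(); ++i) {
        learners[i].ResetModel(glob, numFreq);
        for (const Instance& inst : partitions[i])
            learners[i].LearnLocalModel(inst, alpha, glob);
    }
    Reduction(learners, glob);
}
```

In this sketch the frequent weights are touched only through each learner's private copy between reductions, which is the property that lets the real learners run without locking on the features that would otherwise be contended most often.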
Usage Examples
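// Binary classification with SymSGD.
// Requires the Microsoft.ML.Mkl.Components package and `using Microsoft.ML.Trainers;`.
// SentimentData is assumed to expose a string ReviewText column and a bool Sentiment column.
var dataView = mlContext.Data.LoadFromTextFile<SentimentData>(
    path: "reviews.tsv", hasHeader: true);
var split = mlContext.Data.TrainTestSplit(dataView, testFraction: 0.2);

// Thread count and L2 regularization are set through the Options object;
// the simple overload only accepts the column names and numberOfIterations.
var options = new SymbolicSgdLogisticRegressionBinaryTrainer.Options
{
    LabelColumnName = "Sentiment",
    FeatureColumnName = "Features",
    NumberOfThreads = 4,
    NumberOfIterations = 50,
    L2Regularization = 1e-4f
};
var pipeline = mlContext.Transforms.Text
    .FeaturizeText("Features", "ReviewText")
    .Append(mlContext.BinaryClassification.Trainers
        .SymbolicSgdLogisticRegression(options));

// The trainer internally calls LearnAll for each training pass.
// Feature partitioning is automatic based on frequency analysis.
var model = pipeline.Fit(split.TrainSet);

// Predict on held-out data
var predictions = model.Transform(split.TestSet);

// SymSGD is particularly effective for sparse text classification,
// where a small fraction of features (common words) appear in many
// instances, which is the case the frequency-based partitioning targets.
// With 100,000+ features from text featurization, only a small subset
// (on the order of 500-1000) is expected to be classified as "frequent"
// and replicated across parallel learners.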