Implementation: Dotnet Machinelearning FactorizationMachineCore
| Knowledge Sources | |
|---|---|
| Domains | Recommender Systems, Factorization Machines, Native Interop |
| Last Updated | 2026-02-09 12:00 GMT |
Overview
Native C++ implementation of Field-Aware Factorization Machine (FFM) forward computation and AdaGrad gradient update, using SSE intrinsics for SIMD-accelerated latent factor interactions.
Description
FactorizationMachineCore.cpp implements the two computational kernels of the Field-Aware Factorization Machine (FFM) model in ML.NET. FFMs extend standard factorization machines by learning separate latent vectors for each (feature, field) pair rather than a single latent vector per feature. This native library provides SIMD-optimized routines that are called from the managed C# trainer via P/Invoke.
The file contains two exported functions:
- CalculateIntermediateVariablesNative computes the model's prediction for a single instance. It calculates the linear term (inner product of linear weights and feature values) and the latent interaction term (sum of all pairwise field-aware interactions). The function uses an intermediate buffer latentSum (of size fieldCount x fieldCount x latentDim) to accumulate partial sums that are reused during gradient computation.
- CalculateGradientAndUpdateNative computes stochastic gradients and applies AdaGrad updates to both the linear weights and the latent weight matrices. For each feature in the instance, it updates the linear weight via standard AdaGrad and then updates all latent vectors across all field projections. The gradient includes both the loss-function gradient and L2 regularization. AdaGrad's per-parameter adaptive learning rate is applied using _mm_rsqrt_ps for fast reciprocal square root.
Both functions operate on sparse feature representations where features are identified by (fieldIndex, featureIndex, featureValue) triples, making them efficient for the high-dimensional sparse data typical of click-through-rate prediction and recommendation tasks.
Usage
These functions are invoked during FFM model training in ML.NET. CalculateIntermediateVariablesNative is called during the forward pass to compute the prediction and cache intermediate values. CalculateGradientAndUpdateNative is called during the backward pass to update model parameters. The managed wrapper in FieldAwareFactorizationMachineInterface.cs handles alignment, memory pinning, and dispatching to these native implementations on .NET Standard targets.
Code Reference
Source Location
- Repository: Dotnet_Machinelearning
- File: src/Native/CpuMathNative/FactorizationMachineCore.cpp
- Lines: 1-179
Signature
EXPORT_API(void) CalculateIntermediateVariablesNative(
    int fieldCount,        // m: number of fields
    int latentDim,         // d: dimensionality of latent vectors (must be a multiple of 4)
    int count,             // c: number of nonzero features in this instance
    int * fieldIndices,    // pf[c]: field index for each feature
    int * featureIndices,  // pi[c]: global feature index for each feature
    float * featureValues, // px[c]: value of each feature
    float * linearWeights, // pw[numFeatures]: linear weight per feature
    float * latentWeights, // pv[numFeatures * m * d]: latent weight tensor
    float * latentSum,     // pq[m * m * d]: intermediate accumulator (zeroed inside)
    float * response       // output: scalar prediction value
);
EXPORT_API(void) CalculateGradientAndUpdateNative(
    float lambdaLinear,    // L2 regularization coefficient for linear weights
    float lambdaLatent,    // L2 regularization coefficient for latent weights
    float learningRate,    // base learning rate for AdaGrad
    int fieldCount,        // m: number of fields
    int latentDim,         // d: latent dimension (must be a multiple of 4)
    float weight,          // instance weight for weighted loss
    int count,             // c: number of nonzero features in this instance
    int * fieldIndices,    // pf[c]: field index for each feature
    int * featureIndices,  // pi[c]: global feature index for each feature
    float * featureValues, // px[c]: value of each feature
    float * latentSum,     // pq[m * m * d]: intermediate values from the forward pass
    float slope,           // derivative of the loss function w.r.t. the prediction
    float * linearWeights, // pw[numFeatures]: updated in-place
    float * latentWeights, // pv[numFeatures * m * d]: updated in-place
    float * linearAccumulatedSquaredGrads, // phw[numFeatures]: AdaGrad accumulators
    float * latentAccumulatedSquaredGrads  // phv[numFeatures * m * d]: AdaGrad accumulators
);
Import
// P/Invoke declarations from
// src/Microsoft.ML.CpuMath/FactorizationMachine/FieldAwareFactorizationMachineInterface.cs
using System.Runtime.InteropServices;
using System.Security;
internal static unsafe partial class FieldAwareFactorizationMachineInterface
{
    private const string NativePath = "CpuMathNative";

    [DllImport(NativePath), SuppressUnmanagedCodeSecurity]
    private static extern void CalculateIntermediateVariablesNative(
        int fieldCount, int latentDim, int count,
        int* fieldIndices, int* featureIndices, float* featureValues,
        float* linearWeights, float* latentWeights,
        float* latentSum, float* response);

    [DllImport(NativePath), SuppressUnmanagedCodeSecurity]
    private static extern void CalculateGradientAndUpdateNative(
        float lambdaLinear, float lambdaLatent, float learningRate,
        int fieldCount, int latentDim, float weight, int count,
        int* fieldIndices, int* featureIndices, float* featureValues,
        float* latentSum, float slope,
        float* linearWeights, float* latentWeights,
        float* linearAccumulatedSquaredGrads,
        float* latentAccumulatedSquaredGrads);
}
I/O Contract
Inputs
CalculateIntermediateVariablesNative
| Name | Type | Required | Description |
|---|---|---|---|
| fieldCount | int | Yes | Number of fields (m). Determines the dimensionality of the latent interaction space. |
| latentDim | int | Yes | Dimensionality of latent vectors (d). Must be a multiple of 4 for SSE alignment. |
| count | int | Yes | Number of nonzero features in the current instance. |
| fieldIndices | int* | Yes | Array of length count. fieldIndices[i] is the field index (0 to m-1) of the i-th nonzero feature. |
| featureIndices | int* | Yes | Array of length count. featureIndices[i] is the global feature index (column index) of the i-th nonzero feature. |
| featureValues | float* | Yes | Array of length count. featureValues[i] is the value of the i-th nonzero feature. |
| linearWeights | float* | Yes | Linear weight vector, one weight per global feature index. |
| latentWeights | float* | Yes | Latent weight tensor of shape [numFeatures, m, d]. Indexed as pv[j * m * d + f * d + k] for feature j, field f, latent dimension k. |
| latentSum | float* | Yes | Scratch buffer of size m * m * d, zeroed inside the function. After execution, contains the accumulated latent sums q[f, f', k] used by the gradient computation. |
CalculateGradientAndUpdateNative
| Name | Type | Required | Description |
|---|---|---|---|
| lambdaLinear | float | Yes | L2 regularization strength for linear weights. |
| lambdaLatent | float | Yes | L2 regularization strength for latent weights. |
| learningRate | float | Yes | Base learning rate (eta) for AdaGrad. The effective rate is learningRate / sqrt(accumulatedSquaredGrad). |
| fieldCount | int | Yes | Number of fields (m). |
| latentDim | int | Yes | Latent dimension (d), must be a multiple of 4. |
| weight | float | Yes | Instance weight (for weighted loss functions). |
| count | int | Yes | Number of nonzero features in the instance. |
| fieldIndices | int* | Yes | Field indices array (same as forward pass). |
| featureIndices | int* | Yes | Feature indices array (same as forward pass). |
| featureValues | float* | Yes | Feature values array (same as forward pass). |
| latentSum | float* | Yes | Intermediate accumulator from the forward pass (must not be modified between calls). |
| slope | float | Yes | Derivative of the loss function with respect to the prediction (e.g., for logistic loss: sigma(y_hat) - y). |
| linearWeights | float* | Yes | Linear weights, updated in-place. |
| latentWeights | float* | Yes | Latent weight tensor, updated in-place. |
| linearAccumulatedSquaredGrads | float* | Yes | Per-feature sum of squared gradients for AdaGrad on linear weights. |
| latentAccumulatedSquaredGrads | float* | Yes | Per-parameter sum of squared gradients for AdaGrad on latent weights. |
Outputs
| Name | Type | Description |
|---|---|---|
| response (CalculateIntermediateVariablesNative) | float* | Scalar output: the predicted value, equal to linearResponse + latentResponse. Written to the memory location pointed to by response. |
| latentSum (CalculateIntermediateVariablesNative) | float* | Side-effect output: buffer of size m * m * d filled with accumulated sums q[f, f', k] = sum_i(v[j_i, f', k] * x_i) for features i in field f. |
| linearWeights (CalculateGradientAndUpdateNative) | float* | Updated in-place via AdaGrad: pw[j] -= (learningRate / sqrt(phw[j])) * g, where g = weight * (lambdaLinear * pw[j] + slope * x_i). |
| latentWeights (CalculateGradientAndUpdateNative) | float* | Updated in-place via AdaGrad with L2 regularization and field-aware gradient. |
| linearAccumulatedSquaredGrads (CalculateGradientAndUpdateNative) | float* | Updated in-place: phw[j] += g^2. |
| latentAccumulatedSquaredGrads (CalculateGradientAndUpdateNative) | float* | Updated in-place: phv[j, f', k] += g[k]^2. |
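The buffer lengths in the contract above follow directly from (numFeatures, m, d). A small sketch of the implied size arithmetic (the helper names are hypothetical, not part of the library):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers spelling out the buffer lengths implied by the
// I/O contract: the latent tensor holds one d-vector per (feature, field)
// pair, and latentSum holds one d-vector per ordered field pair.
constexpr size_t LatentWeightsLength(size_t numFeatures, size_t m, size_t d)
{
    return numFeatures * m * d; // pv and phv
}

constexpr size_t LatentSumLength(size_t m, size_t d)
{
    return m * m * d; // pq
}
```

For the example dimensions used later in this document (100 features, m = 3, d = 4), the latent tensor holds 1200 floats and latentSum holds 36.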
Usage Examples
Forward Pass: Computing FFM Prediction
// Example: 3 fields, latent dimension 4, 5 nonzero features.
int fieldCount = 3;
int latentDim = 4;
int count = 5;
int fieldIndices[] = {0, 0, 1, 2, 2};
int featureIndices[] = {3, 7, 12, 25, 30};
float featureValues[] = {1.0f, 0.5f, 1.0f, 1.0f, 0.3f};
alignas(16) float linearWeights[100] = { /* initialized weights */ };
alignas(16) float latentWeights[100 * 3 * 4] = { /* initialized latent vectors */ };
alignas(16) float latentSum[3 * 3 * 4]; // scratch buffer; SSE aligned loads need 16-byte alignment
float response = 0.0f;
CalculateIntermediateVariablesNative(
fieldCount, latentDim, count,
fieldIndices, featureIndices, featureValues,
linearWeights, latentWeights, latentSum, &response);
// response now contains the FFM prediction for this instance.
Backward Pass: Gradient Update
// After computing the forward pass, update weights using the gradient.
float lambdaLinear = 0.001f;
float lambdaLatent = 0.001f;
float learningRate = 0.1f;
float weight = 1.0f;
float slope = 0.3f; // derivative of loss w.r.t. prediction
alignas(16) float linearAccGrads[100];        // AdaGrad accumulators
alignas(16) float latentAccGrads[100 * 3 * 4]; // AdaGrad accumulators
// Accumulators are typically initialized to 1 (as in libFFM-style AdaGrad)
// so the first adaptive steps are well-scaled and rsqrt never sees zero.
for (int i = 0; i < 100; i++) linearAccGrads[i] = 1.0f;
for (int i = 0; i < 100 * 3 * 4; i++) latentAccGrads[i] = 1.0f;
CalculateGradientAndUpdateNative(
lambdaLinear, lambdaLatent, learningRate,
fieldCount, latentDim, weight, count,
fieldIndices, featureIndices, featureValues,
latentSum, slope,
linearWeights, latentWeights,
linearAccGrads, latentAccGrads);
// linearWeights and latentWeights are now updated.
Implementation Details
Prediction Formula
The FFM prediction is computed as:
y_hat = linear_term + latent_term
linear_term = sum_i( w[j_i] * x_i )
latent_term = sum_{f=1}^{m} sum_{f'=f+1}^{m} <q[f,f'], q[f',f]>
+ 0.5 * sum_{f=1}^{m} ( <q[f,f], q[f,f]> - sum_i:f_i=f( <v[j_i,f], v[j_i,f]> * x_i^2 ) )
where q[f,f'] = sum_{i:f_i=f}( v[j_i, f'] * x_i )
The q matrix (latentSum) caches the accumulated weighted latent vectors per field pair, enabling efficient gradient computation without redundant recalculation.
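To make the formula concrete, here is a naive scalar reference (no latentSum caching, no SIMD) that sums the pairwise interactions directly. Function and variable names are illustrative, not part of the library; it is useful as an oracle when testing the native kernel on small instances.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Scalar FFM prediction: linear term plus all pairwise field-aware
// interactions. pv is laid out as [numFeatures][m][d], matching the
// indexing pv[j * m * d + f * d + k] described above.
float FfmPredictScalar(int m, int d, int count,
                       const int* pf, const int* pi, const float* px,
                       const float* pw, const float* pv)
{
    float linear = 0.0f;
    for (int i = 0; i < count; i++)
        linear += pw[pi[i]] * px[i];

    // For each pair of nonzero features (i1, i2), dot feature i1's
    // latent vector for i2's field with i2's latent vector for i1's field.
    float latent = 0.0f;
    for (int i1 = 0; i1 < count; i1++)
        for (int i2 = i1 + 1; i2 < count; i2++)
        {
            const float* v1 = pv + (size_t)pi[i1] * m * d + pf[i2] * d;
            const float* v2 = pv + (size_t)pi[i2] * m * d + pf[i1] * d;
            float dot = 0.0f;
            for (int k = 0; k < d; k++)
                dot += v1[k] * v2[k];
            latent += dot * px[i1] * px[i2];
        }
    return linear + latent;
}
```

Summing over explicit feature pairs costs O(c^2 * d), whereas the q-matrix formulation costs O(c * m * d + m^2 * d); the two agree up to floating-point rounding.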
AdaGrad Update Rule
For each parameter, the gradient update follows the AdaGrad adaptive learning rate scheme:
Linear weight update:
g = weight * (lambdaLinear * w[j] + slope * x_i)
h[j] += g^2
w[j] -= learningRate / sqrt(h[j]) * g
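The linear rule above, written out as a scalar sketch (function name and calling convention are hypothetical; the native code performs the same three steps inline):

```cpp
#include <cassert>
#include <cmath>

// Scalar sketch of the linear-weight AdaGrad step described above.
// w corresponds to pw, h to the accumulator phw.
void AdaGradLinearStep(float* w, float* h, int j, float x,
                       float slope, float weight,
                       float lambdaLinear, float learningRate)
{
    float g = weight * (lambdaLinear * w[j] + slope * x); // loss gradient plus L2 term
    h[j] += g * g;                                        // accumulate squared gradient
    w[j] -= learningRate / std::sqrt(h[j]) * g;           // adaptive step
}
```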
Latent weight update:
g[k] = weight * (lambdaLatent * v[j,f',k] + slope * x_i * gradient_term[k])
h[j,f',k] += g[k]^2
v[j,f',k] -= learningRate * rsqrt(h[j,f',k]) * g[k]
where gradient_term depends on whether the field matches:
- If f' != f: gradient_term = q[f', f]
- If f' == f: gradient_term = q[f, f] - v[j, f] * x_i (subtract self-interaction)
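A scalar rendering of this latent update, operating on one d-length (feature, field-projection) slice at a time (names are illustrative; the native kernel does the same work four lanes at a time with SSE):

```cpp
#include <cassert>
#include <cmath>

// Scalar sketch of the latent AdaGrad step for feature j in field f,
// updating its projection onto field f'. v and h point at the d-length
// slices pv[j, f', :] and phv[j, f', :]; q points at the latentSum slice
// q[f', f, :] (or q[f, f, :] when the fields coincide).
void AdaGradLatentStep(float* v, float* h, const float* q,
                       int d, bool sameField, float x,
                       float slope, float weight,
                       float lambdaLatent, float learningRate)
{
    for (int k = 0; k < d; k++)
    {
        // When f' == f, subtract the feature's own contribution to q
        // so it does not interact with itself.
        float gradTerm = sameField ? q[k] - v[k] * x : q[k];
        float g = weight * (lambdaLatent * v[k] + slope * x * gradTerm);
        h[k] += g * g;
        v[k] -= learningRate / std::sqrt(h[k]) * g;
    }
}
```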
SIMD Optimization
Both functions process the latent dimension d in blocks of 4 using SSE intrinsics:
// Accumulate q[f, f'] += v[j, f'] * x_i using SIMD
const __m128 _x = _mm_set1_ps(x); // broadcast the feature value x_i to all four lanes
for (int k = 0; k + 4 <= d; k += 4)
{
    const __m128 _v = _mm_load_ps(vjfprime + k);
    __m128 _q = _mm_load_ps(qffprime + k);
    _q = _mm_add_ps(_q, _mm_mul_ps(_v, _x));
    _mm_store_ps(qffprime + k, _q);
}
The gradient update uses _mm_rsqrt_ps for fast approximate reciprocal square root instead of the more expensive _mm_sqrt_ps followed by _mm_div_ps:
// AdaGrad: v -= lr * g / sqrt(h) implemented as v -= lr * rsqrt(h) * g
_v = _mm_sub_ps(_v, _mm_mul_ps(_lr, _mm_mul_ps(_mm_rsqrt_ps(_h), _g)));
The _mm_rsqrt_ps intrinsic provides approximately 12 bits of precision, which is sufficient for the gradient update step in stochastic optimization.