Implementation: Dotnet Machinelearning FactorizationMachineCore
| Knowledge Sources | |
|---|---|
| Domains | Recommender Systems, Factorization Machines, Native Interop |
| Last Updated | 2026-02-09 12:00 GMT |
Overview
Native C++ implementation of Field-Aware Factorization Machine (FFM) forward computation and AdaGrad gradient update, using SSE intrinsics for SIMD-accelerated latent factor interactions.
Description
FactorizationMachineCore.cpp implements the two computational kernels of the Field-Aware Factorization Machine (FFM) model in ML.NET. FFMs extend standard factorization machines by learning separate latent vectors for each (feature, field) pair rather than a single latent vector per feature. This native library provides SIMD-optimized routines that are called from the managed C# trainer via P/Invoke.
The file contains two exported functions:
- CalculateIntermediateVariablesNative computes the model's prediction for a single instance. It calculates the linear term (inner product of linear weights and feature values) and the latent interaction term (sum of all pairwise field-aware interactions). The function uses an intermediate buffer latentSum (of size fieldCount x fieldCount x latentDim) to accumulate partial sums that are reused during gradient computation.
- CalculateGradientAndUpdateNative computes stochastic gradients and applies AdaGrad updates to both the linear weights and the latent weight matrices. For each feature in the instance, it updates the linear weight via standard AdaGrad and then updates all latent vectors across all field projections. The gradient includes both the loss-function gradient and L2 regularization. AdaGrad's per-parameter adaptive learning rate is applied using _mm_rsqrt_ps for fast reciprocal square root.
Both functions operate on sparse feature representations where features are identified by (fieldIndex, featureIndex, featureValue) triples, making them efficient for the high-dimensional sparse data typical of click-through-rate prediction and recommendation tasks.
Usage
These functions are invoked during FFM model training in ML.NET. CalculateIntermediateVariablesNative is called during the forward pass to compute the prediction and cache intermediate values. CalculateGradientAndUpdateNative is called during the backward pass to update model parameters. The managed wrapper in FieldAwareFactorizationMachineInterface.cs handles alignment, memory pinning, and dispatching to these native implementations on .NET Standard targets.
Code Reference
Source Location
- Repository: Dotnet_Machinelearning
- File: src/Native/CpuMathNative/FactorizationMachineCore.cpp
- Lines: 1-179
Signature
EXPORT_API(void) CalculateIntermediateVariablesNative(
    int fieldCount,        // m: number of fields
    int latentDim,         // d: dimensionality of latent vectors (must be a multiple of 4)
    int count,             // c: number of nonzero features in this instance
    int * fieldIndices,    // pf[c]: field index for each feature
    int * featureIndices,  // pi[c]: global feature index for each feature
    float * featureValues, // px[c]: value of each feature
    float * linearWeights, // pw[numFeatures]: linear weight per feature
    float * latentWeights, // pv[numFeatures * m * d]: latent weight tensor
    float * latentSum,     // pq[m * m * d]: intermediate accumulator (zeroed inside)
    float * response       // output: scalar prediction value
);
EXPORT_API(void) CalculateGradientAndUpdateNative(
    float lambdaLinear,    // L2 regularization coefficient for linear weights
    float lambdaLatent,    // L2 regularization coefficient for latent weights
    float learningRate,    // base learning rate for AdaGrad
    int fieldCount,        // m: number of fields
    int latentDim,         // d: latent dimension (must be a multiple of 4)
    float weight,          // instance weight for weighted loss
    int count,             // c: number of nonzero features in this instance
    int * fieldIndices,    // pf[c]: field index for each feature
    int * featureIndices,  // pi[c]: global feature index for each feature
    float * featureValues, // px[c]: value of each feature
    float * latentSum,     // pq[m * m * d]: intermediate values from the forward pass
    float slope,           // derivative of the loss function w.r.t. the prediction
    float * linearWeights, // pw[numFeatures]: updated in-place
    float * latentWeights, // pv[numFeatures * m * d]: updated in-place
    float * linearAccumulatedSquaredGrads, // phw[numFeatures]: AdaGrad accumulators
    float * latentAccumulatedSquaredGrads  // phv[numFeatures * m * d]: AdaGrad accumulators
);
Import
// P/Invoke declarations from
// src/Microsoft.ML.CpuMath/FactorizationMachine/FieldAwareFactorizationMachineInterface.cs
using System.Runtime.InteropServices;
using System.Security;
internal static unsafe partial class FieldAwareFactorizationMachineInterface
{
    private const string NativePath = "CpuMathNative";

    [DllImport(NativePath), SuppressUnmanagedCodeSecurity]
    private static extern void CalculateIntermediateVariablesNative(
        int fieldCount, int latentDim, int count,
        int* fieldIndices, int* featureIndices, float* featureValues,
        float* linearWeights, float* latentWeights,
        float* latentSum, float* response);

    [DllImport(NativePath), SuppressUnmanagedCodeSecurity]
    private static extern void CalculateGradientAndUpdateNative(
        float lambdaLinear, float lambdaLatent, float learningRate,
        int fieldCount, int latentDim, float weight, int count,
        int* fieldIndices, int* featureIndices, float* featureValues,
        float* latentSum, float slope,
        float* linearWeights, float* latentWeights,
        float* linearAccumulatedSquaredGrads,
        float* latentAccumulatedSquaredGrads);
}
I/O Contract
Inputs
CalculateIntermediateVariablesNative
| Name | Type | Required | Description |
|---|---|---|---|
| fieldCount | int | Yes | Number of fields (m). Determines the dimensionality of the latent interaction space. |
| latentDim | int | Yes | Dimensionality of latent vectors (d). Must be a multiple of 4 for SSE alignment. |
| count | int | Yes | Number of nonzero features in the current instance. |
| fieldIndices | int* | Yes | Array of length count. fieldIndices[i] is the field index (0 to m-1) of the i-th nonzero feature. |
| featureIndices | int* | Yes | Array of length count. featureIndices[i] is the global feature index (column index) of the i-th nonzero feature. |
| featureValues | float* | Yes | Array of length count. featureValues[i] is the value of the i-th nonzero feature. |
| linearWeights | float* | Yes | Linear weight vector, one weight per global feature index. |
| latentWeights | float* | Yes | Latent weight tensor of shape [numFeatures, m, d]. Indexed as pv[j * m * d + f * d + k] for feature j, field f, latent dimension k. |
| latentSum | float* | Yes | Scratch buffer of size m * m * d, zeroed inside the function. After execution, contains the accumulated latent sums q[f, f', k] used by the gradient computation. |
CalculateGradientAndUpdateNative
| Name | Type | Required | Description |
|---|---|---|---|
| lambdaLinear | float | Yes | L2 regularization strength for linear weights. |
| lambdaLatent | float | Yes | L2 regularization strength for latent weights. |
| learningRate | float | Yes | Base learning rate (eta) for AdaGrad. The effective rate is learningRate / sqrt(accumulatedSquaredGrad). |
| fieldCount | int | Yes | Number of fields (m). |
| latentDim | int | Yes | Latent dimension (d), must be a multiple of 4. |
| weight | float | Yes | Instance weight (for weighted loss functions). |
| count | int | Yes | Number of nonzero features in the instance. |
| fieldIndices | int* | Yes | Field indices array (same as forward pass). |
| featureIndices | int* | Yes | Feature indices array (same as forward pass). |
| featureValues | float* | Yes | Feature values array (same as forward pass). |
| latentSum | float* | Yes | Intermediate accumulator from the forward pass (must not be modified between calls). |
| slope | float | Yes | Derivative of the loss function with respect to the prediction (e.g., for logistic loss: sigma(y_hat) - y). |
| linearWeights | float* | Yes | Linear weights, updated in-place. |
| latentWeights | float* | Yes | Latent weight tensor, updated in-place. |
| linearAccumulatedSquaredGrads | float* | Yes | Per-feature sum of squared gradients for AdaGrad on linear weights. |
| latentAccumulatedSquaredGrads | float* | Yes | Per-parameter sum of squared gradients for AdaGrad on latent weights. |
Outputs
| Name | Type | Description |
|---|---|---|
| response (CalculateIntermediateVariablesNative) | float* | Scalar output: the predicted value, equal to linearResponse + latentResponse. Written to the memory location pointed to by response. |
| latentSum (CalculateIntermediateVariablesNative) | float* | Side-effect output: buffer of size m * m * d filled with accumulated sums q[f, f', k] = sum_i(v[j_i, f', k] * x_i) for features i in field f. |
| linearWeights (CalculateGradientAndUpdateNative) | float* | Updated in-place via AdaGrad: pw[j] -= (learningRate / sqrt(phw[j])) * g, where g = weight * (lambdaLinear * pw[j] + slope * x_i). |
| latentWeights (CalculateGradientAndUpdateNative) | float* | Updated in-place via AdaGrad with L2 regularization and field-aware gradient. |
| linearAccumulatedSquaredGrads (CalculateGradientAndUpdateNative) | float* | Updated in-place: phw[j] += g^2. |
| latentAccumulatedSquaredGrads (CalculateGradientAndUpdateNative) | float* | Updated in-place: phv[j, f', k] += g[k]^2. |
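The buffer lengths in the contract above follow directly from (numFeatures, m, d). A small sketch of the implied size arithmetic (the helper names are hypothetical, not part of the library):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers spelling out the buffer lengths implied by the
// I/O contract: the latent tensor holds one d-vector per (feature, field)
// pair, and latentSum holds one d-vector per ordered field pair.
constexpr size_t LatentWeightsLength(size_t numFeatures, size_t m, size_t d)
{
    return numFeatures * m * d; // pv and phv
}

constexpr size_t LatentSumLength(size_t m, size_t d)
{
    return m * m * d; // pq
}
```

For the example dimensions used later in this document (100 features, m = 3, d = 4), the latent tensor holds 1200 floats and latentSum holds 36.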
Usage Examples
Forward Pass: Computing FFM Prediction
// Example: 3 fields, latent dimension 4, 5 nonzero features.
int fieldCount = 3;
int latentDim = 4;
int count = 5;
int fieldIndices[] = {0, 0, 1, 2, 2};
int featureIndices[] = {3, 7, 12, 25, 30};
float featureValues[] = {1.0f, 0.5f, 1.0f, 1.0f, 0.3f};
alignas(16) float linearWeights[100] = { /* initialized weights */ };
alignas(16) float latentWeights[100 * 3 * 4] = { /* initialized latent vectors */ };
alignas(16) float latentSum[3 * 3 * 4]; // scratch buffer; SSE aligned loads need 16-byte alignment
float response = 0.0f;
CalculateIntermediateVariablesNative(
fieldCount, latentDim, count,
fieldIndices, featureIndices, featureValues,
linearWeights, latentWeights, latentSum, &response);
// response now contains the FFM prediction for this instance.
Backward Pass: Gradient Update
// After computing the forward pass, update weights using the gradient.
float lambdaLinear = 0.001f;
float lambdaLatent = 0.001f;
float learningRate = 0.1f;
float weight = 1.0f;
float slope = 0.3f; // derivative of loss w.r.t. prediction
alignas(16) float linearAccGrads[100];        // AdaGrad accumulators
alignas(16) float latentAccGrads[100 * 3 * 4]; // AdaGrad accumulators
// Accumulators are typically initialized to 1 (as in libFFM-style AdaGrad)
// so the first adaptive steps are well-scaled and rsqrt never sees zero.
for (int i = 0; i < 100; i++) linearAccGrads[i] = 1.0f;
for (int i = 0; i < 100 * 3 * 4; i++) latentAccGrads[i] = 1.0f;
CalculateGradientAndUpdateNative(
lambdaLinear, lambdaLatent, learningRate,
fieldCount, latentDim, weight, count,
fieldIndices, featureIndices, featureValues,
latentSum, slope,
linearWeights, latentWeights,
linearAccGrads, latentAccGrads);
// linearWeights and latentWeights are now updated.
Implementation Details
Prediction Formula
The FFM prediction is computed as:
y_hat = linear_term + latent_term
linear_term = sum_i( w[j_i] * x_i )
latent_term = sum_{f=1}^{m} sum_{f'=f+1}^{m} <q[f,f'], q[f',f]>
+ 0.5 * sum_{f=1}^{m} ( <q[f,f], q[f,f]> - sum_i:f_i=f( <v[j_i,f], v[j_i,f]> * x_i^2 ) )
where q[f,f'] = sum_{i:f_i=f}( v[j_i, f'] * x_i )
The q matrix (latentSum) caches the accumulated weighted latent vectors per field pair, enabling efficient gradient computation without redundant recalculation.
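To make the formula concrete, here is a naive scalar reference (no latentSum caching, no SIMD) that sums the pairwise interactions directly. Function and variable names are illustrative, not part of the library; it is useful as an oracle when testing the native kernel on small instances.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Scalar FFM prediction: linear term plus all pairwise field-aware
// interactions. pv is laid out as [numFeatures][m][d], matching the
// indexing pv[j * m * d + f * d + k] described above.
float FfmPredictScalar(int m, int d, int count,
                       const int* pf, const int* pi, const float* px,
                       const float* pw, const float* pv)
{
    float linear = 0.0f;
    for (int i = 0; i < count; i++)
        linear += pw[pi[i]] * px[i];

    // For each pair of nonzero features (i1, i2), dot feature i1's
    // latent vector for i2's field with i2's latent vector for i1's field.
    float latent = 0.0f;
    for (int i1 = 0; i1 < count; i1++)
        for (int i2 = i1 + 1; i2 < count; i2++)
        {
            const float* v1 = pv + (size_t)pi[i1] * m * d + pf[i2] * d;
            const float* v2 = pv + (size_t)pi[i2] * m * d + pf[i1] * d;
            float dot = 0.0f;
            for (int k = 0; k < d; k++)
                dot += v1[k] * v2[k];
            latent += dot * px[i1] * px[i2];
        }
    return linear + latent;
}
```

Summing over explicit feature pairs costs O(c^2 * d), whereas the q-matrix formulation costs O(c * m * d + m^2 * d); the two agree up to floating-point rounding.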
AdaGrad Update Rule
For each parameter, the gradient update follows the AdaGrad adaptive learning rate scheme:
Linear weight update:
g = weight * (lambdaLinear * w[j] + slope * x_i)
h[j] += g^2
w[j] -= learningRate / sqrt(h[j]) * g
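The linear rule above, written out as a scalar sketch (function name and calling convention are hypothetical; the native code performs the same three steps inline):

```cpp
#include <cassert>
#include <cmath>

// Scalar sketch of the linear-weight AdaGrad step described above.
// w corresponds to pw, h to the accumulator phw.
void AdaGradLinearStep(float* w, float* h, int j, float x,
                       float slope, float weight,
                       float lambdaLinear, float learningRate)
{
    float g = weight * (lambdaLinear * w[j] + slope * x); // loss gradient plus L2 term
    h[j] += g * g;                                        // accumulate squared gradient
    w[j] -= learningRate / std::sqrt(h[j]) * g;           // adaptive step
}
```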
Latent weight update:
g[k] = weight * (lambdaLatent * v[j,f',k] + slope * x_i * gradient_term[k])
h[j,f',k] += g[k]^2
v[j,f',k] -= learningRate * rsqrt(h[j,f',k]) * g[k]
where gradient_term depends on whether the field matches:
- If f' != f: gradient_term = q[f', f]
- If f' == f: gradient_term = q[f, f] - v[j, f] * x_i (subtract self-interaction)
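A scalar rendering of this latent update, operating on one d-length (feature, field-projection) slice at a time (names are illustrative; the native kernel does the same work four lanes at a time with SSE):

```cpp
#include <cassert>
#include <cmath>

// Scalar sketch of the latent AdaGrad step for feature j in field f,
// updating its projection onto field f'. v and h point at the d-length
// slices pv[j, f', :] and phv[j, f', :]; q points at the latentSum slice
// q[f', f, :] (or q[f, f, :] when the fields coincide).
void AdaGradLatentStep(float* v, float* h, const float* q,
                       int d, bool sameField, float x,
                       float slope, float weight,
                       float lambdaLatent, float learningRate)
{
    for (int k = 0; k < d; k++)
    {
        // When f' == f, subtract the feature's own contribution to q
        // so it does not interact with itself.
        float gradTerm = sameField ? q[k] - v[k] * x : q[k];
        float g = weight * (lambdaLatent * v[k] + slope * x * gradTerm);
        h[k] += g * g;
        v[k] -= learningRate / std::sqrt(h[k]) * g;
    }
}
```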
SIMD Optimization
Both functions process the latent dimension d in blocks of 4 using SSE intrinsics:
// Accumulate q[f, f'] += v[j, f'] * x_i using SIMD
const __m128 _x = _mm_set1_ps(x); // broadcast the feature value x_i to all four lanes
for (int k = 0; k + 4 <= d; k += 4)
{
    const __m128 _v = _mm_load_ps(vjfprime + k);
    __m128 _q = _mm_load_ps(qffprime + k);
    _q = _mm_add_ps(_q, _mm_mul_ps(_v, _x));
    _mm_store_ps(qffprime + k, _q);
}
The gradient update uses _mm_rsqrt_ps for fast approximate reciprocal square root instead of the more expensive _mm_sqrt_ps followed by _mm_div_ps:
// AdaGrad: v -= lr * g / sqrt(h) implemented as v -= lr * rsqrt(h) * g
_v = _mm_sub_ps(_v, _mm_mul_ps(_lr, _mm_mul_ps(_mm_rsqrt_ps(_h), _g)));
The _mm_rsqrt_ps intrinsic provides approximately 12 bits of precision, which is sufficient for the gradient update step in stochastic optimization.