
Implementation:Dotnet Machinelearning FactorizationMachineCore

From Leeroopedia


Knowledge Sources
Domains Recommender Systems, Factorization Machines, Native Interop
Last Updated 2026-02-09 12:00 GMT

Overview

Native C++ implementation of Field-Aware Factorization Machine (FFM) forward computation and AdaGrad gradient update, using SSE intrinsics for SIMD-accelerated latent factor interactions.

Description

FactorizationMachineCore.cpp implements the two computational kernels of the Field-Aware Factorization Machine (FFM) model in ML.NET. FFMs extend standard factorization machines by learning separate latent vectors for each (feature, field) pair rather than a single latent vector per feature. This native library provides SIMD-optimized routines that are called from the managed C# trainer via P/Invoke.

The file contains two exported functions:

  • CalculateIntermediateVariablesNative computes the model's prediction for a single instance. It calculates the linear term (inner product of linear weights and feature values) and the latent interaction term (sum of all pairwise field-aware interactions). The function uses an intermediate buffer latentSum (of size fieldCount x fieldCount x latentDim) to accumulate partial sums that are reused during gradient computation.
  • CalculateGradientAndUpdateNative computes stochastic gradients and applies AdaGrad updates to both the linear weights and the latent weight matrices. For each feature in the instance, it updates the linear weight via standard AdaGrad and then updates all latent vectors across all field projections. The gradient includes both the loss-function gradient and L2 regularization. AdaGrad's per-parameter adaptive learning rate is applied using _mm_rsqrt_ps for fast reciprocal square root.

Both functions operate on sparse feature representations where features are identified by (fieldIndex, featureIndex, featureValue) triples, making them efficient for the high-dimensional sparse data typical of click-through-rate prediction and recommendation tasks.

Usage

These functions are invoked during FFM model training in ML.NET. CalculateIntermediateVariablesNative is called during the forward pass to compute the prediction and cache intermediate values. CalculateGradientAndUpdateNative is called during the backward pass to update model parameters. The managed wrapper in FieldAwareFactorizationMachineInterface.cs handles alignment, memory pinning, and dispatching to these native implementations on .NET Standard targets.

Code Reference

Source Location

Signature

EXPORT_API(void) CalculateIntermediateVariablesNative(
    int fieldCount,           // m: number of fields
    int latentDim,            // d: dimensionality of latent vectors (must be multiple of 4)
    int count,                // c: number of nonzero features in this instance
    int * fieldIndices,       // pf[c]: field index for each feature
    int * featureIndices,     // pi[c]: global feature index for each feature
    float * featureValues,    // px[c]: value of each feature
    float * linearWeights,    // pw[numFeatures]: linear weight per feature
    float * latentWeights,    // pv[numFeatures * m * d]: latent weight tensor
    float * latentSum,        // pq[m * m * d]: intermediate accumulator (zeroed inside)
    float * response          // output: scalar prediction value
);

EXPORT_API(void) CalculateGradientAndUpdateNative(
    float lambdaLinear,       // L2 regularization coefficient for linear weights
    float lambdaLatent,       // L2 regularization coefficient for latent weights
    float learningRate,       // base learning rate for AdaGrad
    int fieldCount,           // m: number of fields
    int latentDim,            // d: latent dimension (must be multiple of 4)
    float weight,             // instance weight for weighted loss
    int count,                // c: number of nonzero features in this instance
    int * fieldIndices,       // pf[c]: field index for each feature
    int * featureIndices,     // pi[c]: global feature index for each feature
    float * featureValues,    // px[c]: value of each feature
    float * latentSum,        // pq[m * m * d]: intermediate values from forward pass
    float slope,              // derivative of loss function w.r.t. prediction
    float * linearWeights,    // pw[numFeatures]: updated in-place
    float * latentWeights,    // pv[numFeatures * m * d]: updated in-place
    float * linearAccumulatedSquaredGrads,   // phw[numFeatures]: AdaGrad accumulators
    float * latentAccumulatedSquaredGrads    // phv[numFeatures * m * d]: AdaGrad accumulators
);

Import

// P/Invoke declarations from
// src/Microsoft.ML.CpuMath/FactorizationMachine/FactorizationMachineInterface.cs
using System.Runtime.InteropServices;
using System.Security;

internal static unsafe partial class FieldAwareFactorizationMachineInterface
{
    private const string NativePath = "CpuMathNative";

    [DllImport(NativePath), SuppressUnmanagedCodeSecurity]
    private static extern void CalculateIntermediateVariablesNative(
        int fieldCount, int latentDim, int count,
        int* fieldIndices, int* featureIndices, float* featureValues,
        float* linearWeights, float* latentWeights,
        float* latentSum, float* response);

    [DllImport(NativePath), SuppressUnmanagedCodeSecurity]
    private static extern void CalculateGradientAndUpdateNative(
        float lambdaLinear, float lambdaLatent, float learningRate,
        int fieldCount, int latentDim, float weight, int count,
        int* fieldIndices, int* featureIndices, float* featureValues,
        float* latentSum, float slope,
        float* linearWeights, float* latentWeights,
        float* linearAccumulatedSquaredGrads,
        float* latentAccumulatedSquaredGrads);
}

I/O Contract

Inputs

CalculateIntermediateVariablesNative

Name Type Required Description
fieldCount int Yes Number of fields (m). Determines the dimensionality of the latent interaction space.
latentDim int Yes Dimensionality of latent vectors (d). Must be a multiple of 4 for SSE alignment.
count int Yes Number of nonzero features in the current instance.
fieldIndices int* Yes Array of length count. fieldIndices[i] is the field index (0 to m-1) of the i-th nonzero feature.
featureIndices int* Yes Array of length count. featureIndices[i] is the global feature index (column index) of the i-th nonzero feature.
featureValues float* Yes Array of length count. featureValues[i] is the value of the i-th nonzero feature.
linearWeights float* Yes Linear weight vector, one weight per global feature index.
latentWeights float* Yes Latent weight tensor of shape [numFeatures, m, d]. Indexed as pv[j * m * d + f * d + k] for feature j, field f, latent dimension k.
latentSum float* Yes Scratch buffer of size m * m * d, zeroed inside the function. After execution, contains the accumulated latent sums q[f, f', k] used by the gradient computation.

CalculateGradientAndUpdateNative

Name Type Required Description
lambdaLinear float Yes L2 regularization strength for linear weights.
lambdaLatent float Yes L2 regularization strength for latent weights.
learningRate float Yes Base learning rate (eta) for AdaGrad. The effective rate is learningRate / sqrt(accumulatedSquaredGrad).
fieldCount int Yes Number of fields (m).
latentDim int Yes Latent dimension (d), must be a multiple of 4.
weight float Yes Instance weight (for weighted loss functions).
count int Yes Number of nonzero features in the instance.
fieldIndices int* Yes Field indices array (same as forward pass).
featureIndices int* Yes Feature indices array (same as forward pass).
featureValues float* Yes Feature values array (same as forward pass).
latentSum float* Yes Intermediate accumulator from the forward pass (must not be modified between calls).
slope float Yes Derivative of the loss function with respect to the prediction (e.g., for logistic loss: sigma(y_hat) - y).
linearWeights float* Yes Linear weights, updated in-place.
latentWeights float* Yes Latent weight tensor, updated in-place.
linearAccumulatedSquaredGrads float* Yes Per-feature sum of squared gradients for AdaGrad on linear weights.
latentAccumulatedSquaredGrads float* Yes Per-parameter sum of squared gradients for AdaGrad on latent weights.

Outputs

Name Type Description
response (CalculateIntermediateVariablesNative) float* Scalar output: the predicted value, equal to linearResponse + latentResponse. Written to the memory location pointed to by response.
latentSum (CalculateIntermediateVariablesNative) float* Side-effect output: buffer of size m * m * d filled with accumulated sums q[f, f', k] = sum_i(v[j_i, f', k] * x_i) for features i in field f.
linearWeights (CalculateGradientAndUpdateNative) float* Updated in-place via AdaGrad: pw[j] -= (learningRate / sqrt(phw[j])) * g, where g = weight * (lambdaLinear * pw[j] + slope * x_i).
latentWeights (CalculateGradientAndUpdateNative) float* Updated in-place via AdaGrad with L2 regularization and field-aware gradient.
linearAccumulatedSquaredGrads (CalculateGradientAndUpdateNative) float* Updated in-place: phw[j] += g^2.
latentAccumulatedSquaredGrads (CalculateGradientAndUpdateNative) float* Updated in-place: phv[j, f', k] += g[k]^2.

Usage Examples

Forward Pass: Computing FFM Prediction

// Example: 3 fields, latent dimension 4, 5 nonzero features.
int fieldCount = 3;
int latentDim = 4;
int count = 5;

int fieldIndices[]   = {0, 0, 1, 2, 2};
int featureIndices[] = {3, 7, 12, 25, 30};
float featureValues[] = {1.0f, 0.5f, 1.0f, 1.0f, 0.3f};

float linearWeights[100] = { /* initialized weights */ };
alignas(16) float latentWeights[100 * 3 * 4] = { /* initialized latent vectors */ };
alignas(16) float latentSum[3 * 3 * 4];  // scratch buffer; 16-byte aligned for SSE loads
float response;

CalculateIntermediateVariablesNative(
    fieldCount, latentDim, count,
    fieldIndices, featureIndices, featureValues,
    linearWeights, latentWeights, latentSum, &response);
// response now contains the FFM prediction for this instance.

Backward Pass: Gradient Update

// After computing the forward pass, update weights using the gradient.
float lambdaLinear = 0.001f;
float lambdaLatent = 0.001f;
float learningRate = 0.1f;
float weight = 1.0f;
float slope = 0.3f;  // derivative of loss w.r.t. prediction

float linearAccGrads[100] = {0};                     // AdaGrad accumulators (linear)
alignas(16) float latentAccGrads[100 * 3 * 4] = {0}; // AdaGrad accumulators (latent); aligned for SSE

CalculateGradientAndUpdateNative(
    lambdaLinear, lambdaLatent, learningRate,
    fieldCount, latentDim, weight, count,
    fieldIndices, featureIndices, featureValues,
    latentSum, slope,
    linearWeights, latentWeights,
    linearAccGrads, latentAccGrads);
// linearWeights and latentWeights are now updated.

Implementation Details

Prediction Formula

The FFM prediction is computed as:

y_hat = linear_term + latent_term

linear_term = sum_i( w[j_i] * x_i )

latent_term = sum_{f=1}^{m} sum_{f'=f+1}^{m} <q[f,f'], q[f',f]>
            + 0.5 * sum_{f=1}^{m} ( <q[f,f], q[f,f]> - sum_i:f_i=f( <v[j_i,f], v[j_i,f]> * x_i^2 ) )

where q[f,f'] = sum_{i:f_i=f}( v[j_i, f'] * x_i )

The q matrix (latentSum) caches the accumulated weighted latent vectors per field pair, enabling efficient gradient computation without redundant recalculation.

AdaGrad Update Rule

For each parameter, the gradient update follows the AdaGrad adaptive learning rate scheme:

Linear weight update:
    g = weight * (lambdaLinear * w[j] + slope * x_i)
    h[j] += g^2
    w[j] -= learningRate / sqrt(h[j]) * g

Latent weight update:
    g[k] = weight * (lambdaLatent * v[j,f',k] + slope * x_i * gradient_term[k])
    h[j,f',k] += g[k]^2
    v[j,f',k] -= learningRate * rsqrt(h[j,f',k]) * g[k]

where gradient_term depends on whether the field matches:

  • If f' != f: gradient_term = q[f', f]
  • If f' == f: gradient_term = q[f, f] - v[j, f] * x_i (subtract self-interaction)

SIMD Optimization

Both functions process the latent dimension d in blocks of 4 using SSE intrinsics:

// Accumulate q[f, f'] += v[j, f'] * x_i using SIMD
for (int k = 0; k + 4 <= d; k += 4)
{
    const __m128 _v = _mm_load_ps(vjfprime + k);
    __m128 _q = _mm_load_ps(qffprime + k);
    _q = _mm_add_ps(_q, _mm_mul_ps(_v, _x));
    _mm_store_ps(qffprime + k, _q);
}

The gradient update uses _mm_rsqrt_ps for fast approximate reciprocal square root instead of the more expensive _mm_sqrt_ps followed by _mm_div_ps:

// AdaGrad: v -= lr * g / sqrt(h)  implemented as  v -= lr * rsqrt(h) * g
_v = _mm_sub_ps(_v, _mm_mul_ps(_lr, _mm_mul_ps(_mm_rsqrt_ps(_h), _g)));

The _mm_rsqrt_ps intrinsic provides approximately 12 bits of precision, which is sufficient for the gradient update step in stochastic optimization.
