Principle: Fastai Fastbook Latent Factor Model
| Knowledge Sources | |
|---|---|
| Domains | Recommender Systems, Matrix Factorization, Collaborative Filtering |
| Last Updated | 2026-02-09 17:00 GMT |
Overview
A latent factor model with dot product and bias decomposes the user-item interaction matrix into low-rank embedding matrices and additive bias terms, predicting ratings as the dot product of user and item latent vectors plus per-user and per-item biases.
Description
The fundamental insight behind latent factor models is that both users and items can be described by a small number of hidden (latent) factors. For movies, these factors might correspond to dimensions such as how much action a film contains, whether it is a classic or modern release, or the prominence of a particular genre. For users, the same factors describe preferences along these same dimensions.
The predicted rating for a user-item pair is computed as the dot product of the user's latent factor vector and the item's latent factor vector. This yields a scalar that is high when user preferences align with item characteristics and low when they diverge.
However, the pure dot product misses an important signal: some users are systematically generous raters while others are harsh, and some movies are universally loved or disliked regardless of genre alignment. Bias terms capture these systematic tendencies. A per-user bias and a per-item bias are each represented as a single scalar value added to the dot product before the final output transformation.
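The prediction before the output transformation is just a dot product plus two scalars. A minimal numeric sketch, with entirely made-up factor values and biases chosen for illustration:

```python
import numpy as np

# Hypothetical 3-factor vectors (values invented for illustration).
user = np.array([0.9, -0.2, 0.4])   # e.g. likes action, mildly dislikes classics
movie = np.array([0.8, 0.1, 0.3])   # e.g. action-heavy, fairly modern

user_bias = 0.3    # this user rates slightly generously
movie_bias = 0.5   # this movie is broadly liked

# Dot product measures preference/characteristic alignment;
# the biases shift it by systematic user and item tendencies.
raw_score = user @ movie + user_bias + movie_bias
# 0.72 - 0.02 + 0.12 + 0.3 + 0.5 = 1.62
```

This raw score is still unbounded; the sigmoid range function described next squashes it into the valid rating interval.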
To constrain predictions to a valid rating range (e.g., 0 to 5), a sigmoid range function is applied to the sum. Weight decay (L2 regularization) prevents overfitting by penalizing large embedding values.
Usage
Use this approach as the primary collaborative filtering model when you have explicit ratings data (user-item-rating triples). It is the standard baseline for matrix factorization and often matches or outperforms more complex neural approaches on pure collaborative filtering tasks. The dot-product-with-bias architecture is sometimes called probabilistic matrix factorization (PMF) in the literature.
Theoretical Basis
Matrix Factorization
Given a sparse user-item rating matrix R of shape (m x n), we seek two dense matrices:
- P of shape (m x k) -- user latent factors
- Q of shape (n x k) -- item latent factors
along with bias vectors:
- b_u of shape (m,) -- user biases
- b_i of shape (n,) -- item biases
The predicted rating is:
r_hat(u, i) = sigmoid_range( P[u] . Q[i] + b_u[u] + b_i[i], low, high )
where P[u] . Q[i] denotes the dot product of the user's latent vector and the item's latent vector, and sigmoid_range maps the unbounded sum into the interval [low, high].
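The formula maps directly onto a small PyTorch module. The sketch below follows the shape of the fastbook `DotProductBias` model but is not a verbatim copy; the class and argument names here are this sketch's own:

```python
import torch
from torch import nn

def sigmoid_range(x, low, high):
    # Squash an unbounded score into the open interval (low, high).
    return torch.sigmoid(x) * (high - low) + low

class DotProductBias(nn.Module):
    """Dot product of user/item embeddings plus per-user and per-item biases."""
    def __init__(self, n_users, n_items, n_factors, y_range=(0, 5.5)):
        super().__init__()
        self.user_factors = nn.Embedding(n_users, n_factors)  # P, shape (m, k)
        self.item_factors = nn.Embedding(n_items, n_factors)  # Q, shape (n, k)
        self.user_bias = nn.Embedding(n_users, 1)             # b_u
        self.item_bias = nn.Embedding(n_items, 1)             # b_i
        self.y_range = y_range

    def forward(self, users, items):
        # Elementwise product summed over the factor dimension = batched dot product.
        dot = (self.user_factors(users) * self.item_factors(items)).sum(dim=1)
        dot = dot + self.user_bias(users).squeeze(1) + self.item_bias(items).squeeze(1)
        return sigmoid_range(dot, *self.y_range)
```

A batch of (user, item) index pairs goes in; a batch of bounded rating predictions comes out.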
Embedding as Matrix Lookup
In neural network implementations, P and Q are stored as Embedding layers. An embedding lookup for index u is mathematically equivalent to multiplying a one-hot vector e_u by the weight matrix:
P[u] = e_u^T * P (one-hot encoding multiplied by embedding matrix)
The embedding layer provides an efficient shortcut that avoids constructing the one-hot vector explicitly.
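The equivalence between a one-hot matrix multiply and a row lookup is easy to verify directly; a tiny check with a random matrix standing in for P:

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.normal(size=(4, 3))   # stand-in embedding matrix: 4 users, 3 factors

u = 2
one_hot = np.zeros(4)
one_hot[u] = 1.0              # e_u: 1 at position u, 0 elsewhere

# Multiplying by a one-hot vector selects exactly one row of P.
assert np.allclose(one_hot @ P, P[u])
```

An embedding layer does the `P[u]` lookup directly, which is why it is both faster and more memory-efficient than materializing `e_u`.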
Loss Function and Regularization
The model is trained by minimizing mean squared error (MSE) over observed ratings with L2 weight decay:
Loss = (1/|B|) * sum_{(u,i,r) in B} (r - r_hat(u,i))^2 + wd * sum(params^2)
Weight decay (denoted wd) discourages overly large embedding values, preventing the model from memorizing the training set with sharp, overfitted functions. In the fastbook example, wd=0.1 yields good generalization on MovieLens 100K.
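The two terms of the loss can be computed by hand on a toy batch. The prediction, target, and parameter values below are invented for illustration:

```python
import torch

preds = torch.tensor([4.2, 3.1, 5.0])    # hypothetical model outputs
targets = torch.tensor([4.0, 3.5, 4.5])  # hypothetical observed ratings
params = [torch.tensor([0.5, -0.3]), torch.tensor([1.2])]  # toy parameters

wd = 0.1
mse = ((targets - preds) ** 2).mean()            # (0.04 + 0.16 + 0.25) / 3 = 0.15
l2 = sum((p ** 2).sum() for p in params)          # 0.25 + 0.09 + 1.44 = 1.78
loss = mse + wd * l2                              # 0.15 + 0.178 = 0.328
```

In practice you rarely write this sum yourself: fastai takes `wd` as an argument to training calls such as `fit_one_cycle`, and PyTorch optimizers accept a `weight_decay` parameter, both of which fold the L2 penalty into the update.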
Sigmoid Range
The sigmoid range function constrains predictions to [low, high]:
sigmoid_range(x, low, high) = (high - low) * sigmoid(x) + low
Using a range slightly beyond the actual rating scale (e.g., (0, 5.5) instead of (1, 5)) makes it easier for the model to predict values near the extremes.