Implementation: Recommenders PySpark ALS
Knowledge Sources
| Field | Value |
|---|---|
| Domains | Matrix Factorization, Recommendation Systems, Distributed Computing |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Concrete tool for training an Alternating Least Squares matrix factorization model using PySpark's distributed ML library, producing a fitted ALSModel that can generate rating predictions.
Description
This is a wrapper document for PySpark's external pyspark.ml.recommendation.ALS API, documenting how it is used within the Recommenders repository's ALS workflow. The ALS class is a Spark ML Estimator that implements distributed ALS matrix factorization. It is configured with hyperparameters (rank, regularization, iterations) and column mappings (user, item, rating columns), then fitted on a training DataFrame to produce an ALSModel. The model contains the learned user and item factor matrices and can be used to predict ratings for arbitrary user-item pairs.
In the Recommenders context, ALS is typically configured with coldStartStrategy="drop" to handle users or items that appear in the test set but not the training set. This prevents NaN predictions from corrupting downstream evaluation metrics.
Usage
Instantiate the ALS estimator after splitting the data into train/test sets. Call .fit(train_df) to train the model. The returned ALSModel is then used with .transform(test_df) to generate predictions for evaluation.
Code Reference
Source Location
- Repository: External PySpark API
- Package: pyspark.ml.recommendation
Signature
# Estimator configuration
als = ALS(
    rank=10,
    maxIter=15,
    regParam=0.05,
    userCol="userID",
    itemCol="itemID",
    ratingCol="rating",
    coldStartStrategy="drop",
)

# Model training
model = als.fit(train_df)  # Returns ALSModel
Import
from pyspark.ml.recommendation import ALS
External Reference
- Official Documentation: pyspark.ml.recommendation.ALS
- Spark ML Guide: Collaborative Filtering - Spark MLlib
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| rank | int | No (PySpark default: 10) | Number of latent factors (dimensionality of the user and item vectors) |
| maxIter | int | No (PySpark default: 10; 15 in these examples) | Maximum number of ALS iterations (alternations between solving for U with V fixed and vice versa) |
| regParam | float | No (PySpark default: 0.1; 0.05 in these examples) | Regularization parameter (lambda) to prevent overfitting |
| userCol | str | No (PySpark default: "user"; "userID" in these examples) | Name of the column containing user identifiers |
| itemCol | str | No (PySpark default: "item"; "itemID" in these examples) | Name of the column containing item identifiers |
| ratingCol | str | No (default: "rating") | Name of the column containing ratings or interaction values |
| coldStartStrategy | str | No (PySpark default: "nan"; "drop" in this workflow) | Strategy for handling users/items unseen at training time; "drop" removes rows with NaN predictions, "nan" keeps them |
| implicitPrefs | bool | No (default: False) | If True, uses implicit-feedback ALS with confidence weighting |
| alpha | float | No (default: 1.0) | Confidence scaling parameter for implicit feedback (only used when implicitPrefs=True) |
| train_df | pyspark.sql.DataFrame | Yes (for .fit()) | Training DataFrame containing the user, item, and rating columns |
Outputs
| Name | Type | Description |
|---|---|---|
| model | pyspark.ml.recommendation.ALSModel | Fitted model containing the learned user factor matrix U and item factor matrix V; exposes .transform() for prediction and .userFactors / .itemFactors for accessing the latent vectors |
Usage Examples
Basic ALS Training
from pyspark.ml.recommendation import ALS
# Configure ALS estimator
als = ALS(
    rank=10,
    maxIter=15,
    regParam=0.05,
    userCol="userID",
    itemCol="itemID",
    ratingCol="rating",
    coldStartStrategy="drop",
)
# Train the model
model = als.fit(train_df)
# Access learned factor matrices
print(f"User factors shape: {model.userFactors.count()} users x {model.rank} factors")
print(f"Item factors shape: {model.itemFactors.count()} items x {model.rank} factors")
ALS with Implicit Feedback
from pyspark.ml.recommendation import ALS
als = ALS(
    rank=20,
    maxIter=15,
    regParam=0.1,
    userCol="userID",
    itemCol="itemID",
    ratingCol="count",
    implicitPrefs=True,
    alpha=40.0,
    coldStartStrategy="drop",
)
model = als.fit(implicit_train_df)
Full Workflow in Recommenders Context
from recommenders.utils.spark_utils import start_or_get_spark
from recommenders.datasets.movielens import load_spark_df
from recommenders.datasets.spark_splitters import spark_random_split
from pyspark.ml.recommendation import ALS
# Setup
spark = start_or_get_spark(app_name="ALS_Example", memory="16g")
data = load_spark_df(spark, size="100k")
train, test = spark_random_split(data, ratio=0.75, seed=42)
# Train ALS model
als = ALS(
    rank=10,
    maxIter=15,
    regParam=0.05,
    userCol="userID",
    itemCol="itemID",
    ratingCol="rating",
    coldStartStrategy="drop",
)
model = als.fit(train)