
Implementation:Recommenders team Recommenders PySpark ALS

From Leeroopedia


Domains Matrix Factorization, Recommendation Systems, Distributed Computing
Last Updated 2026-02-10 00:00 GMT

Overview

A concrete tool for training an Alternating Least Squares (ALS) matrix factorization model with PySpark's distributed ML library, producing a fitted ALSModel that can generate rating predictions for arbitrary user-item pairs.

Description

This is a wrapper document for PySpark's external pyspark.ml.recommendation.ALS API, documenting how it is used within the Recommenders repository's ALS workflow. The ALS class is a Spark ML Estimator that implements distributed ALS matrix factorization. It is configured with hyperparameters (rank, regularization, iterations) and column mappings (user, item, rating columns), then fitted on a training DataFrame to produce an ALSModel. The model contains the learned user and item factor matrices and can be used to predict ratings for arbitrary user-item pairs.

In the Recommenders context, ALS is typically configured with coldStartStrategy="drop" to handle users or items that appear in the test set but not the training set. This prevents NaN predictions from corrupting downstream evaluation metrics.

Usage

Instantiate the ALS estimator after splitting the data into train/test sets. Call .fit(train_df) to train the model. The returned ALSModel is then used with .transform(test_df) to generate predictions for evaluation.

Code Reference

Source Location

  • Repository: External PySpark API
  • Package: pyspark.ml.recommendation

Signature

# Estimator configuration
als = ALS(
    rank=10,
    maxIter=15,
    regParam=0.05,
    userCol="userID",
    itemCol="itemID",
    ratingCol="rating",
    coldStartStrategy="drop",
)

# Model training
model = als.fit(train_df)  # Returns ALSModel

Import

from pyspark.ml.recommendation import ALS

External Reference

I/O Contract

Inputs

The values shown below are the defaults used in this workflow's ALS configuration; several differ from PySpark's own library defaults.

  • rank (int, optional; workflow default 10): Number of latent factors (dimensionality of the user and item vectors)
  • maxIter (int, optional; workflow default 15): Maximum number of ALS iterations (alternations between fixing U and V)
  • regParam (float, optional; workflow default 0.05): Regularization parameter (lambda) to prevent overfitting
  • userCol (str, optional; workflow default "userID"): Name of the column containing user identifiers
  • itemCol (str, optional; workflow default "itemID"): Name of the column containing item identifiers
  • ratingCol (str, optional; workflow default "rating"): Name of the column containing ratings or interaction values
  • coldStartStrategy (str, optional; workflow default "drop"): Strategy for handling unknown users/items at prediction time; "drop" removes rows with NaN predictions, "nan" keeps them
  • implicitPrefs (bool, optional; default False): If True, uses implicit-feedback ALS with confidence weighting
  • alpha (float, optional; default 1.0): Confidence scaling parameter for implicit feedback (only used when implicitPrefs=True)
  • train_df (pyspark.sql.DataFrame, required for .fit()): Training DataFrame containing user, item, and rating columns

Outputs

  • model (pyspark.ml.recommendation.ALSModel): Fitted model containing the learned user factor matrix U and item factor matrix V; exposes .transform() for prediction and .userFactors / .itemFactors for accessing the latent vectors

Usage Examples

Basic ALS Training

from pyspark.ml.recommendation import ALS

# Configure ALS estimator
als = ALS(
    rank=10,
    maxIter=15,
    regParam=0.05,
    userCol="userID",
    itemCol="itemID",
    ratingCol="rating",
    coldStartStrategy="drop",
)

# Train the model
model = als.fit(train_df)

# Access learned factor matrices
print(f"User factors shape: {model.userFactors.count()} users x {model.rank} factors")
print(f"Item factors shape: {model.itemFactors.count()} items x {model.rank} factors")

ALS with Implicit Feedback

from pyspark.ml.recommendation import ALS

als = ALS(
    rank=20,
    maxIter=15,
    regParam=0.1,
    userCol="userID",
    itemCol="itemID",
    ratingCol="count",
    implicitPrefs=True,
    alpha=40.0,
    coldStartStrategy="drop",
)

model = als.fit(implicit_train_df)

Full Workflow in Recommenders Context

from recommenders.utils.spark_utils import start_or_get_spark
from recommenders.datasets.movielens import load_spark_df
from recommenders.datasets.spark_splitters import spark_random_split
from pyspark.ml.recommendation import ALS

# Setup
spark = start_or_get_spark(app_name="ALS_Example", memory="16g")
data = load_spark_df(spark, size="100k")
train, test = spark_random_split(data, ratio=0.75, seed=42)

# Train ALS model
als = ALS(
    rank=10,
    maxIter=15,
    regParam=0.05,
    userCol="userID",
    itemCol="itemID",
    ratingCol="rating",
    coldStartStrategy="drop",
)
model = als.fit(train)

