Implementation:Recommenders team Recommenders Surprise Utils

Knowledge Sources	Recommenders
Domains	Surprise Library, Prediction, Recommendation
Last Updated	2026-02-10 00:00 GMT

Overview

The surprise_utils module provides utility functions for integrating the Surprise recommendation library with the Recommenders toolkit, including data conversion from Surprise trainsets to DataFrames and prediction generation for both rating and ranking evaluation.

Description

This module offers three functions.

surprise_trainset_to_df converts a Surprise Trainset object back into a pandas DataFrame. It reads the trainset's ratings through all_ratings(), which yields internal integer IDs, and maps those IDs to raw user and item IDs using the trainset's _inner2raw_id_users and _inner2raw_id_items mappings. When those mappings are unavailable (None), it falls back to inverting the _raw2inner_id_users and _raw2inner_id_items dictionaries.

predict generates rating predictions for specific user-item pairs. It iterates over the DataFrame rows with itertuples and calls the Surprise algorithm's predict method for each pair. It then renames Surprise's uid, iid, and est columns to usercol, itemcol, and predcol and drops the unneeded details and r_ui columns.

compute_ranking_predictions scores every user-item combination (the Cartesian product of the unique users and items in data), which ranking metrics such as NDCG require. With remove_seen=True, it attaches a dummy column to the user-item pairs of data, outer-merges them with the full prediction set, and keeps only the rows where the dummy column is null, that is, the pairs that do not occur in data. Because the pairs removed are the ones in data, pass the training DataFrame when the goal is to exclude items users have already rated.
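The ID mapping performed by surprise_trainset_to_df can be illustrated without installing Surprise. This minimal sketch assumes a hypothetical FakeTrainset class that carries only the four mapping attributes named above plus an all_ratings() generator of (inner user ID, inner item ID, rating) triples. It is a simplified stand-in for surprise.Trainset, not the real class, and it returns plain tuples rather than a DataFrame so that it stays dependency-free.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional, Tuple


def invert_dictionary(d: Dict) -> Dict:
    """Swap keys and values; assumes the values are unique."""
    return {v: k for k, v in d.items()}


@dataclass
class FakeTrainset:
    """Hypothetical stand-in for surprise.Trainset, holding only the fields used here."""

    ratings: List[Tuple[int, int, float]]  # (inner_uid, inner_iid, rating) triples
    _raw2inner_id_users: Dict[str, int]
    _raw2inner_id_items: Dict[str, int]
    _inner2raw_id_users: Optional[Dict[int, str]] = None  # may be None
    _inner2raw_id_items: Optional[Dict[int, str]] = None

    def all_ratings(self) -> Iterator[Tuple[int, int, float]]:
        yield from self.ratings


def trainset_to_rows(trainset) -> List[Tuple[str, str, float]]:
    """Map inner integer IDs back to raw IDs, inverting raw->inner maps if needed."""
    map_user = (
        trainset._inner2raw_id_users
        if trainset._inner2raw_id_users is not None
        else invert_dictionary(trainset._raw2inner_id_users)
    )
    map_item = (
        trainset._inner2raw_id_items
        if trainset._inner2raw_id_items is not None
        else invert_dictionary(trainset._raw2inner_id_items)
    )
    return [(map_user[u], map_item[i], r) for u, i, r in trainset.all_ratings()]


ts = FakeTrainset(
    ratings=[(0, 0, 4.0), (0, 1, 3.0), (1, 0, 5.0)],
    _raw2inner_id_users={"alice": 0, "bob": 1},
    _raw2inner_id_items={"matrix": 0, "up": 1},
)
print(trainset_to_rows(ts))
# [('alice', 'matrix', 4.0), ('alice', 'up', 3.0), ('bob', 'matrix', 5.0)]
```

Preferring the cached inner-to-raw maps avoids rebuilding a dictionary on every call. The inversion is only a fallback for trainsets where those maps were never built.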

Usage

Use surprise_trainset_to_df when you need to convert Surprise's internal data representation back to a standard DataFrame for analysis or evaluation. Use predict to compute rating accuracy metrics (RMSE, MAE) on specific user-item pairs. Use compute_ranking_predictions to evaluate ranking metrics (NDCG, MAP) that require predicted scores for all possible user-item combinations. To exclude items each user has already rated, set remove_seen=True and pass the training DataFrame as data. The unique users and items to score are then also drawn from the training data.
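As a rough illustration of the rating-metric path, this sketch joins a toy prediction frame shaped like predict's output (userID, itemID, prediction) to ground-truth ratings and computes RMSE and MAE by hand with pandas. The helper name rating_metrics and the toy data are illustrative assumptions, not part of surprise_utils or the toolkit's evaluation module.

```python
import pandas as pd


def rating_metrics(test, preds, usercol="userID", itemcol="itemID",
                   ratingcol="rating", predcol="prediction"):
    """Hypothetical helper: RMSE and MAE over (user, item) pairs present in both frames."""
    # Align each true rating with its prediction on the (user, item) key.
    merged = test[[usercol, itemcol, ratingcol]].merge(
        preds[[usercol, itemcol, predcol]], on=[usercol, itemcol], how="inner"
    )
    err = merged[ratingcol] - merged[predcol]
    rmse = float((err ** 2).mean() ** 0.5)
    mae = float(err.abs().mean())
    return rmse, mae


test = pd.DataFrame({"userID": [1, 1, 2], "itemID": [10, 20, 10], "rating": [4.0, 2.0, 5.0]})
preds = pd.DataFrame({"userID": [1, 1, 2], "itemID": [10, 20, 10], "prediction": [3.0, 2.0, 4.0]})
rmse, mae = rating_metrics(test, preds)
print(f"RMSE={rmse:.4f}, MAE={mae:.4f}")  # RMSE=0.8165, MAE=0.6667
```

The inner join scores only pairs that have both a true rating and a prediction. This matches the use of predict on a test set, where every scored pair comes from that set.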

Code Reference

Source Location

Repository: Recommenders
File: recommenders/models/surprise/surprise_utils.py
Lines: 1-120

Signature

def surprise_trainset_to_df(
    trainset, col_user="uid", col_item="iid", col_rating="rating"
)

def predict(
    algo,
    data,
    usercol=DEFAULT_USER_COL,
    itemcol=DEFAULT_ITEM_COL,
    predcol=DEFAULT_PREDICTION_COL,
)

def compute_ranking_predictions(
    algo,
    data,
    usercol=DEFAULT_USER_COL,
    itemcol=DEFAULT_ITEM_COL,
    predcol=DEFAULT_PREDICTION_COL,
    remove_seen=False,
)
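The conversion step in predict relies on Surprise returning a Prediction namedtuple with the fields uid, iid, r_ui, est, and details. This sketch reproduces that step with pandas. It uses a hypothetical ConstantAlgo in place of a trained model and a local Prediction namedtuple that mirrors Surprise's field names. It illustrates the rename-and-drop logic under those assumptions rather than reproducing the library code exactly.

```python
from collections import namedtuple

import pandas as pd

# Local mirror of Surprise's Prediction field names (an assumption for this sketch).
Prediction = namedtuple("Prediction", ["uid", "iid", "r_ui", "est", "details"])


class ConstantAlgo:
    """Hypothetical stand-in for a trained Surprise algorithm."""

    def __init__(self, value):
        self.value = value

    def predict(self, uid, iid, r_ui=None):
        return Prediction(uid, iid, r_ui, self.value, {"was_impossible": False})


def predict_sketch(algo, data, usercol="userID", itemcol="itemID", predcol="prediction"):
    # One algo.predict call per row; getattr reads the named columns from each row tuple.
    rows = [algo.predict(getattr(r, usercol), getattr(r, itemcol)) for r in data.itertuples()]
    out = pd.DataFrame(rows)  # namedtuple fields become the column names
    out = out.rename(columns={"uid": usercol, "iid": itemcol, "est": predcol})
    return out.drop(columns=["details", "r_ui"])


data = pd.DataFrame({"userID": [1, 2], "itemID": [10, 20], "rating": [4.0, 3.0]})
out = predict_sketch(ConstantAlgo(3.5), data)
print(out)
```

Because getattr looks up attributes on the itertuples rows, this approach assumes usercol and itemcol are valid Python identifiers, which holds for the default column names.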

Import

from recommenders.models.surprise.surprise_utils import (
    surprise_trainset_to_df,
    predict,
    compute_ranking_predictions,
)

I/O Contract

Inputs

Name	Type	Required	Description
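trainset (surprise_trainset_to_df)	surprise.Trainset	Yes	A Surprise Trainset object to convert
col_user (surprise_trainset_to_df)	str	No	User column name (default "uid")
col_item (surprise_trainset_to_df)	str	No	Item column name (default "iid")
col_rating (surprise_trainset_to_df)	str	No	Rating column name (default "rating")
algo (predict/compute_ranking_predictions)	AlgoBase	Yes	A trained Surprise algorithm instance
data (predict)	pandas.DataFrame	Yes	DataFrame whose user-item pairs are scored
data (compute_ranking_predictions)	pandas.DataFrame	Yes	DataFrame from which the unique users and items are taken; with remove_seen=True, its user-item pairs are excluded from the output
usercol (predict/compute_ranking_predictions)	str	No	User column name (default DEFAULT_USER_COL, "userID")
itemcol (predict/compute_ranking_predictions)	str	No	Item column name (default DEFAULT_ITEM_COL, "itemID")
predcol (predict/compute_ranking_predictions)	str	No	Prediction column name (default DEFAULT_PREDICTION_COL, "prediction")
remove_seen (compute_ranking_predictions)	bool	No	If True, drop (user, item) pairs that appear in data (default False)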

Outputs

Name	Type	Description
surprise_trainset_to_df return	pandas.DataFrame	DataFrame with user column (str), item column (str), and rating column (float) using raw IDs
predict return	pandas.DataFrame	DataFrame with usercol, itemcol, and predcol columns for the specified user-item pairs
compute_ranking_predictions return	pandas.DataFrame	DataFrame with usercol, itemcol, and predcol for all user-item combinations (optionally excluding pairs present in data)
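To make the Cartesian-product scoring and the remove_seen filter concrete, here is a minimal pandas sketch. ToyAlgo and its scoring rule (10 * user + item) are hypothetical stand-ins for a trained Surprise model. The sketch tags observed pairs with DataFrame.assign instead of pd.concat but otherwise follows the dummy-column outer merge described for compute_ranking_predictions.

```python
from types import SimpleNamespace

import pandas as pd


class ToyAlgo:
    """Hypothetical stand-in: predict() returns an object with an `est` attribute."""

    def predict(self, uid, iid):
        return SimpleNamespace(est=float(uid * 10 + iid))


def ranking_predictions_sketch(algo, data, usercol="userID", itemcol="itemID",
                               predcol="prediction", remove_seen=False):
    users = data[usercol].unique()
    items = data[itemcol].unique()
    # Score every (user, item) pair: len(users) * len(items) predict calls.
    rows = [[u, i, algo.predict(u, i).est] for u in users for i in items]
    all_preds = pd.DataFrame(rows, columns=[usercol, itemcol, predcol])
    if not remove_seen:
        return all_preds
    # Tag observed pairs with a dummy column, outer-merge, keep the untagged rows.
    seen = data[[usercol, itemcol]].assign(dummycol=1.0)
    merged = seen.merge(all_preds, on=[usercol, itemcol], how="outer")
    return merged[merged["dummycol"].isnull()].drop(columns="dummycol")


data = pd.DataFrame({"userID": [1, 1, 2], "itemID": [1, 2, 1]})
full = ranking_predictions_sketch(ToyAlgo(), data)
result = ranking_predictions_sketch(ToyAlgo(), data, remove_seen=True)
print(result)  # only the unseen pair (2, 2) remains
```

The nested loop issues one predict call per pair, so the cost grows with the number of users times the number of items. The same anti-join can also be written with merge(..., indicator=True) followed by a filter on the _merge column.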

Usage Examples

Basic Usage

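from recommenders.datasets import movielens
from recommenders.datasets.python_splitters import python_random_split
from recommenders.models.surprise.surprise_utils import (
    surprise_trainset_to_df,
    predict,
    compute_ranking_predictions,
)
from surprise import SVD, Dataset, Reader

# Load MovieLens 100k and split it into train and test DataFrames
data = movielens.load_pandas_df(size="100k", header=["userID", "itemID", "rating"])
train_df, test_df = python_random_split(data, 0.75)

# Train a Surprise model
reader = Reader(rating_scale=(1, 5))
surprise_data = Dataset.load_from_df(train_df[["userID", "itemID", "rating"]], reader)
trainset = surprise_data.build_full_trainset()
algo = SVD()
algo.fit(trainset)

# Convert the trainset back to a DataFrame (columns: uid, iid, rating)
train_as_df = surprise_trainset_to_df(trainset)

# Generate rating predictions for the test pairs (columns: userID, itemID, prediction)
rating_predictions = predict(algo, test_df)

# Score all user-item combinations; passing the training data makes
# remove_seen drop the pairs already seen in training, not the test pairs
ranking_predictions = compute_ranking_predictions(
    algo, train_df, remove_seen=True
)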
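Ranking evaluation usually focuses on the highest-scored items for each user. This minimal pandas sketch pulls the top k rows per user out of a frame shaped like the output of compute_ranking_predictions. The helper top_k_per_user is hypothetical and not part of surprise_utils, and the toy frame stands in for ranking_predictions.

```python
import pandas as pd


def top_k_per_user(preds, k=10, usercol="userID", predcol="prediction"):
    """Hypothetical helper: keep the k highest-scored rows for each user."""
    # Sort by user, then by descending score, and keep the first k rows per user.
    ordered = preds.sort_values([usercol, predcol], ascending=[True, False])
    return ordered.groupby(usercol, sort=False).head(k).reset_index(drop=True)


preds = pd.DataFrame({
    "userID": [1, 1, 1, 2, 2],
    "itemID": [10, 20, 30, 10, 20],
    "prediction": [3.0, 5.0, 4.0, 1.0, 2.0],
})
top2 = top_k_per_user(preds, k=2)
print(top2)  # user 1 keeps items 20 and 30; user 2 keeps items 20 and 10
```

groupby(...).head(k) preserves the sorted order, so each user's rows stay in descending score order after the cut.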
