Implementation:Recommenders team Recommenders Surprise Utils
| Knowledge Sources | |
|---|---|
| Domains | Surprise Library, Prediction, Recommendation |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
The surprise_utils module provides utility functions for integrating the Surprise recommendation library with the Recommenders toolkit, including data conversion from Surprise trainsets to DataFrames and prediction generation for both rating and ranking evaluation.
Description
This module offers three key functions:
- surprise_trainset_to_df converts a Surprise Trainset object back into a pandas DataFrame. It maps Surprise's internal integer IDs to raw user/item IDs using the trainset's _inner2raw_id_users and _inner2raw_id_items mappings, falling back to inverting the _raw2inner_id_users and _raw2inner_id_items dictionaries when the inner-to-raw mappings are not available.
- predict generates rating predictions for specific user-item pairs. It iterates over the DataFrame rows with itertuples, calls the Surprise algorithm's predict method on each pair, and returns a DataFrame in which Surprise's uid, iid, and est columns are renamed to usercol, itemcol, and predcol, and the unneeded details and r_ui columns are dropped.
- compute_ranking_predictions generates predictions for all user-item combinations (the Cartesian product of the unique users and unique items in data) to support ranking metrics such as NDCG. Its optional remove_seen flag performs an outer merge against a dummy column to filter out the user-item pairs that appear in data, which is typically the training set.
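The inner-to-raw ID mapping with its fallback can be sketched without Surprise itself. This is a minimal illustration under stated assumptions, not the module's code: invert_mapping and inner_ratings_to_df are hypothetical names, and plain dictionaries and tuples stand in for the Trainset attributes and its rating triples.

```python
import pandas as pd


def invert_mapping(mapping):
    """Swap keys and values (hypothetical helper; assumes unique values)."""
    return {value: key for key, value in mapping.items()}


def inner_ratings_to_df(
    inner_ratings,
    raw2inner_users,
    raw2inner_items,
    inner2raw_users=None,
    inner2raw_items=None,
    col_user="uid",
    col_item="iid",
    col_rating="rating",
):
    """Build a DataFrame of (user, item, rating) rows keyed by raw IDs."""
    df = pd.DataFrame(list(inner_ratings), columns=[col_user, col_item, col_rating])
    # Prefer the inner-to-raw mapping when it exists, else invert raw-to-inner.
    map_user = (
        inner2raw_users if inner2raw_users is not None
        else invert_mapping(raw2inner_users)
    )
    map_item = (
        inner2raw_items if inner2raw_items is not None
        else invert_mapping(raw2inner_items)
    )
    df[col_user] = df[col_user].map(map_user)
    df[col_item] = df[col_item].map(map_item)
    return df


# Toy stand-ins for a trainset's mappings and inner-ID rating triples
raw2inner_users = {"alice": 0, "bob": 1}
raw2inner_items = {"book": 0, "film": 1}
inner_ratings = [(0, 0, 4.0), (0, 1, 2.0), (1, 0, 5.0)]
df = inner_ratings_to_df(inner_ratings, raw2inner_users, raw2inner_items)
```

Here the inner-to-raw dictionaries are left as None, so the sketch exercises the fallback path that inverts the raw-to-inner dictionaries.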
Usage
Use surprise_trainset_to_df when you need to convert Surprise's internal data representation back to a standard DataFrame for analysis or evaluation. Use predict for computing rating accuracy metrics (RMSE, MAE) on specific user-item pairs. Use compute_ranking_predictions when evaluating ranking metrics (NDCG, MAP) that require predicted scores for all possible user-item combinations. Set remove_seen=True to exclude the user-item pairs present in data from the ranking predictions; pass the training DataFrame as data so that the excluded pairs are the ones the model has already seen.
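To illustrate what remove_seen=True does to the all-pairs output, the following sketch scores every user-item combination and then drops the pairs that already occur in the input. It is an approximation under assumptions: all_pairs_minus_seen is a hypothetical name, the score callable stands in for a trained Surprise algorithm, and the column names are assumed values rather than the toolkit's constants.

```python
import itertools

import pandas as pd


def all_pairs_minus_seen(
    data, score, usercol="userID", itemcol="itemID", predcol="prediction"
):
    """Score the Cartesian product of users and items, minus pairs in `data`."""
    users = data[usercol].unique()
    items = data[itemcol].unique()
    rows = [(u, i, score(u, i)) for u, i in itertools.product(users, items)]
    all_predictions = pd.DataFrame(rows, columns=[usercol, itemcol, predcol])

    # Tag every seen pair with a dummy column, then outer-merge: rows whose
    # dummy value is missing after the merge were never present in `data`.
    seen = data[[usercol, itemcol]].copy()
    seen["dummycol"] = 1.0
    merged = pd.merge(seen, all_predictions, on=[usercol, itemcol], how="outer")
    return merged[merged["dummycol"].isnull()].drop("dummycol", axis=1)


train = pd.DataFrame(
    {"userID": [1, 1, 2], "itemID": [10, 20, 10], "rating": [4.0, 3.0, 5.0]}
)
# A constant scorer keeps the example independent of any trained model
unseen = all_pairs_minus_seen(train, score=lambda user, item: 3.5)
```

With two users and two items there are four combinations; three occur in train, so only the pair (2, 20) remains in unseen.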
Code Reference
Source Location
- Repository: Recommenders
- File: recommenders/models/surprise/surprise_utils.py
- Lines: 1-120
Signature
def surprise_trainset_to_df(
    trainset, col_user="uid", col_item="iid", col_rating="rating"
)

def predict(
    algo,
    data,
    usercol=DEFAULT_USER_COL,
    itemcol=DEFAULT_ITEM_COL,
    predcol=DEFAULT_PREDICTION_COL,
)

def compute_ranking_predictions(
    algo,
    data,
    usercol=DEFAULT_USER_COL,
    itemcol=DEFAULT_ITEM_COL,
    predcol=DEFAULT_PREDICTION_COL,
    remove_seen=False,
)
Import
from recommenders.models.surprise.surprise_utils import (
    surprise_trainset_to_df,
    predict,
    compute_ranking_predictions,
)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| trainset (surprise_trainset_to_df) | surprise.Trainset | Yes | A Surprise Trainset object to convert |
| col_user (surprise_trainset_to_df) | str | No | User column name in the output DataFrame (default "uid") |
| col_item (surprise_trainset_to_df) | str | No | Item column name in the output DataFrame (default "iid") |
| col_rating (surprise_trainset_to_df) | str | No | Rating column name in the output DataFrame (default "rating") |
| algo (predict/compute_ranking) | AlgoBase | Yes | A trained Surprise algorithm instance |
| data (predict/compute_ranking) | pandas.DataFrame | Yes | DataFrame containing user and item columns for prediction |
| usercol (predict/compute_ranking) | str | No | User column name (default DEFAULT_USER_COL) |
| itemcol (predict/compute_ranking) | str | No | Item column name (default DEFAULT_ITEM_COL) |
| predcol (predict/compute_ranking) | str | No | Prediction column name (default DEFAULT_PREDICTION_COL) |
| remove_seen (compute_ranking) | bool | No | Flag to remove the (user, item) pairs that appear in data, typically the training set (default False) |
Outputs
| Name | Type | Description |
|---|---|---|
| surprise_trainset_to_df return | pandas.DataFrame | DataFrame with user, item, and rating (float) columns; user and item values are raw IDs rather than Surprise's inner integer IDs (raw IDs are strings when the dataset was read from a file) |
| predict return | pandas.DataFrame | DataFrame with usercol, itemcol, and predcol columns for the specified user-item pairs |
| compute_ranking_predictions return | pandas.DataFrame | DataFrame with usercol, itemcol, and predcol for all user-item combinations (optionally excluding seen pairs) |
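The shape of the predict output can be reproduced with a stub in place of a trained model. This is a hedged sketch: ConstantAlgo and predict_pairs are hypothetical names, the Prediction namedtuple below mirrors the field names of Surprise's prediction tuples (uid, iid, r_ui, est, details) without importing Surprise, and the default column names are assumed values.

```python
from collections import namedtuple

import pandas as pd

# Local stand-in with the same field names as Surprise's prediction tuple
Prediction = namedtuple("Prediction", ["uid", "iid", "r_ui", "est", "details"])


class ConstantAlgo:
    """Hypothetical stub for a trained algorithm: always estimates 3.0."""

    def predict(self, uid, iid):
        return Prediction(uid, iid, None, 3.0, {"was_impossible": False})


def predict_pairs(
    algo, data, usercol="userID", itemcol="itemID", predcol="prediction"
):
    """Score each row of `data` and return user, item, prediction columns."""
    predictions = [
        algo.predict(getattr(row, usercol), getattr(row, itemcol))
        for row in data.itertuples()
    ]
    predictions = pd.DataFrame(predictions)
    predictions = predictions.rename(
        columns={"uid": usercol, "iid": itemcol, "est": predcol}
    )
    # Keep only the three documented output columns
    return predictions.drop(["details", "r_ui"], axis="columns")


test_pairs = pd.DataFrame({"userID": [1, 2], "itemID": [10, 20]})
scored = predict_pairs(ConstantAlgo(), test_pairs)
```

Because getattr is applied to itertuples rows, the sketch assumes the user and item column names are valid Python identifiers.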
Usage Examples
Basic Usage
from recommenders.models.surprise.surprise_utils import (
    surprise_trainset_to_df,
    predict,
    compute_ranking_predictions,
)
from surprise import SVD, Dataset, Reader

# train_df and test_df are assumed to be pandas DataFrames with
# "userID", "itemID", and "rating" columns

# Train a Surprise model
reader = Reader(rating_scale=(1, 5))
surprise_data = Dataset.load_from_df(train_df[["userID", "itemID", "rating"]], reader)
trainset = surprise_data.build_full_trainset()
algo = SVD()
algo.fit(trainset)

# Convert trainset back to DataFrame
train_as_df = surprise_trainset_to_df(trainset)

# Generate rating predictions for test pairs
rating_predictions = predict(algo, test_df)

# Generate ranking predictions for all user-item combinations in train_df;
# remove_seen=True drops the pairs that already appear in the data argument
ranking_predictions = compute_ranking_predictions(
    algo, train_df, remove_seen=True
)
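Once rating predictions are available, rating accuracy can be checked by joining them back to the held-out ratings. The following is a hand-rolled sketch of RMSE and MAE in plain pandas, offered as an illustration rather than the toolkit's evaluation code: rating_errors is a hypothetical helper, and the small DataFrames are made-up stand-ins for test_df and the output of predict.

```python
import numpy as np
import pandas as pd


def rating_errors(
    truth,
    predictions,
    usercol="userID",
    itemcol="itemID",
    ratingcol="rating",
    predcol="prediction",
):
    """Join true and predicted ratings on (user, item) and compute RMSE and MAE."""
    joined = pd.merge(truth, predictions, on=[usercol, itemcol])
    diff = joined[ratingcol] - joined[predcol]
    return {
        "rmse": float(np.sqrt((diff ** 2).mean())),
        "mae": float(diff.abs().mean()),
    }


# Made-up stand-ins for the held-out ratings and the predicted ratings
test_df = pd.DataFrame(
    {"userID": [1, 2], "itemID": [10, 20], "rating": [4.0, 2.0]}
)
rating_predictions = pd.DataFrame(
    {"userID": [1, 2], "itemID": [10, 20], "prediction": [3.0, 3.0]}
)
metrics = rating_errors(test_df, rating_predictions)
```

The inner join keeps only the pairs present in both frames, so predictions for pairs without a true rating do not affect either metric.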