Implementation:Recommenders team Recommenders Surprise Utils

From Leeroopedia


Knowledge Sources
Domains: Surprise Library, Prediction, Recommendation
Last Updated: 2026-02-10 00:00 GMT

Overview

The surprise_utils module provides utility functions for integrating the Surprise recommendation library with the Recommenders toolkit, including data conversion from Surprise trainsets to DataFrames and prediction generation for both rating and ranking evaluation.

Description

This module offers three key functions.

surprise_trainset_to_df converts a Surprise Trainset object back into a pandas DataFrame. It maps Surprise's internal integer IDs to raw user and item IDs through the trainset's _inner2raw_id_users and _inner2raw_id_items mappings; when those are not populated, it inverts the _raw2inner_id_users and _raw2inner_id_items dictionaries instead.

predict generates rating predictions for specific user-item pairs. It iterates over the DataFrame rows with itertuples, calls the Surprise algorithm's predict method on each pair, and returns a DataFrame in which the prediction column is renamed to predcol and the unneeded columns (details, r_ui) are dropped.

compute_ranking_predictions generates predictions for every user-item combination (the Cartesian product of the unique users and items in the data) to support ranking metrics such as NDCG. With the optional remove_seen flag, it performs an outer merge against the input pairs tagged with a dummy column and keeps only the rows where that column is missing, which filters out user-item pairs that appeared in the training data.
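The ID mapping performed by surprise_trainset_to_df can be illustrated without Surprise or pandas. The snippet below is a minimal sketch, not the module's code: fake_trainset is a hypothetical stand-in that carries only the attributes named in the description, invert_dictionary is defined locally, and trainset_to_records is an illustrative helper that returns plain tuples instead of a DataFrame.

```python
from types import SimpleNamespace


def invert_dictionary(mapping):
    """Swap keys and values; assumes the values are unique."""
    return {value: key for key, value in mapping.items()}


def trainset_to_records(trainset):
    """Return (raw_user, raw_item, rating) tuples from inner-ID ratings.

    Hypothetical helper: it mirrors the mapping step described above but
    returns a list of tuples rather than a pandas DataFrame.
    """
    # Prefer the inner->raw maps; fall back to inverting raw->inner.
    user_map = (
        trainset._inner2raw_id_users
        if trainset._inner2raw_id_users is not None
        else invert_dictionary(trainset._raw2inner_id_users)
    )
    item_map = (
        trainset._inner2raw_id_items
        if trainset._inner2raw_id_items is not None
        else invert_dictionary(trainset._raw2inner_id_items)
    )
    return [
        (user_map[inner_uid], item_map[inner_iid], rating)
        for inner_uid, inner_iid, rating in trainset.all_ratings()
    ]


# Stand-in for a Surprise Trainset whose inner->raw maps are not built yet.
fake_trainset = SimpleNamespace(
    _inner2raw_id_users=None,
    _inner2raw_id_items=None,
    _raw2inner_id_users={"alice": 0, "bob": 1},
    _raw2inner_id_items={"book": 0, "film": 1},
    all_ratings=lambda: [(0, 0, 4.0), (0, 1, 3.0), (1, 1, 5.0)],
)

records = trainset_to_records(fake_trainset)
print(records)
```

Because both inner-to-raw maps are None in the stand-in, the sketch exercises the fallback branch that inverts the raw-to-inner dictionaries.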

Usage

Use surprise_trainset_to_df when you need to convert Surprise's internal data representation back to a standard DataFrame for analysis or evaluation. Use predict to score specific user-item pairs, for example when computing rating accuracy metrics (RMSE, MAE). Use compute_ranking_predictions when evaluating ranking metrics (NDCG, MAP) that require predicted scores for all possible user-item combinations. Set remove_seen=True to exclude already-seen pairs from the ranking predictions; the seen pairs are the ones present in data, so pass the training data in that case.
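To make the effect of remove_seen concrete, here is a minimal sketch of the same logic in plain Python. It is an illustration under stated assumptions rather than the module's implementation: ToyAlgo is a hypothetical stand-in for a trained Surprise algorithm, rank_all_pairs is an illustrative name, the input is a list of (user, item) tuples instead of a DataFrame, and a set lookup replaces the pandas outer merge with a dummy column.

```python
from itertools import product
from types import SimpleNamespace


class ToyAlgo:
    """Hypothetical stand-in for a trained Surprise algorithm."""

    def predict(self, uid, iid):
        # Surprise predictions expose the estimate as `.est`;
        # the score here is an arbitrary deterministic toy value.
        return SimpleNamespace(est=float(len(uid) + len(iid)))


def rank_all_pairs(algo, rows, remove_seen=False):
    """Score every (user, item) combination found in `rows`.

    `rows` is a list of (user, item) tuples standing in for the two
    ID columns of the input DataFrame.
    """
    # Sorted only to make the toy output deterministic.
    users = sorted({user for user, _ in rows})
    items = sorted({item for _, item in rows})
    seen = set(rows)
    scored = []
    for user, item in product(users, items):
        if remove_seen and (user, item) in seen:
            continue  # skip pairs already present in the input data
        scored.append((user, item, algo.predict(user, item).est))
    return scored


train_rows = [("u1", "a"), ("u1", "b"), ("u2", "a")]
all_scores = rank_all_pairs(ToyAlgo(), train_rows)
unseen_scores = rank_all_pairs(ToyAlgo(), train_rows, remove_seen=True)
print(len(all_scores), unseen_scores)
```

Two users and two items yield four scored pairs; with remove_seen=True only the pair that is absent from the input rows remains. Note that the number of predictions grows with the product of the user and item counts.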

Code Reference

Source Location

Signature

def surprise_trainset_to_df(
    trainset, col_user="uid", col_item="iid", col_rating="rating"
)

def predict(
    algo,
    data,
    usercol=DEFAULT_USER_COL,
    itemcol=DEFAULT_ITEM_COL,
    predcol=DEFAULT_PREDICTION_COL,
)

def compute_ranking_predictions(
    algo,
    data,
    usercol=DEFAULT_USER_COL,
    itemcol=DEFAULT_ITEM_COL,
    predcol=DEFAULT_PREDICTION_COL,
    remove_seen=False,
)

Import

from recommenders.models.surprise.surprise_utils import (
    surprise_trainset_to_df,
    predict,
    compute_ranking_predictions,
)

I/O Contract

Inputs

Name | Type | Required | Description
trainset (surprise_trainset_to_df) | surprise.Trainset | Yes | A Surprise Trainset object to convert
col_user / usercol | str | No | User column name (col_user defaults to "uid" in surprise_trainset_to_df; usercol defaults to DEFAULT_USER_COL in predict and compute_ranking_predictions)
col_item / itemcol | str | No | Item column name (col_item defaults to "iid" in surprise_trainset_to_df; itemcol defaults to DEFAULT_ITEM_COL in predict and compute_ranking_predictions)
col_rating (surprise_trainset_to_df) | str | No | Rating column name (default "rating")
algo (predict, compute_ranking_predictions) | AlgoBase | Yes | A trained Surprise algorithm instance
data (predict, compute_ranking_predictions) | pandas.DataFrame | Yes | DataFrame containing the user and item columns to predict for
predcol (predict, compute_ranking_predictions) | str | No | Prediction column name (default DEFAULT_PREDICTION_COL)
remove_seen (compute_ranking_predictions) | bool | No | Flag to remove (user, item) pairs seen in training data (default False)

Outputs

Name | Type | Description
surprise_trainset_to_df return | pandas.DataFrame | DataFrame with user column (str), item column (str), and rating column (float) using raw IDs
predict return | pandas.DataFrame | DataFrame with usercol, itemcol, and predcol columns for the specified user-item pairs
compute_ranking_predictions return | pandas.DataFrame | DataFrame with usercol, itemcol, and predcol for all user-item combinations (optionally excluding seen pairs)
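The shape of the predict output follows from how each Surprise prediction is reshaped. The sketch below is illustrative only: it redefines a Prediction tuple with the same field names Surprise uses so that it runs without Surprise installed, to_output_row is a hypothetical helper, and the column names "userID", "itemID", and "prediction" are assumed values for the DEFAULT_* constants.

```python
from collections import namedtuple

# Same field names as Surprise's Prediction tuple, redefined here so the
# sketch runs without Surprise installed.
Prediction = namedtuple("Prediction", ["uid", "iid", "r_ui", "est", "details"])


def to_output_row(pred, usercol="userID", itemcol="itemID", predcol="prediction"):
    """Keep uid, iid, and est under the requested names; drop r_ui and details.

    The default column names are assumptions made for this illustration.
    """
    return {usercol: pred.uid, itemcol: pred.iid, predcol: pred.est}


raw_predictions = [
    Prediction("u1", "a", None, 3.7, {"was_impossible": False}),
    Prediction("u2", "b", None, 4.1, {"was_impossible": False}),
]
rows = [to_output_row(p) for p in raw_predictions]
print(rows[0])
```

Each resulting row carries exactly three keys, matching the usercol, itemcol, and predcol columns listed in the Outputs table.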

Usage Examples

Basic Usage

from recommenders.models.surprise.surprise_utils import (
    surprise_trainset_to_df,
    predict,
    compute_ranking_predictions,
)
from surprise import SVD, Dataset, Reader

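# train_df and test_df are assumed to be pandas DataFrames with
# userID, itemID, and rating columns.
# Train a Surprise model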
reader = Reader(rating_scale=(1, 5))
surprise_data = Dataset.load_from_df(train_df[["userID", "itemID", "rating"]], reader)
trainset = surprise_data.build_full_trainset()
algo = SVD()
algo.fit(trainset)

# Convert trainset back to DataFrame
train_as_df = surprise_trainset_to_df(trainset)

# Generate rating predictions for test pairs
rating_predictions = predict(algo, test_df)

# Generate ranking predictions for all user-item combinations.
# Pass the training data so that remove_seen=True drops the pairs seen in training.
ranking_predictions = compute_ranking_predictions(
    algo, train_df, remove_seen=True
)
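Once rating predictions are available, they are usually compared with the held-out ratings. The following is a minimal sketch of RMSE and MAE in plain Python, assuming both sides have been reduced to dictionaries keyed by (user, item); rmse_and_mae is a hypothetical helper written for this illustration, not a function of the module.

```python
import math


def rmse_and_mae(true_ratings, predicted_ratings):
    """Compute RMSE and MAE over the (user, item) pairs present in both dicts."""
    common = true_ratings.keys() & predicted_ratings.keys()
    if not common:
        raise ValueError("no overlapping (user, item) pairs")
    errors = [predicted_ratings[pair] - true_ratings[pair] for pair in common]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return rmse, mae


# Toy data: each prediction is off by exactly one rating point.
truth = {("u1", "a"): 4.0, ("u2", "b"): 2.0}
preds = {("u1", "a"): 3.0, ("u2", "b"): 3.0}
rmse, mae = rmse_and_mae(truth, preds)
print(rmse, mae)
```

Restricting the computation to the pairs present in both dictionaries plays the role of an inner join between the test ratings and the predictions.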
