

Implementation:Recommenders team Recommenders RLRMC Dataset

From Leeroopedia


Knowledge Sources
Domains Matrix Completion, Data Preprocessing, Sparse Matrix
Last Updated 2026-02-10 00:00 GMT

Overview

The RLRMCdataset class prepares and manages sparse data structures required by the RLRMC (Riemannian Low-Rank Matrix Completion) algorithm, converting pandas DataFrames to CSR sparse matrices with user/item reindexing and optional mean-centering.

Description

The RLRMCdataset class takes a training DataFrame (and optionally validation and test DataFrames) and assigns contiguous integer indices to users and items, stored in the mappings user2id, id2user, item2id, and id2item. The _data_processing method concatenates all provided DataFrames to build a complete user/item index, so every user and item seen in any split receives an index, and then constructs scipy CSR sparse matrices for efficient storage and access of rating data. The _reindex method maps original user/item IDs to their integer indices via pandas merge operations.

When mean_center is True (the default), the global training mean is subtracted from the rating values in the training and validation data, which is important for the RLRMC algorithm's convergence. The model_param dictionary stores the number of rows (num_row), the number of columns (num_col), and train_mean, so that the mean can be added back at prediction time.
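The steps described above can be illustrated with a minimal sketch built directly on pandas and scipy. It is not the library's implementation: the toy data, the column names ("user", "item", "rating"), and the helpers reindex and to_csr are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Toy data; column names are illustrative, not the library defaults.
train = pd.DataFrame({
    "user": ["u1", "u1", "u2", "u3"],
    "item": ["a", "b", "a", "c"],
    "rating": [5.0, 3.0, 4.0, 2.0],
})
validation = pd.DataFrame({
    "user": ["u2", "u4"],
    "item": ["b", "a"],
    "rating": [1.0, 4.0],
})

# 1. Build contiguous indices over every user/item seen in any split.
all_df = pd.concat([train, validation])
user_map = pd.DataFrame({"user": all_df["user"].unique()})
user_map["user_idx"] = np.arange(len(user_map))
item_map = pd.DataFrame({"item": all_df["item"].unique()})
item_map["item_idx"] = np.arange(len(item_map))

# Dictionary views of the same mappings (forward and inverse).
user2id = dict(zip(user_map["user"], user_map["user_idx"]))
id2user = {idx: user for user, idx in user2id.items()}


def reindex(df):
    """Hypothetical helper: map original IDs to integer indices via merges."""
    out = df.merge(user_map, on="user", how="left")
    out = out.merge(item_map, on="item", how="left")
    return out[["user_idx", "item_idx", "rating"]]


def to_csr(df, shape, offset=0.0):
    """Hypothetical helper: build a CSR matrix of (rating - offset)."""
    values = df["rating"].to_numpy(dtype=float) - offset
    rows = df["user_idx"].to_numpy()
    cols = df["item_idx"].to_numpy()
    return csr_matrix((values, (rows, cols)), shape=shape)


# 2. Mean-center with the *training* mean, then build the sparse matrices.
train_mean = train["rating"].mean()
shape = (len(user_map), len(item_map))
train_csr = to_csr(reindex(train), shape, offset=train_mean)
valid_csr = to_csr(reindex(validation), shape, offset=train_mean)

# 3. Keep the dimensions and the mean for use at prediction time.
model_param = {"num_row": shape[0], "num_col": shape[1], "train_mean": train_mean}
```

Indexing over the concatenated splits is what lets a user such as "u4", who appears only in the validation data, still receive a row in both matrices.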

Usage

Use the RLRMCdataset class to prepare data for the RLRMC algorithm. Provide training data as a pandas DataFrame with user, item, and rating columns. Optionally provide validation and test DataFrames. The resulting object provides sparse matrices (train and validation attributes) and model parameters needed by the RLRMC optimization algorithm.

Code Reference

Source Location

Signature

class RLRMCdataset(object):
    def __init__(
        self,
        train,
        validation=None,
        test=None,
        mean_center=True,
        col_user=DEFAULT_USER_COL,
        col_item=DEFAULT_ITEM_COL,
        col_rating=DEFAULT_RATING_COL,
        col_timestamp=DEFAULT_TIMESTAMP_COL,
    )
    def _data_processing(self, train, validation=None, test=None, mean_center=True)
    def _reindex(self, df)

Import

from recommenders.models.rlrmc.RLRMCdataset import RLRMCdataset

I/O Contract

Inputs

Name | Type | Required | Description
--- | --- | --- | ---
train | pandas.DataFrame | Yes | Training data with at least columns (col_user, col_item, col_rating)
validation | pandas.DataFrame | No | Validation data with at least columns (col_user, col_item, col_rating); None if not used
test | pandas.DataFrame | No | Test data with at least columns (col_user, col_item, col_rating); None if not used
mean_center | bool | No | Flag to mean-center ratings in train and validation data (default True)
col_user | str | No | User column name (default DEFAULT_USER_COL)
col_item | str | No | Item column name (default DEFAULT_ITEM_COL)
col_rating | str | No | Rating column name (default DEFAULT_RATING_COL)
col_timestamp | str | No | Timestamp column name (default DEFAULT_TIMESTAMP_COL)

Outputs

Name | Type | Description
--- | --- | ---
train (attribute) | scipy.sparse.csr_matrix | Sparse CSR matrix of training ratings (mean-centered if enabled)
validation (attribute) | scipy.sparse.csr_matrix or None | Sparse CSR matrix of validation ratings, or None if no validation data
model_param (attribute) | dict | Dictionary with keys "num_row", "num_col", and "train_mean"
user2id (attribute) | dict | Mapping from original user IDs to integer indices
id2user (attribute) | dict | Mapping from integer indices to original user IDs
item2id (attribute) | dict | Mapping from original item IDs to integer indices
id2item (attribute) | dict | Mapping from integer indices to original item IDs

Usage Examples

Basic Usage

from recommenders.models.rlrmc.RLRMCdataset import RLRMCdataset

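# Create dataset from training and validation DataFrames.
# train_df and valid_df are assumed to be existing pandas DataFrames whose
# user, item, and rating columns use the default column names.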
dataset = RLRMCdataset(
    train=train_df,
    validation=valid_df,
    mean_center=True,
)

# Access the sparse training matrix
train_sparse = dataset.train  # scipy CSR matrix

# Access model parameters
num_users = dataset.model_param["num_row"]
num_items = dataset.model_param["num_col"]
train_mean = dataset.model_param["train_mean"]

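# Convert between original user IDs and integer indices
# (original_user_id is assumed to be an ID that appears in the data)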
user_idx = dataset.user2id[original_user_id]
original_id = dataset.id2user[user_idx]
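Because the stored matrices are mean-centered, a model fitted to them predicts centered ratings, and train_mean has to be added back to recover values on the original scale. The sketch below illustrates only that step. The dictionaries stand in for the dataset attributes, and the rank-1 factor matrices and the predict function are hypothetical placeholders, not the RLRMC model's API.

```python
import numpy as np

# Stand-ins for the dataset attributes described above (values are made up).
model_param = {"num_row": 3, "num_col": 2, "train_mean": 3.5}
user2id = {"alice": 0, "bob": 1, "carol": 2}
item2id = {"book": 0, "film": 1}

# Hypothetical rank-1 factors playing the role of a trained low-rank model;
# their product approximates the mean-centered rating matrix.
user_factors = np.array([[1.0], [0.0], [-1.0]])  # shape (num_row, rank)
item_factors = np.array([[0.5], [-0.5]])         # shape (num_col, rank)


def predict(user, item):
    """Hypothetical helper: predict a rating on the original scale."""
    u = user2id[user]  # original ID -> row index
    i = item2id[item]  # original ID -> column index
    centered = float(user_factors[u] @ item_factors[i])
    # Undo the mean-centering applied when the dataset was built.
    return centered + model_param["train_mean"]


rating = predict("alice", "book")
```

If the dataset was created with mean_center=False, there is no centering to undo and this addition should be skipped.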
