Implementation:Recommenders team Recommenders RLRMC Dataset
| Knowledge Sources | |
|---|---|
| Domains | Matrix Completion, Data Preprocessing, Sparse Matrix |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
The RLRMCdataset class prepares and manages sparse data structures required by the RLRMC (Riemannian Low-Rank Matrix Completion) algorithm, converting pandas DataFrames to CSR sparse matrices with user/item reindexing and optional mean-centering.
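The mean-centering step can be written as a short formula. The notation is ours, not the library's: Ω is the set of observed (row, column) positions in the training matrix, r_ij is an observed rating, μ is the value stored as train_mean, and p̃_ij is a model output on the centered scale.

```latex
\mu = \frac{1}{|\Omega|} \sum_{(i,j) \in \Omega} r_{ij}, \qquad
\tilde{r}_{ij} = r_{ij} - \mu, \qquad
\hat{r}_{ij} = \tilde{p}_{ij} + \mu
```

Validation ratings are shifted by the same μ. When mean_center is False, no shift is applied.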
Description
The RLRMCdataset class takes a training DataFrame and, optionally, validation and test DataFrames, and builds index mappings between original user/item IDs and contiguous integer indices (user2id, id2user, item2id, id2item).
The _data_processing method concatenates all provided DataFrames so that every user and item appearing in any of them receives an index. It then builds scipy CSR sparse matrices for efficient storage of and access to the rating data. The _reindex method maps original user/item IDs to their integer indices through pandas merge operations. The test DataFrame contributes only to the index; no test matrix is built.
When mean_center is True (the default), the global mean of the training ratings is subtracted from both the training and the validation ratings, which is important for the convergence of the RLRMC algorithm. The model_param dictionary stores the number of rows (num_row), the number of columns (num_col), and train_mean, so that the mean can be added back at prediction time.
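The index-building and reindexing steps described above can be approximated with plain pandas. This is a minimal sketch, not the library's code: build_index and reindex are hypothetical helper names, and the default column names userID, itemID, and rating are assumed. Note how user "c", which appears only in the test frame, still receives an index.

```python
import numpy as np
import pandas as pd


def build_index(frames, col):
    """Hypothetical helper: assign contiguous integer ids to distinct values of `col`."""
    # Concatenate every provided frame (None entries are skipped) so that ids seen
    # only in validation/test data still receive an index.
    all_rows = pd.concat([f for f in frames if f is not None], ignore_index=True)
    # drop_duplicates keeps first-occurrence order, so ids follow appearance order.
    idx = pd.DataFrame({col: all_rows[col].drop_duplicates().to_numpy()})
    idx[col + "_idx"] = np.arange(len(idx))
    to_id = dict(zip(idx[col], idx[col + "_idx"]))
    from_id = {v: k for k, v in to_id.items()}
    return idx, to_id, from_id


def reindex(df, user_idx, item_idx, col_user="userID", col_item="itemID", col_rating="rating"):
    """Hypothetical helper: replace original ids with integer indices via left merges."""
    df = df.merge(user_idx, on=col_user, how="left")
    df = df.merge(item_idx, on=col_item, how="left")
    out = df[[col_user + "_idx", col_item + "_idx", col_rating]].copy()
    out.columns = [col_user, col_item, col_rating]
    return out


train = pd.DataFrame({"userID": ["a", "b", "a"], "itemID": ["x", "y", "y"], "rating": [4.0, 2.0, 5.0]})
test = pd.DataFrame({"userID": ["c"], "itemID": ["x"], "rating": [3.0]})

# None stands in for a missing validation frame.
user_idx, user2id, id2user = build_index([train, None, test], "userID")
item_idx, item2id, id2item = build_index([train, None, test], "itemID")
train_reindexed = reindex(train, user_idx, item_idx)
```

A left merge preserves the row order of the left frame, so the reindexed training rows line up with the original ones.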
Usage
Use the RLRMCdataset class to prepare data for the RLRMC algorithm. Provide training data as a pandas DataFrame with user, item, and rating columns. Optionally provide validation and test DataFrames. The resulting object provides sparse matrices (train and validation attributes) and model parameters needed by the RLRMC optimization algorithm.
Code Reference
Source Location
- Repository: Recommenders
- File: recommenders/models/rlrmc/RLRMCdataset.py
- Lines: 1-154
Signature
class RLRMCdataset(object):
    def __init__(
        self,
        train,
        validation=None,
        test=None,
        mean_center=True,
        col_user=DEFAULT_USER_COL,
        col_item=DEFAULT_ITEM_COL,
        col_rating=DEFAULT_RATING_COL,
        col_timestamp=DEFAULT_TIMESTAMP_COL,
    )
    def _data_processing(self, train, validation=None, test=None, mean_center=True)
    def _reindex(self, df)
Import
from recommenders.models.rlrmc.RLRMCdataset import RLRMCdataset
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| train | pandas.DataFrame | Yes | Training data with at least columns (col_user, col_item, col_rating) |
| validation | pandas.DataFrame | No | Validation data with at least columns (col_user, col_item, col_rating); None if not used |
| test | pandas.DataFrame | No | Test data with at least columns (col_user, col_item, col_rating); used only to build the user/item index; None if not used |
| mean_center | bool | No | If True, subtract the training mean from train and validation ratings (default True) |
| col_user | str | No | User column name (default DEFAULT_USER_COL, i.e. "userID") |
| col_item | str | No | Item column name (default DEFAULT_ITEM_COL, i.e. "itemID") |
| col_rating | str | No | Rating column name (default DEFAULT_RATING_COL, i.e. "rating") |
| col_timestamp | str | No | Timestamp column name (default DEFAULT_TIMESTAMP_COL, i.e. "timestamp") |
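The effect of the mean_center flag can be sketched as follows, assuming the frames have already been reindexed to contiguous integer ids and use the default column names. to_csr and build_matrices are hypothetical names, and returning 0.0 as train_mean when centering is off is an assumption of this sketch. Validation ratings are shifted by the training mean, not by their own mean.

```python
import pandas as pd
from scipy.sparse import csr_matrix


def to_csr(df, shape, offset=0.0, col_user="userID", col_item="itemID", col_rating="rating"):
    """Hypothetical helper: CSR matrix from a reindexed frame, minus `offset` per rating."""
    values = df[col_rating].to_numpy(dtype=float) - offset
    rows = df[col_user].to_numpy()
    cols = df[col_item].to_numpy()
    return csr_matrix((values, (rows, cols)), shape=shape)


def build_matrices(train, validation, num_row, num_col, mean_center=True):
    """Sketch: compute the training mean (assumed 0.0 when disabled) and build matrices."""
    train_mean = float(train["rating"].mean()) if mean_center else 0.0
    shape = (num_row, num_col)
    train_csr = to_csr(train, shape, train_mean)
    # Validation ratings are shifted by the *training* mean.
    val_csr = None if validation is None else to_csr(validation, shape, train_mean)
    model_param = {"num_row": num_row, "num_col": num_col, "train_mean": train_mean}
    return train_csr, val_csr, model_param


# Already reindexed toy data: 3 users (row 2 has no ratings) and 2 items.
train = pd.DataFrame({"userID": [0, 1, 0], "itemID": [0, 1, 1], "rating": [5.0, 2.0, 5.0]})
validation = pd.DataFrame({"userID": [1], "itemID": [0], "rating": [3.0]})

train_csr, val_csr, params = build_matrices(train, validation, num_row=3, num_col=2)
train_raw, _, raw_params = build_matrices(train, None, num_row=3, num_col=2, mean_center=False)
```

Here the training mean is 4.0, so the stored training entries become 1.0, -2.0, and 1.0, and the single validation entry becomes -1.0.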
Outputs
| Name | Type | Description |
|---|---|---|
| train (attribute) | scipy.sparse.csr_matrix | Sparse CSR matrix of training ratings (mean-centered if enabled) |
| validation (attribute) | scipy.sparse.csr_matrix or None | Sparse CSR matrix of validation ratings (shifted by the training mean if mean_center is enabled), or None if no validation data |
| model_param (attribute) | dict | Dictionary with keys "num_row", "num_col", and "train_mean" |
| user2id (attribute) | dict | Mapping from original user IDs to integer indices |
| id2user (attribute) | dict | Mapping from integer indices to original user IDs |
| item2id (attribute) | dict | Mapping from original item IDs to integer indices |
| id2item (attribute) | dict | Mapping from integer indices to original item IDs |
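Because train_mean is kept so that the mean can be added back at prediction time, the last step of a prediction pipeline might look like the sketch below. restore_predictions, the toy score matrix, and the ID values are hypothetical; the sketch only uses dictionaries shaped like the id2user/id2item attributes and does not reflect how the RLRMC model itself computes scores.

```python
import numpy as np


def restore_predictions(centered_scores, train_mean, id2user, id2item, pairs):
    """Hypothetical helper: map (row, col) pairs to original ids and add the mean back.

    `centered_scores` is any dense array of model outputs on the mean-centered scale.
    """
    return [
        (id2user[i], id2item[j], float(centered_scores[i, j]) + train_mean)
        for i, j in pairs
    ]


id2user = {0: "alice", 1: "bob"}
id2item = {0: "movie_a", 1: "movie_b", 2: "movie_c"}
train_mean = 3.5
# Toy centered score matrix of shape (num_row, num_col) = (2, 3)
centered = np.array([[0.5, -1.0, 0.0], [1.5, 0.25, -0.5]])
preds = restore_predictions(centered, train_mean, id2user, id2item, [(0, 1), (1, 0)])
# → [("alice", "movie_b", 2.5), ("bob", "movie_a", 5.0)]
```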
Usage Examples
Basic Usage
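import pandas as pd
from recommenders.models.rlrmc.RLRMCdataset import RLRMCdataset
# Toy ratings that use the default column names ("userID", "itemID", "rating")
train_df = pd.DataFrame(
    {"userID": [1, 1, 2, 3], "itemID": [10, 20, 10, 30], "rating": [4.0, 3.0, 5.0, 2.0]}
)
valid_df = pd.DataFrame({"userID": [2], "itemID": [20], "rating": [4.0]})
# Create the dataset from training and validation DataFrames
dataset = RLRMCdataset(
    train=train_df,
    validation=valid_df,
    mean_center=True,
)
# Access the sparse training matrix
train_sparse = dataset.train  # scipy CSR matrix of shape (num_row, num_col)
# Access model parameters
num_users = dataset.model_param["num_row"]
num_items = dataset.model_param["num_col"]
train_mean = dataset.model_param["train_mean"]
# Convert between original user IDs and integer indices
original_user_id = 2
user_idx = dataset.user2id[original_user_id]
original_id = dataset.id2user[user_idx]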