Implementation:Recommenders team Recommenders EmbDotBias DataLoader

Knowledge Sources	Recommenders
Domains	Collaborative Filtering, Data Loading, PyTorch
Last Updated	2026-02-10 00:00 GMT

Overview

Provides a PyTorch Dataset subclass and a DataLoader wrapper class for loading, indexing, and batching user-item-rating data for the EmbeddingDotBias collaborative filtering model.

Description

This module contains two classes that form the data ingestion pipeline for the EmbDotBias model.

RecoDataset is a PyTorch Dataset subclass that stores user, item, and rating arrays as tensors and returns (user_item_pair, rating) tuples suitable for embedding-based collaborative filtering.

RecoDataLoader is a utility class that manages the training and validation DataLoaders together with user/item metadata. Its from_df class method is the primary entry point. It accepts a pandas DataFrame, builds string-sorted categorical mappings with a #na# placeholder at index 0, converts raw user/item IDs to contiguous integer indices suitable for embedding lookups, performs a seeded (reproducible) random train/validation split, and wraps the resulting datasets in PyTorch DataLoaders.

The class also stores the user2idx and item2idx mapping dictionaries and a classes dictionary for index-to-ID lookups. A show_batch method prints a sample of a training batch for quick inspection.
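The index encoding described above can be pictured with a small pure-Python sketch. This is an illustration, not the library's code: build_index is a hypothetical helper, and it assumes that "string-sorted" means ordering the unique IDs by their string form. The real from_df may also convert IDs to strings before storing them, which this sketch does not do.

```python
def build_index(raw_ids, na_token="#na#"):
    """Return (classes, id2idx) for one column of raw IDs.

    classes[0] is the placeholder; the remaining entries are the unique
    IDs ordered by their string form (an assumption, see the lead-in).
    """
    unique_ids = sorted(set(raw_ids), key=str)
    classes = [na_token] + unique_ids
    id2idx = {raw_id: idx for idx, raw_id in enumerate(classes)}
    return classes, id2idx


raw_users = [10, 2, 33, 2, 10]
user_classes, user2idx = build_index(raw_users)

# String ordering places 10 before 2, unlike numeric ordering.
print(user_classes)  # ['#na#', 10, 2, 33]

# Forward lookup: raw ID -> contiguous index usable by an embedding table.
encoded = [user2idx[u] for u in raw_users]
print(encoded)  # [1, 2, 3, 2, 1]

# Reverse lookup: index -> raw ID, the role the classes dictionary plays.
decoded = [user_classes[i] for i in encoded]
```

Reserving index 0 for the placeholder means every real ID maps to an index of 1 or higher, so an embedding table sized from the class list has one spare row for unknown values.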

Usage

Use this module when preparing data for the EmbeddingDotBias collaborative filtering model. It is the standard way to convert a pandas DataFrame of user-item-rating interactions into PyTorch DataLoaders with proper categorical index encoding. Use RecoDataLoader.from_df to create the full data pipeline from a DataFrame, or instantiate RecoDataset directly if you need custom DataLoader configurations.

Code Reference

Source Location

Repository: Recommenders
File: recommenders/models/embdotbias/data_loader.py
Lines: 1-281

Signature

class RecoDataset(Dataset):
    def __init__(self, users, items, ratings)
    def __len__(self)
    def __getitem__(self, idx)

class RecoDataLoader:
    def __init__(self, train_dl, valid_dl=None)

    @classmethod
    def from_df(
        cls,
        ratings,
        valid_pct=0.2,
        user_name=None,
        item_name=None,
        rating_name=None,
        seed=42,
        batch_size=64,
        **kwargs,
    )

    def show_batch(self, n=5)

Import

from recommenders.models.embdotbias.data_loader import RecoDataset, RecoDataLoader

I/O Contract

Inputs

RecoDataset.__init__

Name	Type	Required	Description
users	array-like	Yes	User IDs or indices
items	array-like	Yes	Item IDs or indices
ratings	array-like	Yes	Ratings or interaction values

RecoDataLoader.from_df

Name	Type	Required	Description
ratings	pd.DataFrame	Yes	DataFrame containing user, item, and rating columns
valid_pct	float	No	Fraction of data for validation (default 0.2)
user_name	str	No	Name of the user column (defaults to first column)
item_name	str	No	Name of the item column (defaults to second column)
rating_name	str	No	Name of the rating column (defaults to third column)
seed	int	No	Random seed for reproducibility (default 42)
batch_size	int	No	Batch size for DataLoaders (default 64)
**kwargs	dict	No	Additional DataLoader arguments
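As a rough picture of how valid_pct and seed interact, the sketch below performs a seeded random split over row positions using only the standard library. The helper split_indices is hypothetical, and the rounding rule and random number generator are assumptions; the library may use a different generator, so the same seed will not necessarily reproduce its exact split.

```python
import random


def split_indices(n_rows, valid_pct=0.2, seed=42):
    """Shuffle row positions with a fixed seed and cut off a validation slice."""
    rng = random.Random(seed)        # private generator, leaves global state alone
    order = list(range(n_rows))
    rng.shuffle(order)
    n_valid = int(n_rows * valid_pct)  # assumed rounding: truncate toward zero
    return order[n_valid:], order[:n_valid]


train_idx, valid_idx = split_indices(10, valid_pct=0.2, seed=42)
print(len(train_idx), len(valid_idx))  # 8 2

# The same seed yields the same partition on every call.
train_again, valid_again = split_indices(10, valid_pct=0.2, seed=42)
```

Fixing the seed keeps the validation set stable across runs, which makes metrics from different training runs comparable.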

RecoDataLoader.show_batch

Name	Type	Required	Description
n	int	No	Number of examples to show from the batch (default 5)

Outputs

RecoDataset.__getitem__

Name	Type	Description
return	tuple(Tensor, Tensor)	A tuple of (user_item_tensor of shape [2], rating_tensor of shape [1])
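To make the output shapes concrete, here is a minimal stand-in dataset that follows the documented contract for RecoDataset.__getitem__. PairRatingDataset is a hypothetical class written for illustration, not the library's implementation, and the dtypes (long for indices, float32 for ratings) are assumptions.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class PairRatingDataset(Dataset):
    """Hypothetical stand-in mirroring the documented (pair, rating) contract."""

    def __init__(self, users, items, ratings):
        # Integer indices for embedding lookups; float ratings as targets.
        self.users = torch.tensor(users, dtype=torch.long)
        self.items = torch.tensor(items, dtype=torch.long)
        self.ratings = torch.tensor(ratings, dtype=torch.float32)

    def __len__(self):
        return len(self.ratings)

    def __getitem__(self, idx):
        user_item = torch.stack([self.users[idx], self.items[idx]])  # shape [2]
        rating = self.ratings[idx].unsqueeze(0)                      # shape [1]
        return user_item, rating


ds = PairRatingDataset([1, 1, 2, 3], [1, 2, 1, 3], [4.0, 3.5, 5.0, 2.0])
pair, rating = ds[0]

# Default collation stacks samples along a new leading batch dimension.
batch_pairs, batch_ratings = next(iter(DataLoader(ds, batch_size=4)))
print(tuple(batch_pairs.shape), tuple(batch_ratings.shape))  # (4, 2) (4, 1)
```

With this layout, a batch of user-item pairs has shape [batch_size, 2], so column 0 holds user indices and column 1 holds item indices.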

RecoDataLoader.from_df

Name	Type	Description
return	RecoDataLoader	Instance with train/valid DataLoaders and metadata (classes, n_users, n_items, user2idx, item2idx)

RecoDataLoader.show_batch

Name	Type	Description
return	None	Prints a sample of training batch data to stdout

Usage Examples

Basic Usage

import pandas as pd
from recommenders.models.embdotbias.data_loader import RecoDataLoader

# Prepare a DataFrame with user, item, and rating columns
df = pd.DataFrame({
    "userID": [1, 1, 2, 2, 3],
    "itemID": [10, 20, 10, 30, 20],
    "rating": [4.0, 3.5, 5.0, 2.0, 4.5],
})

# Create DataLoaders from the DataFrame
data = RecoDataLoader.from_df(
    df,
    valid_pct=0.2,
    user_name="userID",
    item_name="itemID",
    rating_name="rating",
    seed=42,
    batch_size=32,
)

# Access metadata
print(f"Number of users: {data.n_users}")
print(f"Number of items: {data.n_items}")
print(f"User classes: {data.classes['userID']}")

# Inspect a training batch
data.show_batch(n=3)

# Iterate over training DataLoader
for user_item_batch, ratings_batch in data.train:
    users = user_item_batch[:, 0]
    items = user_item_batch[:, 1]
    # Feed to model...
    break

Related Pages

Implementation:Recommenders_team_Recommenders_NCF_Dataset_Init
