Implementation:Fastai Fastbook CollabDataLoaders From Df
| Knowledge Sources | |
|---|---|
| Domains | Recommender Systems, Data Pipeline |
| Last Updated | 2026-02-09 17:00 GMT |
Overview
A concrete tool, provided by fastai.collab, for constructing collaborative filtering DataLoaders from a pandas DataFrame.
Description
CollabDataLoaders.from_df is a factory class method that takes a merged ratings DataFrame and returns a DataLoaders object. Unless column names are passed explicitly, it treats the first three columns as the user, item, and rating columns. It then encodes users and items as contiguous integer indices, randomly splits the rows into training and validation sets (controlled by valid_pct and seed), and assembles mini-batches as PyTorch tensors. The item_name parameter selects which column serves as the item identifier (e.g., raw movie IDs or human-readable titles).
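The encoding and splitting steps described above can be illustrated with a small standard-library sketch. This is not fastai's implementation: the helper names `encode_column` and `random_split` and the toy rows are hypothetical, and fitting the vocabulary on all rows is a simplification. The only detail borrowed from fastai is that index 0 is reserved for the `'#na#'` placeholder.

```python
import random

NA = '#na#'  # placeholder category kept at index 0, as in fastai's class lists

def encode_column(values, fit_on):
    """Map raw values to contiguous integer codes; unseen values map to 0."""
    vocab = [NA] + sorted(set(fit_on))
    lookup = {v: i for i, v in enumerate(vocab)}
    return vocab, [lookup.get(v, 0) for v in values]

def random_split(n_rows, valid_pct=0.2, seed=None):
    """Shuffle row indices and reserve the first valid_pct share for validation."""
    idxs = list(range(n_rows))
    random.Random(seed).shuffle(idxs)
    cut = int(valid_pct * n_rows)
    return idxs[cut:], idxs[:cut]  # (train, valid)

# Hypothetical (user, title, rating) rows, for illustration only
rows = [(10, 'Movie B', 4), (30, 'Movie A', 5), (10, 'Movie A', 3),
        (20, 'Movie C', 2), (30, 'Movie B', 1)]

users = [r[0] for r in rows]
titles = [r[1] for r in rows]
user_vocab, user_codes = encode_column(users, fit_on=users)
title_vocab, title_codes = encode_column(titles, fit_on=titles)

x = list(zip(user_codes, title_codes))  # (user_idx, item_idx) pairs
y = [float(r[2]) for r in rows]         # ratings as floats
train_idx, valid_idx = random_split(len(rows), valid_pct=0.2, seed=0)

print(user_vocab)  # ['#na#', 10, 20, 30]
print(x)           # [(1, 2), (3, 1), (1, 1), (2, 3), (3, 2)]
```

Because of the reserved placeholder, each class list is one entry longer than the number of distinct raw values it was built from.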
Usage
Import CollabDataLoaders from fastai.collab and call from_df after preparing the merged ratings DataFrame. The returned DataLoaders object is then passed directly to collab_learner or a manual Learner constructor.
Code Reference
Source Location
- Repository: fastbook
- File: translations/cn/08_collab.md (Lines 151-152)
Signature
CollabDataLoaders.from_df(
    ratings: pd.DataFrame,
    valid_pct: float = 0.2,
    user_name: str = None,    # defaults to first column
    item_name: str = None,    # defaults to second column; use 'title' for readable names
    rating_name: str = None,  # defaults to third column
    seed: int = None,
    path: str = '.',
    bs: int = 64,
    val_bs: int = None,
    shuffle_train: bool = True,
    device: torch.device = None
) -> DataLoaders
Import
from fastai.collab import CollabDataLoaders
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| ratings | pd.DataFrame | Yes | Merged DataFrame containing at minimum user, item, and rating columns |
| item_name | str | No | Column name to use as the item identifier; set to 'title' to display movie titles instead of numeric IDs |
| bs | int | No | Mini-batch size for the training DataLoader; defaults to 64 |
| valid_pct | float | No | Fraction of data to reserve for validation; defaults to 0.2 |
Outputs
| Name | Type | Description |
|---|---|---|
| dls | DataLoaders | A fastai DataLoaders containing training and validation DataLoaders |
| dls.classes | dict | Dictionary mapping column names to lists of unique categorical values; e.g., dls.classes['user'] and dls.classes['title'] |
| Batch x | Tensor (bs, 2) | Each batch independent variable is a LongTensor of (user_idx, item_idx) pairs |
| Batch y | Tensor (bs,) or (bs, 1) | Each batch dependent variable holds one rating per row; it may carry a trailing singleton dimension |
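To show how a batch x of (user_idx, item_idx) pairs is typically consumed, here is a minimal dot-product model in plain PyTorch. It is a sketch under stated assumptions rather than fastai's own model: the class name `DotProduct`, the stand-in batch, and the sizes are all illustrative, and a real batch would come from `dls.one_batch()`.

```python
import torch
from torch import nn

class DotProduct(nn.Module):
    """Predict a score as the dot product of user and item embeddings."""
    def __init__(self, n_users, n_items, n_factors):
        super().__init__()
        self.user_factors = nn.Embedding(n_users, n_factors)
        self.item_factors = nn.Embedding(n_items, n_factors)

    def forward(self, x):
        users = self.user_factors(x[:, 0])  # column 0: user indices
        items = self.item_factors(x[:, 1])  # column 1: item indices
        return (users * items).sum(dim=1)   # one score per row

# Stand-in batch of three (user_idx, item_idx) pairs (hypothetical values)
x = torch.tensor([[1, 3], [2, 1], [1, 2]])
model = DotProduct(n_users=4, n_items=5, n_factors=8)
preds = model(x)
print(preds.shape)  # torch.Size([3])
```

With a real DataLoaders object, `n_users` and `n_items` would be the lengths of the two entries in `dls.classes`, so every encoded index has an embedding row.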
Usage Examples
Basic Usage
from fastai.collab import *
from fastai.tabular.all import *
# Assumes 'ratings' DataFrame already loaded and merged with movie titles
# (see Fastai_Fastbook_Collab_Untar_Data)
# Create DataLoaders using movie titles as item names, batch size 64
dls = CollabDataLoaders.from_df(ratings, item_name='title', bs=64)
# Display a sample batch
dls.show_batch()
# Output (example):
# user title rating
# 0 207 Four Weddings and a Funeral (1994) 3
# 1 565 Remains of the Day, The (1993) 5
# 2 506 Kids (1995) 1
# Inspect the number of unique users and items
n_users = len(dls.classes['user'])
n_movies = len(dls.classes['title'])
print(f'Users: {n_users}, Movies: {n_movies}')
# Output: Users: 944, Movies: 1665
# Inspect a single batch shape
x, y = dls.one_batch()
print(x.shape, y.shape)
# Output (example): torch.Size([64, 2]) torch.Size([64])
# y holds one rating per row and may instead print as torch.Size([64, 1])