Implementation:Fastai Fastbook CollabDataLoaders From Df

Knowledge Sources	fastbook fastai docs
Domains	Recommender Systems, Data Pipeline
Last Updated	2026-02-09 17:00 GMT

Overview

A concrete tool, provided by fastai.collab, for constructing collaborative filtering DataLoaders from a pandas DataFrame.

Description

CollabDataLoaders.from_df is a factory class method that takes a merged ratings DataFrame and produces a DataLoaders object. Unless user_name, item_name, and rating_name are given, it treats the first three columns as the user, item, and rating columns. It then encodes users and items as categories with contiguous integer indices, randomly splits the rows into training and validation sets according to valid_pct, and assembles mini-batches as PyTorch tensors. The item_name parameter controls which column serves as the item identifier (e.g., raw movie IDs vs. human-readable titles).
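The encoding and splitting steps described above can be illustrated without fastai. The following is a minimal pure-Python sketch, not the library's implementation: the helper names `encode_column` and `split_indices` are hypothetical, the toy rows are made up for illustration, and reserving index 0 for a `#na#` placeholder is an assumption modeled on fastai's categorical convention.

```python
import random


def encode_column(values, na_label="#na#"):
    """Map raw labels to contiguous integer indices (hypothetical helper).

    Index 0 is reserved for a placeholder label; real labels start at 1.
    """
    classes = [na_label] + sorted(set(values))
    label_to_idx = {label: i for i, label in enumerate(classes)}
    return classes, [label_to_idx[v] for v in values]


def split_indices(n_rows, valid_pct=0.2, seed=None):
    """Shuffle row indices and hold out a fraction for validation."""
    rng = random.Random(seed)
    idxs = list(range(n_rows))
    rng.shuffle(idxs)
    cut = int(valid_pct * n_rows)
    return idxs[cut:], idxs[:cut]  # (train, valid)


# Toy ratings rows as (user, title, rating); values are illustrative only.
rows = [
    (196, "Kolya (1996)", 3),
    (186, "L.A. Confidential (1997)", 3),
    (22, "Heavyweights (1994)", 1),
    (196, "Heavyweights (1994)", 4),
    (22, "Kolya (1996)", 5),
]

user_classes, user_idx = encode_column([r[0] for r in rows])
title_classes, title_idx = encode_column([r[1] for r in rows])
train_idx, valid_idx = split_indices(len(rows), valid_pct=0.2, seed=42)

# Each model input is a (user_idx, item_idx) pair; the target is the rating.
x = list(zip(user_idx, title_idx))
y = [r[2] for r in rows]
```

The point of the contiguous indices is that they can be used directly as row positions in an embedding table, which raw user IDs or title strings cannot.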

Usage

Import CollabDataLoaders from fastai.collab and call from_df after preparing the merged ratings DataFrame. The returned DataLoaders object is then passed directly to collab_learner or a manual Learner constructor.
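As a sketch of that workflow, the helper below wraps both steps. The function name `build_collab_learner` is hypothetical, and the `n_factors=50` and `y_range=(0, 5.5)` values are the ones used in the fastbook collaborative filtering chapter rather than requirements of the API. fastai is imported inside the function so that defining the helper has no import-time dependency.

```python
def build_collab_learner(ratings, item_name="title", bs=64):
    """Build DataLoaders from a merged ratings DataFrame and wrap them in a Learner."""
    # Imported lazily so that defining this helper does not require fastai.
    from fastai.collab import CollabDataLoaders, collab_learner

    dls = CollabDataLoaders.from_df(ratings, item_name=item_name, bs=bs)
    # n_factors and y_range are illustrative choices, not defaults of from_df.
    learn = collab_learner(dls, n_factors=50, y_range=(0, 5.5))
    return dls, learn
```

Calling `dls, learn = build_collab_learner(ratings)` assumes `ratings` has already been merged with movie titles so that a `title` column exists.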

Code Reference

Source Location

Repository: fastbook
File: translations/cn/08_collab.md (Lines 151-152)

Signature

CollabDataLoaders.from_df(
    ratings: pd.DataFrame,
    valid_pct: float = 0.2,
    user_name: str = None,      # defaults to first column
    item_name: str = None,      # defaults to second column; use 'title' for readable names
    rating_name: str = None,    # defaults to third column
    seed: int = None,
    path: str = '.',
    bs: int = 64,
    val_bs: int = None,
    shuffle_train: bool = True,  # named `shuffle` in more recent fastai releases
    device: torch.device = None
) -> DataLoaders

Import

from fastai.collab import CollabDataLoaders

I/O Contract

Inputs

Name	Type	Required	Description
ratings	pd.DataFrame	Yes	Merged DataFrame containing at minimum user, item, and rating columns
item_name	str	No	Column name to use as the item identifier; set to `'title'` to display movie titles instead of numeric IDs
bs	int	No	Mini-batch size for the training DataLoader; defaults to 64
valid_pct	float	No	Fraction of data to reserve for validation; defaults to 0.2
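To make the effect of valid_pct and bs concrete, the arithmetic below estimates how many rows land in each split and how many mini-batches one epoch yields. It is a sketch under stated assumptions: the function name `split_and_batch_counts` is hypothetical, the validation size is taken as the truncated product of valid_pct and the row count, the training loader is assumed to drop a final incomplete batch while the validation loader keeps it, and the 100,000-row input is an illustrative figure.

```python
import math


def split_and_batch_counts(n_rows, valid_pct=0.2, bs=64, val_bs=None):
    """Estimate split sizes and batches per epoch (hypothetical helper)."""
    n_valid = int(valid_pct * n_rows)
    n_train = n_rows - n_valid
    # Assumption: the validation batch size falls back to bs when not given.
    val_bs = bs if val_bs is None else val_bs
    # Assumption: an incomplete final training batch is dropped.
    train_batches = n_train // bs
    # Assumption: an incomplete final validation batch is kept.
    valid_batches = math.ceil(n_valid / val_bs)
    return {
        "train_rows": n_train,
        "valid_rows": n_valid,
        "train_batches": train_batches,
        "valid_batches": valid_batches,
    }


counts = split_and_batch_counts(100_000, valid_pct=0.2, bs=64)
```

Under these assumptions, 100,000 rows give 80,000 training rows and 20,000 validation rows, that is, 1,250 full training batches and 313 validation batches per epoch.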

Outputs

Name	Type	Description
dls	DataLoaders	A fastai DataLoaders containing training and validation DataLoaders
dls.classes	dict	Dictionary mapping column names to lists of unique categorical values; e.g., `dls.classes['user']` and `dls.classes['title']`
Batch x	Tensor (bs, 2)	Each batch's independent variable is a LongTensor of (user_idx, item_idx) pairs
Batch y	Tensor (bs, 1)	Each batch's dependent variable is a tensor of ratings, one per row, held in a single target column
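Because batches carry integer indices rather than raw labels, the class lists are what turn a (user_idx, item_idx) pair back into something readable. The sketch below uses plain Python lists as a stand-in for `dls.classes`; the helper names `decode_pair` and `encode_label` and the sample labels are hypothetical, and the real fastai objects are class maps rather than lists.

```python
# Stand-in for dls.classes: plain lists instead of fastai's class maps.
classes = {
    "user": ["#na#", 22, 186, 196],
    "title": ["#na#", "Heavyweights (1994)", "Kolya (1996)"],
}


def decode_pair(pair, classes, user_col="user", item_col="title"):
    """Translate one (user_idx, item_idx) pair back to raw labels."""
    user_idx, item_idx = pair
    return classes[user_col][user_idx], classes[item_col][item_idx]


def encode_label(label, classes, col):
    """Look up the integer index assigned to a raw label."""
    return classes[col].index(label)


# Two rows of a hypothetical batch x, written as plain tuples.
batch_x = [(3, 2), (1, 1)]
decoded = [decode_pair(p, classes) for p in batch_x]
kolya_idx = encode_label("Kolya (1996)", classes, "title")
```

The same lookup in the other direction (label to index) is what makes it possible to pull the learned parameters for a specific movie out of a trained model.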

Usage Examples

Basic Usage

from fastai.collab import *
from fastai.tabular.all import *

# Assumes 'ratings' DataFrame already loaded and merged with movie titles
# (see Fastai_Fastbook_Collab_Untar_Data)

# Create DataLoaders using movie titles as item names, batch size 64
dls = CollabDataLoaders.from_df(ratings, item_name='title', bs=64)

# Display a sample batch
dls.show_batch()
# Output (example):
#    user                               title  rating
# 0   207  Four Weddings and a Funeral (1994)       3
# 1   565      Remains of the Day, The (1993)       5
# 2   506                         Kids (1995)       1

# Inspect the number of unique users and items
n_users  = len(dls.classes['user'])
n_movies = len(dls.classes['title'])
print(f'Users: {n_users}, Movies: {n_movies}')
# Output (example; each count includes the '#na#' placeholder class): Users: 944, Movies: 1665

# Inspect a single batch shape
x, y = dls.one_batch()
print(x.shape, y.shape)
# Output: torch.Size([64, 2]) torch.Size([64, 1])

Related Pages

Implements Principle

Principle:Fastai_Fastbook_Collab_DataLoaders

Requires Environment

Environment:Fastai_Fastbook_Python_FastAI_Environment
