Implementation:Fastai Fastbook CollabDataLoaders From Df

Knowledge Sources
Domains Recommender Systems, Data Pipeline
Last Updated 2026-02-09 17:00 GMT

Overview

Concrete tool, provided by fastai.collab, for constructing collaborative-filtering DataLoaders from a pandas DataFrame.

Description

CollabDataLoaders.from_df is a factory class method that takes a merged ratings DataFrame and produces a DataLoaders object. Unless user_name, item_name, and rating_name are given, it takes the first, second, and third columns as the user, item, and rating columns. It then categorically encodes users and items as contiguous integer indices, randomly holds out a valid_pct fraction of the rows for validation, and assembles mini-batches as PyTorch tensors. The item_name parameter controls which column serves as the item identifier (e.g., raw movie IDs vs. human-readable titles).
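The encoding and splitting steps described above can be sketched in plain Python. This is a conceptual illustration only, not fastai's implementation: the helpers build_vocab and random_split are hypothetical names, the rows are made-up toy data, and reserving index 0 for a '#na#' placeholder is an assumption modeled on how fastai's categorical vocabularies are usually laid out.

```python
import random

NA = "#na#"  # placeholder label assumed to occupy index 0

def build_vocab(values):
    """Map each distinct value to a contiguous integer index, reserving 0 for NA."""
    classes = [NA] + sorted(set(values))
    return classes, {v: i for i, v in enumerate(classes)}

def random_split(n, valid_pct=0.2, seed=None):
    """Shuffle row positions and hold out a valid_pct fraction for validation."""
    idxs = list(range(n))
    random.Random(seed).shuffle(idxs)
    cut = int(valid_pct * n)
    return idxs[cut:], idxs[:cut]  # (train positions, validation positions)

# Made-up toy rows: (user, title, rating)
rows = [
    (207, "Four Weddings and a Funeral (1994)", 3),
    (565, "Remains of the Day, The (1993)", 5),
    (506, "Kids (1995)", 1),
    (207, "Kids (1995)", 2),
    (565, "Four Weddings and a Funeral (1994)", 4),
]

user_classes, user_to_idx = build_vocab(r[0] for r in rows)
title_classes, title_to_idx = build_vocab(r[1] for r in rows)

# Each sample becomes (user_idx, item_idx, rating)
encoded = [(user_to_idx[u], title_to_idx[t], float(r)) for u, t, r in rows]
train_idx, valid_idx = random_split(len(rows), valid_pct=0.2, seed=42)

print(user_classes)  # ['#na#', 207, 506, 565]
print(encoded[0])    # (1, 1, 3.0)
```

Under this assumption the vocabulary is one entry longer than the number of distinct raw values, because of the placeholder at index 0.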

Usage

Import CollabDataLoaders from fastai.collab and call from_df after preparing the merged ratings DataFrame. The returned DataLoaders object is then passed directly to collab_learner or a manual Learner constructor.

Code Reference

Source Location

  • Repository: fastbook
  • File: translations/cn/08_collab.md (Lines 151-152)

Signature

CollabDataLoaders.from_df(
    ratings: pd.DataFrame,
    valid_pct: float = 0.2,
    user_name: str = None,      # defaults to first column
    item_name: str = None,      # defaults to second column; use 'title' for readable names
    rating_name: str = None,    # defaults to third column
    seed: int = None,
    path: str = '.',
    bs: int = 64,
    val_bs: int = None,
    shuffle_train: bool = True, # more recent fastai releases name this argument `shuffle`; check your installed version
    device: torch.device = None
) -> DataLoaders
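The positional defaults noted in the signature comments can be illustrated with a small stand-alone sketch. The helper resolve_columns is hypothetical (it is not part of fastai), and the column order shown is an assumed layout for a MovieLens-style ratings table merged with movie titles.

```python
def resolve_columns(columns, user_name=None, item_name=None, rating_name=None):
    """Hypothetical helper: fall back to the first three columns by position."""
    user = columns[0] if user_name is None else user_name
    item = columns[1] if item_name is None else item_name
    rating = columns[2] if rating_name is None else rating_name
    return user, item, rating

# Assumed column order of the merged ratings DataFrame
columns = ["user", "movie", "rating", "timestamp", "title"]

default_cols = resolve_columns(columns)
title_cols = resolve_columns(columns, item_name="title")

print(default_cols)  # ('user', 'movie', 'rating')
print(title_cols)    # ('user', 'title', 'rating')
```

With this layout, omitting item_name would select the numeric movie column, which is why the usage example passes item_name='title' explicitly.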

Import

from fastai.collab import CollabDataLoaders

I/O Contract

Inputs

Name | Type | Required | Description
ratings | pd.DataFrame | Yes | Merged DataFrame containing at minimum user, item, and rating columns
item_name | str | No | Column name to use as the item identifier; set to 'title' to display movie titles instead of numeric IDs
bs | int | No | Mini-batch size for the training DataLoader; defaults to 64
valid_pct | float | No | Fraction of data to reserve for validation; defaults to 0.2

Outputs

Name | Type | Description
dls | DataLoaders | A fastai DataLoaders containing training and validation DataLoaders
dls.classes | dict | Dictionary mapping column names to lists of unique categorical values; e.g., dls.classes['user'] and dls.classes['title']
Batch x | Tensor (bs, 2) | Each batch independent variable is a LongTensor of (user_idx, item_idx) pairs
Batch y | Tensor (bs,) | Each batch dependent variable is a FloatTensor of ratings
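To make the batch layout in the table concrete, here is a minimal plain-Python sketch that groups encoded samples into (x, y) mini-batches. It uses nested lists in place of tensors, the batches generator is a hypothetical helper, and the samples are made-up; the real DataLoaders also shuffle and move data to a device.

```python
def batches(samples, bs):
    """Yield (x, y): x is a list of [user_idx, item_idx] pairs, y a list of ratings."""
    for start in range(0, len(samples), bs):
        chunk = samples[start:start + bs]
        x = [[u, i] for u, i, _ in chunk]
        y = [float(r) for _, _, r in chunk]
        yield x, y

# Made-up encoded samples: (user_idx, item_idx, rating)
samples = [(1, 2, 3), (3, 1, 5), (2, 3, 1), (1, 1, 4), (2, 2, 2)]

first_x, first_y = next(batches(samples, bs=4))
print(len(first_x), len(first_x[0]))  # 4 2
print(first_y)                        # [3.0, 5.0, 1.0, 4.0]
```

Column 0 of x holds user indices and column 1 holds item indices, so a model can index each column into its own embedding table.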

Usage Examples

Basic Usage

from fastai.collab import *
from fastai.tabular.all import *

# Assumes 'ratings' DataFrame already loaded and merged with movie titles
# (see Fastai_Fastbook_Collab_Untar_Data)

# Create DataLoaders using movie titles as item names, batch size 64
dls = CollabDataLoaders.from_df(ratings, item_name='title', bs=64)

# Display a sample batch
dls.show_batch()
# Output (example):
#    user                               title  rating
# 0   207  Four Weddings and a Funeral (1994)       3
# 1   565      Remains of the Day, The (1993)       5
# 2   506                         Kids (1995)       1

# Inspect the number of unique users and items
n_users  = len(dls.classes['user'])
n_movies = len(dls.classes['title'])
print(f'Users: {n_users}, Movies: {n_movies}')
# Output: Users: 944, Movies: 1635

# Inspect a single batch shape
x, y = dls.one_batch()
print(x.shape, y.shape)
# Output: torch.Size([64, 2]) torch.Size([64])
