Implementation:Recommenders team Recommenders Load Pandas Df

Knowledge Sources	Recommenders
Domains	Recommender Systems, Data Loading, Benchmark Datasets
Last Updated	2026-02-10 00:00 GMT

Overview

Concrete tool, provided by the recommenders library, for loading MovieLens benchmark datasets into pandas DataFrames.

Description

The load_pandas_df function downloads, caches, extracts, and parses a MovieLens dataset into a pandas DataFrame. It accepts five size values: four real datasets (100K, 1M, 10M, 20M) and a mock dataset ("mock100") intended for unit tests. Format differences between the real datasets are handled transparently, so the caller gets the same column layout regardless of size. The function can optionally join movie metadata columns (title, genres, release year) onto the ratings data. When local_cache_path is set, the downloaded archive is kept there and reused on later calls; otherwise a temporary directory is used and removed afterwards.
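To make the format differences concrete, the sketch below parses the same rating from the layouts used by the public MovieLens archives: tab-separated for 100K, "::"-separated for 1M and 10M, and comma-separated with a header row for 20M. RatingFormat, RATING_FORMATS, and parse_ratings are hypothetical names chosen for illustration; they are not the library's internals, and the real function may organize this differently.

```python
from typing import NamedTuple


class RatingFormat(NamedTuple):
    """Hypothetical description of one ratings-file layout."""
    separator: str    # field delimiter in the ratings file
    has_header: bool  # True when the first line holds column names


# Layouts of the public MovieLens ratings files (illustrative table).
RATING_FORMATS = {
    "100k": RatingFormat("\t", False),  # u.data
    "1m": RatingFormat("::", False),    # ratings.dat
    "10m": RatingFormat("::", False),   # ratings.dat
    "20m": RatingFormat(",", True),     # ratings.csv
}


def parse_ratings(lines, size):
    """Parse raw rating lines into (user, item, rating, timestamp) tuples."""
    fmt = RATING_FORMATS[size.lower()]
    data_lines = lines[1:] if fmt.has_header else lines
    rows = []
    for line in data_lines:
        user, item, rating, timestamp = line.strip().split(fmt.separator)
        rows.append((int(user), int(item), float(rating), int(timestamp)))
    return rows


# The same rating expressed in three on-disk layouts.
parsed = [
    parse_ratings(["196\t242\t3\t881250949"], "100k"),
    parse_ratings(["196::242::3::881250949"], "1m"),
    parse_ratings(["userId,movieId,rating,timestamp", "196,242,3.0,881250949"], "20m"),
]
```

All three calls yield the same tuple, which is the sense in which a loader can hide the layout from the caller.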

Usage

Import and call this function at the start of a recommender system experiment pipeline when you need MovieLens data in a pandas DataFrame. Use the size parameter to select the dataset scale, and pass title_col, genres_col, or year_col to include movie metadata in the output.

Code Reference

Source Location

Repository: recommenders
File: recommenders/datasets/movielens.py
Lines: L152-L251

Signature

def load_pandas_df(
    size="100k",
    header=None,
    local_cache_path=None,
    title_col=None,
    genres_col=None,
    year_col=None,
) -> pd.DataFrame

Import

from recommenders.datasets.movielens import load_pandas_df

I/O Contract

Inputs

Name	Type	Required	Description
size	str	No (default: "100k")	Size of the MovieLens dataset to load. One of "100k", "1m", "10m", "20m", "mock100".
header	list or tuple or None	No (default: None)	Column names for the rating data. If None, uses DEFAULT_HEADER (userID, itemID, rating, timestamp). Truncated to 4 elements if longer.
local_cache_path	str or None	No (default: None)	Directory or zip file path for caching the downloaded archive. If None, uses a temporary directory that is cleaned up after use.
title_col	str or None	No (default: None)	Column name for the movie title. If None, title is not loaded.
genres_col	str or None	No (default: None)	Column name for movie genres (pipe-separated string). If None, genres are not loaded.
year_col	str or None	No (default: None)	Column name for movie release year. If None, year is not loaded. Ignored for mock data.
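The header rule in the table can be sketched as a small helper. normalize_header is a hypothetical name, not part of the library; the warning on over-long headers and the pass-through of shorter headers are assumptions made for this sketch, since the table only states the default and the truncation to 4 elements.

```python
import warnings

# Default column names as listed in the Inputs table.
DEFAULT_HEADER = ("userID", "itemID", "rating", "timestamp")


def normalize_header(header=None):
    """Return the rating column names to use (hypothetical helper)."""
    if header is None:
        return list(DEFAULT_HEADER)
    if len(header) > 4:
        # Assumption: warn before dropping the extra names.
        warnings.warn("Header has more than 4 names; extra names are ignored.")
        return list(header[:4])
    # Assumption: shorter headers are passed through unchanged.
    return list(header)
```

For example, normalize_header(["UserId", "ItemId", "Rating", "Timestamp", "Extra"]) keeps only the first four names.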

Outputs

Name	Type	Description
return	pd.DataFrame	DataFrame containing user-item-rating-timestamp columns, plus any requested metadata columns (title, genres, year). Rating column is cast to float.
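The shape of the returned DataFrame can be illustrated on toy rows without downloading anything. The sketch below casts the rating column to float and joins metadata on the item column with a plain pandas merge; the rows and item IDs are made up, and using an inner merge is an assumption about how such a join could be done, not a statement of the library's implementation.

```python
import pandas as pd

# Toy ratings with the default column names (values are made up).
ratings = pd.DataFrame(
    {
        "userID": [1, 1, 2],
        "itemID": [10, 20, 10],
        "rating": [5, 3, 4],
        "timestamp": [881250949, 881250950, 881250951],
    }
)

# Toy movie metadata keyed by the same item column.
movies = pd.DataFrame(
    {
        "itemID": [10, 20],
        "Title": ["Toy Story (1995)", "GoldenEye (1995)"],
        "Genres": ["Animation|Children's|Comedy", "Action|Adventure|Thriller"],
    }
)

# Cast ratings to float, then attach the requested metadata columns.
ratings["rating"] = ratings["rating"].astype(float)
df = ratings.merge(movies, on="itemID")
```

Each rating row keeps its four base columns and gains the Title and Genres of its item.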

Usage Examples

Basic Usage

from recommenders.datasets.movielens import load_pandas_df

# Load MovieLens 100K with default columns (userID, itemID, rating, timestamp)
df = load_pandas_df("100k")

# Load MovieLens 1M with custom column names
df = load_pandas_df("1m", header=["UserId", "ItemId", "Rating", "Timestamp"])

# Load with movie metadata
df = load_pandas_df(
    "1m",
    header=["UserId", "ItemId", "Rating", "Timestamp"],
    title_col="Title",
    genres_col="Genres",
    year_col="Year",
)

# Load mock data for testing
df = load_pandas_df("mock100")
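Because genres arrive as a single pipe-separated string per row, downstream code usually splits them before counting or encoding. The following is a minimal sketch on made-up strings; split_genres and count_genres are hypothetical helpers, not library functions.

```python
from collections import Counter


def split_genres(genres):
    """Split a pipe-separated genres string into a list of genre names."""
    return [g for g in genres.split("|") if g]


def count_genres(genre_strings):
    """Count how often each genre appears across rows."""
    counts = Counter()
    for genres in genre_strings:
        counts.update(split_genres(genres))
    return counts


# Made-up genre strings in the pipe-separated form described above.
genre_counts = count_genres(["Action|Adventure", "Action|Thriller", "Comedy"])
```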

Dependencies

pandas - DataFrame construction and CSV parsing
pandera - Schema validation for mock data
zipfile - Archive extraction
os / tempfile - File path management and temporary directories

Related Pages

Implements Principle

Principle:Recommenders_team_Recommenders_Data_Loading_MovieLens_Pandas

Requires Environment

Environment:Recommenders_team_Recommenders_Python_Core_Dependencies
