Implementation:Recommenders team Recommenders Load Pandas Df

From Leeroopedia


Knowledge Sources
Domains: Recommender Systems, Data Loading, Benchmark Datasets
Last Updated: 2026-02-10 00:00 GMT

Overview

Concrete tool, provided by the recommenders library, for loading MovieLens benchmark datasets into pandas DataFrames.

Description

The load_pandas_df function downloads, extracts, and parses a MovieLens dataset into a pandas DataFrame. It accepts five size options (100K, 1M, 10M, 20M, and a mock dataset for testing) and handles the format differences between them transparently. It can optionally join movie metadata columns (title, genres, release year) onto the ratings data. When local_cache_path is set, the downloaded archive is kept there so that later calls can skip the download; otherwise a temporary directory is used and cleaned up afterwards.
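
The caching behaviour described above can be pictured with a small standard-library sketch. The names cache_dir and needs_download are hypothetical helpers written for illustration; they are not part of the recommenders API, and the library's actual implementation may differ.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def cache_dir(path=None):
    """Yield a directory for downloaded archives.

    Hypothetical helper for illustration, not the recommenders API.
    """
    if path is None:
        # No cache requested: use a temporary directory removed on exit.
        tmp = tempfile.TemporaryDirectory()
        try:
            yield tmp.name
        finally:
            tmp.cleanup()
    else:
        # Persistent cache: create the directory if needed and keep it.
        path = os.path.realpath(path)
        os.makedirs(path, exist_ok=True)
        yield path


def needs_download(directory, filename):
    """Return True when the archive is not already present in the cache."""
    return not os.path.exists(os.path.join(directory, filename))
```

With a persistent path, a second run finds the archive in place and needs_download returns False; with no path, every run starts from an empty temporary directory.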

Usage

Import and call this function at the start of a recommender system experiment pipeline when you need MovieLens data in a pandas DataFrame. Use the size parameter to select the dataset scale, and pass title_col, genres_col, or year_col to include movie metadata in the output.
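
To give a feel for what the metadata columns contain, here is a minimal sketch of deriving a release year and a genre list from raw strings. It assumes titles end with a four-digit year in parentheses (for example "Toy Story (1995)") and that genres arrive as a pipe-separated string. Both functions are hypothetical illustrations, not the library's own code.

```python
import re

# Assumed title layout: "<name> (<4-digit year>)", e.g. "Toy Story (1995)".
_YEAR_AT_END = re.compile(r"\((\d{4})\)\s*$")


def parse_year(title):
    """Return the trailing 4-digit year as a string, or None if absent."""
    match = _YEAR_AT_END.search(title)
    return match.group(1) if match else None


def split_genres(genres):
    """Split a pipe-separated genre string into a list of genre names."""
    return [g for g in genres.split("|") if g]
```

Note that load_pandas_df returns genres as the raw pipe-separated string; a split like the one above would be a downstream step if a list per movie is needed.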

Code Reference

Source Location

  • Repository: recommenders
  • File: recommenders/datasets/movielens.py
  • Lines: L152-L251

Signature

def load_pandas_df(
    size="100k",
    header=None,
    local_cache_path=None,
    title_col=None,
    genres_col=None,
    year_col=None,
) -> pd.DataFrame

Import

from recommenders.datasets.movielens import load_pandas_df

I/O Contract

Inputs

Name | Type | Required | Description
size | str | No (default: "100k") | Size of the MovieLens dataset to load. One of "100k", "1m", "10m", "20m", "mock100".
header | list or tuple or None | No (default: None) | Column names for the rating data. If None, uses DEFAULT_HEADER (userID, itemID, rating, timestamp). Truncated to 4 elements if longer.
local_cache_path | str or None | No (default: None) | Directory or zip file path for caching the downloaded archive. If None, uses a temporary directory that is cleaned up after use.
title_col | str or None | No (default: None) | Column name for the movie title. If None, title is not loaded.
genres_col | str or None | No (default: None) | Column name for movie genres (pipe-separated string). If None, genres are not loaded.
year_col | str or None | No (default: None) | Column name for movie release year. If None, year is not loaded. Ignored for mock data.
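
The header rule from the Inputs table (default when None, truncation beyond four names) can be sketched as follows. normalize_header is a hypothetical helper, and emitting a warning on truncation is an assumption of this sketch rather than documented behaviour.

```python
import warnings

# Default column names, as listed in the Inputs table.
DEFAULT_HEADER = ("userID", "itemID", "rating", "timestamp")


def normalize_header(header=None):
    """Return the column names to apply to the rating data.

    Hypothetical helper illustrating the documented rule.
    """
    if header is None:
        return list(DEFAULT_HEADER)
    if len(header) > 4:
        # Only four rating columns exist, so extra names are dropped.
        # The warning is an assumption of this sketch.
        warnings.warn("header has more than 4 names; keeping the first 4")
        return list(header[:4])
    return list(header)
```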

Outputs

Name | Type | Description
return | pd.DataFrame | DataFrame containing user-item-rating-timestamp columns, plus any requested metadata columns (title, genres, year). Rating column is cast to float.
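
The output shape can be illustrated without downloading anything. The sketch below parses two tiny in-memory samples into the four default columns and casts the rating to float. The separators (tab for 100K, "::" for 1M) are assumptions based on the public MovieLens file layouts, and parse_ratings is a hypothetical function, not the library's parser.

```python
import io

import pandas as pd

HEADER = ["userID", "itemID", "rating", "timestamp"]

# Tiny illustrative samples; separators are assumed from the public
# MovieLens layouts (tab for 100K, "::" for 1M).
SAMPLE_100K = "196\t242\t3\t881250949\n186\t302\t3\t891717742\n"
SAMPLE_1M = "1::1193::5::978300760\n1::661::3::978302109\n"


def parse_ratings(text, sep):
    """Parse delimited rating rows into the four default columns."""
    df = pd.read_csv(
        io.StringIO(text),
        sep=sep,
        engine="python",  # needed for multi-character separators like "::"
        names=HEADER,
        header=None,
    )
    # Expose ratings as floats regardless of how the file stores them.
    df["rating"] = df["rating"].astype(float)
    return df


df_100k = parse_ratings(SAMPLE_100K, "\t")
df_1m = parse_ratings(SAMPLE_1M, "::")
```

Both frames end up with identical column names and a float rating column, which is the uniformity the function provides across dataset sizes.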

Usage Examples

Basic Usage

from recommenders.datasets.movielens import load_pandas_df

# Load MovieLens 100K with default columns (userID, itemID, rating, timestamp)
df = load_pandas_df("100k")

# Load MovieLens 1M with custom column names
df = load_pandas_df("1m", header=["UserId", "ItemId", "Rating", "Timestamp"])

# Load with movie metadata
df = load_pandas_df(
    "1m",
    header=["UserId", "ItemId", "Rating", "Timestamp"],
    title_col="Title",
    genres_col="Genres",
    year_col="Year",
)

# Load mock data for testing
df = load_pandas_df("mock100")

Dependencies

  • pandas - DataFrame construction and CSV parsing
  • pandera - Schema validation for mock data
  • zipfile - Archive extraction
  • os / tempfile - File path management and temporary directories

Related Pages

Implements Principle

Requires Environment
