Implementation:Recommenders team Recommenders Python Chrono Split

From Leeroopedia


Knowledge Sources
Domains Recommender Systems, Data Splitting, Temporal Evaluation
Last Updated 2026-02-10 00:00 GMT

Overview

Concrete tool for chronological (time-based) data splitting provided by the recommenders library.

Description

python_chrono_split is a pure-Python (Pandas-based) function that splits a user-item interaction DataFrame in chronological order. For each user (or item, depending on filter_by), it sorts interactions by the timestamp column and assigns the earliest fraction (controlled by ratio) to training and the remaining fraction to testing. Users or items with fewer than min_rating interactions are excluded from the output. Internally, the function delegates to _do_stratification with is_random=False, ensuring deterministic temporal ordering rather than random shuffling.
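The per-user logic described above can be illustrated with a dependency-free sketch. This is not the library's implementation: the helper name chrono_split_sketch, the tuple layout, and the use of round() for the cutoff are assumptions made for illustration, and the real function operates on a pandas DataFrame.

```python
from collections import defaultdict


def chrono_split_sketch(rows, ratio=0.75, min_rating=1):
    """Illustrative two-way chronological split, grouped by user.

    rows: iterable of (user, item, timestamp) tuples (hypothetical layout).
    The rounding rule for the cutoff is an assumption, not taken from the library.
    """
    by_user = defaultdict(list)
    for row in rows:
        by_user[row[0]].append(row)

    train, test = [], []
    for events in by_user.values():
        if len(events) < min_rating:
            continue  # drop users with too few interactions
        events.sort(key=lambda r: r[2])  # oldest interaction first
        cutoff = round(ratio * len(events))
        train.extend(events[:cutoff])  # earliest fraction goes to training
        test.extend(events[cutoff:])  # the remainder goes to testing
    return train, test


rows = [
    ("u1", "a", 10), ("u1", "b", 30), ("u1", "c", 20), ("u1", "d", 40),
    ("u2", "a", 5),
]
train_rows, test_rows = chrono_split_sketch(rows, ratio=0.75, min_rating=2)
```

Here u1 keeps its three oldest interactions for training and its newest one for testing, while u2 is dropped because it has fewer than min_rating interactions.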

Usage

Import python_chrono_split when you need to split interaction data into training and test sets while preserving each user's (or item's) temporal order. It is a common choice for offline evaluation of recommender models in the recommenders library, for example in NCF and SAR workflows. Prefer it over python_random_split when timestamps are available and temporal leakage must be avoided.
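One way to confirm that a split avoids per-user temporal leakage is to compare each user's newest training timestamp with their oldest test timestamp. The helper below is a hypothetical sketch (has_temporal_leak is not part of the recommenders library) and assumes the default column names "userID" and "timestamp".

```python
import pandas as pd


def has_temporal_leak(train, test, col_user="userID", col_timestamp="timestamp"):
    """Return True if any user has a training row newer than their oldest test row."""
    last_train = train.groupby(col_user)[col_timestamp].max()
    first_test = test.groupby(col_user)[col_timestamp].min()
    # Keep only users present in both splits; the others cannot leak by this definition.
    both = pd.concat(
        [last_train, first_test], axis=1, keys=["last_train", "first_test"]
    ).dropna()
    return bool((both["last_train"] > both["first_test"]).any())


train_df = pd.DataFrame(
    {"userID": [1, 1, 2], "itemID": [10, 11, 10], "timestamp": [100, 200, 50]}
)
test_df = pd.DataFrame(
    {"userID": [1, 2], "itemID": [12, 11], "timestamp": [300, 60]}
)
# A deliberately broken test set: user 1's test row predates a training row.
leaky_test_df = test_df.assign(timestamp=[150, 60])

clean_result = has_temporal_leak(train_df, test_df)
leaky_result = has_temporal_leak(train_df, leaky_test_df)
```

This check covers ordering within each user only. A per-user chronological split can still place one user's training rows later in time than another user's test rows.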

Code Reference

Source Location

  • Repository: recommenders
  • File: recommenders/datasets/python_splitters.py
  • Lines: 116-158

Signature

def python_chrono_split(
    data,
    ratio=0.75,
    min_rating=1,
    filter_by="user",
    col_user=DEFAULT_USER_COL,
    col_item=DEFAULT_ITEM_COL,
    col_timestamp=DEFAULT_TIMESTAMP_COL,
) -> list:

Import

from recommenders.datasets.python_splitters import python_chrono_split

I/O Contract

Inputs

Name | Type | Required | Description
--- | --- | --- | ---
data | pandas.DataFrame | Yes | DataFrame containing user-item interactions with at least user, item, and timestamp columns
ratio | float or list | No | Split ratio; a single float (e.g., 0.75) produces a two-way split; a list (e.g., [0.6, 0.2, 0.2]) produces a multi-way split. Defaults to 0.75
min_rating | int | No | Minimum number of interactions required for a user or item to be included in the output. Defaults to 1
filter_by | str | No | Either "user" or "item", specifying which entity the min_rating filter and stratification are applied to. Defaults to "user"
col_user | str | No | Name of the user ID column. Defaults to DEFAULT_USER_COL ("userID")
col_item | str | No | Name of the item ID column. Defaults to DEFAULT_ITEM_COL ("itemID")
col_timestamp | str | No | Name of the timestamp column. Defaults to DEFAULT_TIMESTAMP_COL ("timestamp")

Outputs

Name | Type | Description
--- | --- | ---
splits | list[pandas.DataFrame] | A list of DataFrames, one per split segment. For a single float ratio, returns [train_df, test_df]. For a list ratio of length k, returns k DataFrames ordered so that, within each user (or item, depending on filter_by), rows in earlier splits precede rows in later splits in time

Usage Examples

Basic Usage

import pandas as pd
from recommenders.datasets.python_splitters import python_chrono_split

# Load interaction data (must have userID, itemID, timestamp columns)
data = pd.read_csv("ratings.csv")

# Split 75% train / 25% test by chronological order per user
train, test = python_chrono_split(data, ratio=0.75, filter_by="user")

print(f"Train size: {len(train)}, Test size: {len(test)}")

Multi-Way Split

# Three-way split: 60% train, 20% validation, 20% test
train, val, test = python_chrono_split(data, ratio=[0.6, 0.2, 0.2])
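To see how a list ratio might translate into per-user segment sizes, the sketch below computes cumulative cutoffs for a user with a given number of interactions. The helper split_sizes is hypothetical, and the rule (cumulative ratio times count, rounded with Python's round()) is an assumption for illustration; the library's exact boundary handling may differ. The ratios are assumed to sum to 1.

```python
from itertools import accumulate


def split_sizes(count, ratios):
    """Sizes of consecutive chronological segments for one user with `count` rows."""
    # Cumulative thresholds, e.g. [0.6, 0.2, 0.2] -> 0.6, 0.8, 1.0
    cutoffs = [round(t * count) for t in accumulate(ratios)]
    # Each segment spans from the previous cutoff to the current one.
    return [hi - lo for lo, hi in zip([0] + cutoffs[:-1], cutoffs)]


sizes_for_10 = split_sizes(10, [0.6, 0.2, 0.2])  # [6, 2, 2]
sizes_for_3 = split_sizes(3, [0.6, 0.2, 0.2])    # [2, 0, 1]
```

Under this rule a user with only three interactions contributes nothing to the validation segment, which is one reason to set min_rating high enough for every split to receive data.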

With MovieLens Data

from recommenders.datasets import movielens
from recommenders.datasets.python_splitters import python_chrono_split

data = movielens.load_pandas_df(size="100k")
train, test = python_chrono_split(data, ratio=0.75)

