Implementation:Recommenders team Recommenders Python Chrono Split

Knowledge Sources	Recommenders
Domains	Recommender Systems, Data Splitting, Temporal Evaluation
Last Updated	2026-02-10 00:00 GMT

Overview

Concrete tool for chronological (time-based) data splitting provided by the recommenders library.

Description

python_chrono_split is a pure-Python (Pandas-based) function that splits a user-item interaction DataFrame in chronological order. For each user (or item, depending on filter_by), it sorts interactions by the timestamp column and assigns the earliest fraction (controlled by ratio) to training and the remaining fraction to testing. Users or items with fewer than min_rating interactions are excluded from the output. Internally, the function delegates to _do_stratification with is_random=False, ensuring deterministic temporal ordering rather than random shuffling.
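
The per-user logic described above can be sketched in plain pandas. This is a minimal illustration, not the library's implementation: the helper name `chrono_split_sketch`, the toy data, and the rounding rule at the cut-off are assumptions, and the `min_rating` filter and multi-way ratios are left out.

```python
import pandas as pd


def chrono_split_sketch(df, ratio=0.75, col_user="userID", col_timestamp="timestamp"):
    """Hypothetical helper: two-way chronological split per user."""
    # Order each user's interactions from oldest to newest.
    ordered = df.sort_values([col_user, col_timestamp])
    grouped = ordered.groupby(col_user)
    # 1-based position of each row within its user's history.
    rank = grouped.cumcount() + 1
    # Total number of interactions for the row's user.
    size = grouped[col_user].transform("count")
    # Assumption: the cut-off position is ratio * size, rounded to the nearest integer.
    is_train = rank <= (size * ratio).round()
    return ordered[is_train], ordered[~is_train]


# Toy data: two users with four interactions each, deliberately out of order.
demo = pd.DataFrame(
    {
        "userID": [1, 1, 1, 1, 2, 2, 2, 2],
        "itemID": [10, 11, 12, 13, 10, 12, 14, 15],
        "timestamp": [40, 10, 30, 20, 5, 8, 6, 7],
    }
)
demo_train, demo_test = chrono_split_sketch(demo)
```

With four interactions per user and a ratio of 0.75, each user's three oldest rows land in `demo_train` and the newest row lands in `demo_test`.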

Usage

Import python_chrono_split when you need to split interaction data into training and test sets while preserving temporal order. It is commonly used for offline evaluation of recommender models in the recommenders library, for example in NCF and SAR workflows. Prefer it over python_random_split when timestamps are available and temporal leakage must be avoided.
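
One way to confirm that a split preserves temporal order is to compare, for each user, the newest training timestamp with the oldest test timestamp. The sketch below assumes the default column names; `has_temporal_leak` is a hypothetical helper, not part of the recommenders library, and the toy frames stand in for real splits.

```python
import pandas as pd


def has_temporal_leak(train, test, col_user="userID", col_timestamp="timestamp"):
    """Hypothetical helper: True if any user has a training row newer than a test row."""
    last_train = train.groupby(col_user)[col_timestamp].max()
    first_test = test.groupby(col_user)[col_timestamp].min()
    # Align on user ID; users that appear in only one split are ignored.
    both = pd.concat(
        [last_train, first_test], axis=1, keys=["last_train", "first_test"]
    ).dropna()
    return bool((both["last_train"] > both["first_test"]).any())


# Toy splits (illustrative data only).
clean_train = pd.DataFrame({"userID": [1, 1, 2], "timestamp": [1, 2, 5]})
clean_test = pd.DataFrame({"userID": [1, 2], "timestamp": [3, 6]})
leaky_train = pd.DataFrame({"userID": [1, 1, 2], "timestamp": [1, 9, 5]})

clean_result = has_temporal_leak(clean_train, clean_test)
leaky_result = has_temporal_leak(leaky_train, clean_test)
```

The check is per user. Because the split is ordered within each user rather than globally, one user's training rows may still be newer than another user's test rows.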

Code Reference

Source Location

Repository: recommenders
File: recommenders/datasets/python_splitters.py
Lines: 116-158

Signature

def python_chrono_split(
    data,
    ratio=0.75,
    min_rating=1,
    filter_by="user",
    col_user=DEFAULT_USER_COL,
    col_item=DEFAULT_ITEM_COL,
    col_timestamp=DEFAULT_TIMESTAMP_COL,
) -> list:

Import

from recommenders.datasets.python_splitters import python_chrono_split

I/O Contract

Inputs

Name	Type	Required	Description
data	pandas.DataFrame	Yes	DataFrame containing user-item interactions with at least user, item, and timestamp columns
ratio	float or list	No	Split ratio; a single float (e.g., 0.75) produces a two-way split; a list (e.g., [0.6, 0.2, 0.2]) produces a multi-way split. Defaults to 0.75
min_rating	int	No	Minimum number of interactions required for a user or item to be included in the output. Defaults to 1
filter_by	str	No	Either `"user"` or `"item"`, specifying which entity the `min_rating` filter and stratification are applied to. Defaults to `"user"`
col_user	str	No	Name of the user ID column. Defaults to `DEFAULT_USER_COL` (`"userID"`)
col_item	str	No	Name of the item ID column. Defaults to `DEFAULT_ITEM_COL` (`"itemID"`)
col_timestamp	str	No	Name of the timestamp column. Defaults to `DEFAULT_TIMESTAMP_COL` (`"timestamp"`)
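
The effect of `min_rating` with `filter_by="user"` can be approximated with a pandas group count. This is a sketch of the idea only; `filter_min_interactions` is a hypothetical name, and the library's internal filtering may differ in detail.

```python
import pandas as pd


def filter_min_interactions(df, min_rating=1, col="userID"):
    """Hypothetical helper: keep rows whose user (or item) has at least min_rating rows."""
    # Per-row count of how many interactions share the same value in `col`.
    counts = df.groupby(col)[col].transform("count")
    return df[counts >= min_rating]


# Toy data: user 1 has three interactions, user 2 has one.
ratings = pd.DataFrame(
    {
        "userID": [1, 1, 1, 2],
        "itemID": [10, 11, 12, 10],
        "timestamp": [1, 2, 3, 4],
    }
)
kept_users = filter_min_interactions(ratings, min_rating=2)
kept_items = filter_min_interactions(ratings, min_rating=2, col="itemID")
```

Passing `col="itemID"` gives the item-level counterpart, corresponding to `filter_by="item"`.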

Outputs

Name	Type	Description
splits	list[pandas.DataFrame]	A list of DataFrames, one per split segment. For a single float ratio, returns `[train_df, test_df]`. For a list ratio of length k, returns k DataFrames ordered from the earliest segment to the latest

Usage Examples

Basic Usage

import pandas as pd
from recommenders.datasets.python_splitters import python_chrono_split

# Load interaction data (the defaults expect userID, itemID, and timestamp columns;
# pass col_user, col_item, and col_timestamp if your column names differ)
data = pd.read_csv("ratings.csv")

# Split 75% train / 25% test by chronological order per user
train, test = python_chrono_split(data, ratio=0.75, filter_by="user")

print(f"Train size: {len(train)}, Test size: {len(test)}")

Multi-Way Split

# Three-way split: 60% train, 20% validation, 20% test
train, val, test = python_chrono_split(data, ratio=[0.6, 0.2, 0.2])

With MovieLens Data

from recommenders.datasets import movielens
from recommenders.datasets.python_splitters import python_chrono_split

data = movielens.load_pandas_df(size="100k")
train, test = python_chrono_split(data, ratio=0.75)

Related Pages

Implements Principle

Principle:Recommenders_team_Recommenders_Chronological_Data_Splitting

Requires Environment

Environment:Recommenders_team_Recommenders_Python_Core_Dependencies
