Implementation:Recommenders team Recommenders Python Chrono Split
| Knowledge Sources | |
|---|---|
| Domains | Recommender Systems, Data Splitting, Temporal Evaluation |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Concrete tool for chronological (time-based) data splitting provided by the recommenders library.
Description
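python_chrono_split is a pure-Python (pandas-based) function that splits a user-item interaction DataFrame in chronological order. For each user (or item, depending on filter_by), it sorts the interactions by the timestamp column and assigns the earliest fraction (controlled by ratio) to training and the remaining interactions to testing. With a list ratio, each consecutive segment of that history becomes its own split. Users or items with fewer than min_rating interactions are excluded from the output. Internally, the function delegates to _do_stratification with is_random=False, so interactions are ordered by timestamp rather than randomly shuffled. Because the cut-off is computed separately for each user (or item), there is no single global cut-off time. The split prevents leakage within each user's own history, but one user's test interactions can be older than another user's training interactions.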
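The per-group procedure can be sketched in plain pandas. The function below is a hypothetical illustration, not the library's code: `chrono_split_sketch` and its `col_group` parameter are invented for this example. It assumes each cut-off is `round(cumulative fraction * group size)` and omits the library's input validation.

```python
from itertools import accumulate

import pandas as pd


def chrono_split_sketch(df, ratio=0.75, min_rating=1,
                        col_group="userID", col_timestamp="timestamp"):
    """Hypothetical per-group chronological split (illustration only)."""
    # A float means [ratio, 1 - ratio]; a list is rescaled to sum to 1.
    if isinstance(ratio, float):
        fractions = [ratio, 1 - ratio]
    else:
        total = sum(ratio)
        fractions = [r / total for r in ratio]

    # Drop groups (users or items) with fewer than min_rating interactions.
    sizes = df.groupby(col_group)[col_timestamp].transform("count")
    df = df[sizes >= min_rating]

    # Order each group's interactions oldest-first and number them 1..n.
    df = df.sort_values([col_group, col_timestamp])
    grouped = df.groupby(col_group)
    count = grouped[col_timestamp].transform("count")
    rank = grouped.cumcount() + 1

    splits, lower = [], 0.0
    for upper in accumulate(fractions):
        # Keep rows whose rank lies between the previous and current cut-off.
        in_segment = (rank > (lower * count).round()) & (rank <= (upper * count).round())
        splits.append(df[in_segment])
        lower = upper
    return splits
```

Ranking within each group, rather than comparing against one timestamp, is what makes the cut-off differ from user to user.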
Usage
Import python_chrono_split when you need to split interaction data into training and test sets while preserving temporal order. It is a common choice for offline evaluation of recommender models in the recommenders library, including NCF and SAR workflows. Prefer it over python_random_split when timestamps are available and the model must not be trained on interactions that happen after the ones it is tested on.
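Because the cut-off is computed per user, a chronological split guarantees ordering within each user's history but not across users. The hypothetical helper `ordering_report` below (not part of the library) checks both properties on already-split frames. The toy frames stand in for a 50/50 per-user split.

```python
import pandas as pd


def ordering_report(train, test, col_user="userID", col_timestamp="timestamp"):
    """Hypothetical helper: compare per-user and global temporal ordering."""
    # Latest training timestamp and earliest test timestamp for each user.
    last_train = train.groupby(col_user)[col_timestamp].max()
    first_test = test.groupby(col_user)[col_timestamp].min()
    # Only users present in both frames can be compared.
    common = last_train.index.intersection(first_test.index)
    per_user_ok = bool((last_train[common] <= first_test[common]).all())
    # Global ordering: every test row is at least as recent as every training row.
    global_ok = bool(train[col_timestamp].max() <= test[col_timestamp].min())
    return per_user_ok, global_ok


# User 1 is active early, user 2 much later; each history is split 50/50.
train = pd.DataFrame({"userID": [1, 1, 2, 2], "timestamp": [1, 2, 10, 11]})
test = pd.DataFrame({"userID": [1, 1, 2, 2], "timestamp": [3, 4, 12, 13]})
print(ordering_report(train, test))  # (True, False)
```

Each user's test rows follow that user's training rows, yet user 1's test interactions predate user 2's training interactions.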
Code Reference
Source Location
- Repository: recommenders
- File: recommenders/datasets/python_splitters.py
- Lines: 116-158
Signature
```python
def python_chrono_split(
    data,
    ratio=0.75,
    min_rating=1,
    filter_by="user",
    col_user=DEFAULT_USER_COL,
    col_item=DEFAULT_ITEM_COL,
    col_timestamp=DEFAULT_TIMESTAMP_COL,
) -> list:
```
Import
```python
from recommenders.datasets.python_splitters import python_chrono_split
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| data | pandas.DataFrame | Yes | DataFrame containing user-item interactions with at least user, item, and timestamp columns |
| ratio | float or list | No | Split ratio. A single float (e.g., 0.75) gives the training fraction of a two-way split; a list (e.g., [0.6, 0.2, 0.2]) produces a multi-way split, and list entries that do not sum to 1 are normalized. Defaults to 0.75 |
| min_rating | int | No | Minimum number of interactions a user or item must have to be included in the output; must be at least 1. Defaults to 1 |
| filter_by | str | No | Either "user" or "item": the entity whose interactions are grouped and ordered by time, and to which the min_rating filter is applied. Defaults to "user" |
| col_user | str | No | Name of the user ID column. Defaults to DEFAULT_USER_COL ("userID") |
| col_item | str | No | Name of the item ID column. Defaults to DEFAULT_ITEM_COL ("itemID") |
| col_timestamp | str | No | Name of the timestamp column. Defaults to DEFAULT_TIMESTAMP_COL ("timestamp") |
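The accepted forms of `ratio` can be summarized with a small helper. `normalize_ratio` is hypothetical, and its range and positivity checks are assumptions about sensible validation rather than a copy of the library's code.

```python
def normalize_ratio(ratio):
    """Hypothetical helper mirroring the documented `ratio` semantics."""
    if isinstance(ratio, float):
        # A single float is the training fraction; the rest goes to test.
        if not 0 < ratio < 1:
            raise ValueError("a float ratio must lie strictly between 0 and 1")
        return [ratio, 1 - ratio]
    if isinstance(ratio, list):
        # Assumed check: every segment must have a positive share.
        if any(r <= 0 for r in ratio):
            raise ValueError("all list entries must be positive")
        total = sum(ratio)
        # Rescale so the segments sum to 1.
        return [r / total for r in ratio]
    raise TypeError("ratio must be a float or a list of numbers")


print(normalize_ratio(0.75))       # [0.75, 0.25]
print(normalize_ratio([3, 1, 1]))  # [0.6, 0.2, 0.2]
```

Under this reading, `[3, 1, 1]` and `[0.6, 0.2, 0.2]` describe the same three-way split.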
Outputs
| Name | Type | Description |
|---|---|---|
| splits | list[pandas.DataFrame] | A list of DataFrames, one per split segment. For a single float ratio, returns [train_df, test_df]. For a list ratio of length k, returns k DataFrames, ordered from the earliest to the latest segment of each user's (or item's) history |
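Each returned DataFrame holds, for every user, the interactions between two consecutive cumulative cut-offs. The arithmetic for a single user can be checked with the hypothetical helper below. It assumes the fractions already sum to 1 and that each cut-off is rounded to the nearest integer with Python's `round` (ties go to the even integer).

```python
from itertools import accumulate


def rows_per_split(n, fractions):
    """Hypothetical: how many of one user's n interactions land in each split."""
    # Cumulative cut-offs, e.g. [0.6, 0.2, 0.2] -> [0.6, 0.8, 1.0], scaled by n.
    cutoffs = [round(c * n) for c in accumulate(fractions)]
    # Differences between consecutive cut-offs give the segment sizes.
    return [hi - lo for lo, hi in zip([0] + cutoffs[:-1], cutoffs)]


print(rows_per_split(10, [0.6, 0.2, 0.2]))  # [6, 2, 2]
print(rows_per_split(5, [0.75, 0.25]))      # [4, 1]
```

Under this rounding assumption, a user with 5 interactions and ratio=0.75 contributes 4 rows to training and 1 to test, so small histories may not match the requested fractions exactly.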
Usage Examples
Basic Usage
```python
import pandas as pd
from recommenders.datasets.python_splitters import python_chrono_split

# Load interaction data (must have userID, itemID and timestamp columns)
data = pd.read_csv("ratings.csv")

# Split each user's history chronologically: earliest 75% to train, latest 25% to test
train, test = python_chrono_split(data, ratio=0.75, filter_by="user")
print(f"Train size: {len(train)}, Test size: {len(test)}")
```
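A quick sanity check after splitting is that the segments are disjoint and, with the default min_rating=1, together cover the input. The helper below is hypothetical; it assumes the input has a unique row index and that the returned frames keep the input's row labels. The toy frames stand in for real splitter output.

```python
import pandas as pd


def check_partition(data, splits):
    """Hypothetical sanity check: splits are disjoint and together cover `data`."""
    labels = [label for part in splits for label in part.index]
    # Disjoint: no row label appears in more than one split.
    disjoint = len(labels) == len(set(labels))
    # Covering: the union of the splits is exactly the input's row labels.
    covering = set(labels) == set(data.index)
    return disjoint and covering


data = pd.DataFrame({"userID": [1, 1, 2], "timestamp": [1, 2, 3]})
good = [data.iloc[[0, 2]], data.iloc[[1]]]
bad = [data.iloc[[0, 1]], data.iloc[[1, 2]]]
print(check_partition(data, good), check_partition(data, bad))  # True False
```

With min_rating greater than 1, compare against the filtered input instead, since dropped users or items will not appear in any split.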
Multi-Way Split
```python
# Three-way split: 60% train, 20% validation, 20% test
train, val, test = python_chrono_split(data, ratio=[0.6, 0.2, 0.2])
```
With MovieLens Data
```python
from recommenders.datasets import movielens
from recommenders.datasets.python_splitters import python_chrono_split

# Load the MovieLens 100k ratings as a pandas DataFrame
data = movielens.load_pandas_df(size="100k")
train, test = python_chrono_split(data, ratio=0.75)
```
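To see where each user's training period ends in calendar time, the timestamps can be converted to datetimes. The helper `cutoff_dates` is hypothetical and assumes timestamps are Unix epoch seconds, as documented for the MovieLens rating files. In practice, pass the train frame returned by python_chrono_split instead of the toy frame.

```python
import pandas as pd


def cutoff_dates(train, col_user="userID", col_timestamp="timestamp"):
    """Hypothetical helper: each user's last training interaction as a UTC datetime."""
    # Assumes timestamps are Unix epoch seconds.
    last = train.groupby(col_user)[col_timestamp].max()
    return pd.to_datetime(last, unit="s", utc=True)


# Toy training frame standing in for the MovieLens split output.
train = pd.DataFrame({"userID": [1, 1, 2], "timestamp": [0, 86400, 172800]})
print(cutoff_dates(train))
```

The spread of these dates across users reflects the fact that the split uses a separate cut-off for each user.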