Implementation:Recommenders team Recommenders Python Stratified Split

Knowledge Sources	Recommenders
Domains	Recommender Systems, Data Splitting, Evaluation Methodology
Last Updated	2026-02-10 00:00 GMT

Overview

A function in the recommenders library for stratified train/test splitting of user-item interaction data.

Description

The python_stratified_split function splits a pandas DataFrame of user-item interactions into training and test sets. It preserves each user's (or item's) share of interactions across the splits. It delegates to an internal stratification routine that works in two steps:

- It drops entities with fewer than min_rating interactions.
- It groups the remaining rows by the chosen entity (user or item) and performs a seeded, randomized proportional split within each group.

The function supports both two-way splits (a single float ratio) and multi-way splits (a list of float ratios).
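The stratification steps described above can be sketched in plain pandas and NumPy. This is a hypothetical re-implementation: `stratified_split_sketch` is not part of recommenders. It assumes three things about the library:

- Rows are shuffled within each group by a seeded random key.
- Rows are ranked within their group.
- Rows are assigned to splits by comparing their rank against cumulative ratio thresholds times the group size, rounded.

It uses `numpy.random.default_rng`, so the same seed will not reproduce the library's exact splits.

```python
import numpy as np
import pandas as pd


def stratified_split_sketch(
    data: pd.DataFrame, ratios: list, min_rating: int = 1, by: str = "userID", seed: int = 42
) -> list:
    """Hypothetical per-group proportional random split (not the library's code).

    ratios: positive floats summing to 1, e.g. [0.75, 0.25].
    by: column to filter and stratify on ("userID" by assumption; use the item column for items).
    """
    df = data.copy()

    # 1. Drop entities with fewer than min_rating interactions.
    group_sizes = df.groupby(by)[by].transform("size")
    df = df[group_sizes >= min_rating].copy()

    # 2. Seeded random key, then sort by entity and key so each entity's rows
    #    are in a random but reproducible order.
    rng = np.random.default_rng(seed)
    df["_key"] = rng.random(len(df))
    df = df.sort_values([by, "_key"])

    # 3. Record each group's size and each row's 1-based rank within its group.
    grouped = df.groupby(by)
    counts = grouped[by].transform("size")
    ranks = grouped.cumcount() + 1
    df["_count"] = counts
    df["_rank"] = ranks

    # 4. A row goes to the split whose rounded cumulative threshold
    #    (cumulative ratio * group size) first covers its rank.
    splits = []
    prev = 0.0
    for cum in np.cumsum(ratios):
        lower = (prev * df["_count"]).round()
        upper = (cum * df["_count"]).round()
        in_split = (df["_rank"] > lower) & (df["_rank"] <= upper)
        splits.append(df[in_split].drop(columns=["_key", "_count", "_rank"]))
        prev = cum
    return splits
```

With four interactions per user and ratios [0.75, 0.25], each user contributes three rows to the first split and one to the second.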

Usage

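Call this function after loading your dataset and before model training. With a fixed seed it produces reproducible, stratified splits. In these splits, each user (or item) contributes roughly the same fraction of its interactions to every split. Entities with very few interactions may not appear in every split; for example, a user with a single interaction cannot be in both train and test. Raising min_rating reduces this effect.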
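Because the split is proportional within each group, it can be worth checking coverage after splitting. The two helpers below are hypothetical (not part of recommenders) and assume the default user column name `userID`. The first lists the entities that received no test interactions. The second reports the fraction of each entity's interactions that landed in the test split.

```python
import pandas as pd


def entities_missing_from_test(train: pd.DataFrame, test: pd.DataFrame, col: str = "userID") -> list:
    """IDs present in the training split but absent from the test split."""
    return sorted(set(train[col].unique()) - set(test[col].unique()))


def holdout_share_per_entity(train: pd.DataFrame, test: pd.DataFrame, col: str = "userID") -> pd.Series:
    """Fraction of each entity's interactions that landed in the test split."""
    n_train = train.groupby(col).size()
    n_test = test.groupby(col).size()
    # Entities missing from one split get a count of 0 there.
    total = n_train.add(n_test, fill_value=0)
    return n_test.reindex(total.index, fill_value=0) / total
```

For example, calling `entities_missing_from_test(train, test)` on the output of `python_stratified_split` lists users with no held-out interactions.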

Code Reference

Source Location

Repository: recommenders
File: recommenders/datasets/python_splitters.py
Lines: L161-L201

Signature

def python_stratified_split(
    data,
    ratio=0.75,
    min_rating=1,
    filter_by="user",
    col_user=DEFAULT_USER_COL,
    col_item=DEFAULT_ITEM_COL,
    seed=42,
) -> list[pd.DataFrame]

Import

from recommenders.datasets.python_splitters import python_stratified_split

I/O Contract

Inputs

Name	Type	Required	Description
data	pd.DataFrame	Yes	User-item interaction DataFrame to be split.
ratio	float or list of float	No (default: 0.75)	Split ratio. A single float produces a two-way split (train/test) and gives the fraction of each entity's interactions placed in the training set. A list of floats produces one split per element. Ratios in a list are normalized to sum to 1 if they do not already.
min_rating	int	No (default: 1)	Minimum number of interactions an entity (of the type selected by filter_by) must have to be kept. Entities below this threshold are dropped before splitting.
filter_by	str	No (default: "user")	Entity to stratify and filter by. Either "user" or "item".
col_user	str	No (default: DEFAULT_USER_COL, i.e. "userID")	Column name for user IDs.
col_item	str	No (default: DEFAULT_ITEM_COL, i.e. "itemID")	Column name for item IDs.
seed	int	No (default: 42)	Random seed for reproducible splits.
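The ratio contract in the table can be made concrete with a small helper. `normalize_split_ratio` is hypothetical and only mirrors the documented behaviour:

- A single float becomes a [train, test] pair.
- A list is rescaled to sum to 1.

The range check on a single float and the positivity check on list entries are assumptions about the validation the library performs.

```python
import math


def normalize_split_ratio(ratio):
    """Hypothetical helper: turn a `ratio` argument into a list of split fractions."""
    if isinstance(ratio, float):
        # Assumed validation: a single ratio is the training fraction in (0, 1).
        if not 0 < ratio < 1:
            raise ValueError("a single ratio must lie strictly between 0 and 1")
        return [ratio, 1 - ratio]
    if isinstance(ratio, list):
        # Assumed validation: every list entry must be positive.
        if any(r <= 0 for r in ratio):
            raise ValueError("all ratios must be positive")
        total = math.fsum(ratio)
        # Rescale so the fractions sum to 1 (a no-op if they already do).
        return [r / total for r in ratio]
    raise TypeError("ratio must be a float or a list of floats")
```

Under these assumptions, `normalize_split_ratio([3, 1])` gives `[0.75, 0.25]`, the same split as `ratio=0.75`.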

Outputs

Name	Type	Description
return	list[pd.DataFrame]	List of DataFrames corresponding to each split. For a single float ratio, returns a list of two DataFrames [train, test]. For a list of ratios, returns one DataFrame per ratio element.
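The per-entity proportions can be worked out by hand. The helper below computes how many rows a single user (or item) contributes to each returned DataFrame. `per_group_split_sizes` is hypothetical. It assumes split boundaries are the cumulative ratios times the group size, rounded half-to-even as Python's built-in `round` does. Treat the small-group edge cases it shows as illustrative rather than guaranteed library behaviour.

```python
from itertools import accumulate


def per_group_split_sizes(n, ratios):
    """Hypothetical: rows of a group with n interactions assigned to each split."""
    # Boundaries: 0, then round(cumulative_ratio * n) for each split.
    bounds = [0] + [round(c * n) for c in accumulate(ratios)]
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]
```

Under this assumption, a user with 10 interactions and ratios [0.6, 0.2, 0.2] contributes 6, 2 and 2 rows. A user with only two interactions and [0.75, 0.25] contributes both rows to train and none to test.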

Usage Examples

Basic Usage

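import pandas as pd
from recommenders.datasets.python_splitters import python_stratified_split

# Toy interactions for illustration; the column names match the library
# defaults (DEFAULT_USER_COL = "userID", DEFAULT_ITEM_COL = "itemID")
data = pd.DataFrame({
    "userID": [1] * 6 + [2] * 6 + [3] * 4,
    "itemID": list(range(6)) + list(range(6)) + list(range(4)),
    "rating": [5, 4, 3, 4, 2, 5, 3, 3, 4, 1, 5, 2, 4, 4, 3, 5],
})

# Two-way 75/25 stratified split by user
train, test = python_stratified_split(data, ratio=0.75, seed=42)

# Three-way split (train/val/test) with 60/20/20 ratio
train, val, test = python_stratified_split(data, ratio=[0.6, 0.2, 0.2])

# Stratify by item instead of user
train, test = python_stratified_split(data, ratio=0.75, filter_by="item")

# Keep only users with at least 5 ratings (user 3, with 4 ratings, is dropped)
train, test = python_stratified_split(data, ratio=0.75, min_rating=5)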

Dependencies

numpy - Random number generation
pandas - DataFrame manipulation and groupby operations
sklearn - Imported by the python_splitters module (used by python_random_split); the stratification routine behind this function works with numpy and pandas

Related Pages

Implements Principle

Principle:Recommenders_team_Recommenders_Stratified_Data_Splitting

Requires Environment

Environment:Recommenders_team_Recommenders_Python_Core_Dependencies
