Implementation:Recommenders team Recommenders NCF Dataset Init

Knowledge Sources	Recommenders
Domains	Recommender Systems, Implicit Feedback, Data Preparation
Last Updated	2026-02-10 00:00 GMT

Overview

Concrete tool, provided by the recommenders library, for preparing implicit feedback data with negative sampling for Neural Collaborative Filtering (NCF).

Description

Dataset.__init__ initializes the NCF dataset. It loads the training (and optionally test) interaction files, builds the user/item ID mappings, and stores the negative sampling parameters. When a test file is provided, it also creates a full test file in which each positive test interaction is accompanied by n_neg_test sampled negatives, supporting the leave-one-out evaluation protocol; an existing full test file is reused unless overwrite_test_file_full is True. The class also provides a train_loader method that yields training batches of users, items, and labels, with n_neg negatives sampled for each positive example.
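
The train_loader behaviour described above can be sketched in plain Python. This is a minimal sketch, not the library's code: the helper name `sketch_train_batches`, uniform sampling without replacement from each user's unseen items, and the single shuffle before batching are assumptions chosen for illustration.

```python
import random
from typing import Dict, Iterator, List, Set, Tuple


def sketch_train_batches(
    positives: List[Tuple[int, int]],
    n_items: int,
    n_neg: int = 4,
    batch_size: int = 8,
    seed: int = 42,
) -> Iterator[Tuple[List[int], List[int], List[float]]]:
    """Yield (users, items, labels) batches with n_neg negatives per positive.

    Hypothetical helper: negatives are drawn uniformly, without replacement,
    from the items each user has not interacted with.
    """
    rng = random.Random(seed)

    # Collect the items each user has interacted with
    seen: Dict[int, Set[int]] = {}
    for u, i in positives:
        seen.setdefault(u, set()).add(i)

    rows: List[Tuple[int, int, float]] = []
    for u, i in positives:
        rows.append((u, i, 1.0))  # observed interaction -> label 1
        candidates = [j for j in range(n_items) if j not in seen[u]]
        for j in rng.sample(candidates, min(n_neg, len(candidates))):
            rows.append((u, j, 0.0))  # unobserved item -> label 0
    rng.shuffle(rows)

    # Split the labelled rows into batches of at most batch_size
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        yield ([r[0] for r in chunk], [r[1] for r in chunk], [r[2] for r in chunk])
```

With three positives, 10 items, and n_neg=4, the sketch produces 15 labelled rows, delivered as one batch of 8 and one of 7.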

The binary flag sets every positive rating to 1, which is the standard treatment for implicit feedback. User and item IDs are mapped to contiguous integer indices through the user2id and item2id dictionaries; these indices are what the NCF model's embedding layers look up.
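
A minimal sketch of the binarization and ID mapping, assuming ratings greater than 0 become 1 and that indices are assigned in first-seen order (the library's actual ordering may differ). `sketch_prepare` is a hypothetical helper, not part of recommenders.

```python
from typing import Dict, List, Tuple


def sketch_prepare(interactions: List[Tuple[str, str, float]], binary: bool = True):
    """Binarize ratings and map raw IDs to contiguous indices (hypothetical helper)."""
    user2id: Dict[str, int] = {}
    item2id: Dict[str, int] = {}
    rows: List[Tuple[int, int, float]] = []
    for user, item, rating in interactions:
        # setdefault assigns the next free index the first time an ID is seen
        uid = user2id.setdefault(user, len(user2id))
        iid = item2id.setdefault(item, len(item2id))
        if binary:
            rating = 1.0 if rating > 0 else 0.0
        rows.append((uid, iid, rating))
    # Reverse mappings translate indices back to the original IDs
    id2user = {v: k for k, v in user2id.items()}
    id2item = {v: k for k, v in item2id.items()}
    return rows, user2id, item2id, id2user, id2item
```

For the input `[("u7", "i3", 4.0), ("u2", "i3", 5.0), ("u7", "i9", 2.5)]`, the sketch returns rows `[(0, 0, 1.0), (1, 0, 1.0), (0, 1, 1.0)]` and `user2id == {"u7": 0, "u2": 1}`.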

Usage

Import and instantiate Dataset after splitting your interaction data into training and test CSV files; this data preparation step is required before calling NCF.fit(). The training file is mandatory, while the test file is needed only for evaluation. Use n_neg to set the number of negatives sampled per positive example during training, and n_neg_test to set the number used for evaluation.
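
As a rough guide to how the two ratios scale the data, the sketch below computes row counts, assuming every user has enough unseen items to draw the full number of negatives. `sketch_sizes` and the input numbers are illustrative only.

```python
from typing import Tuple


def sketch_sizes(
    train_len: int, n_test_positives: int, n_neg: int = 4, n_neg_test: int = 100
) -> Tuple[int, int]:
    """Approximate row counts implied by the sampling ratios (hypothetical helper)."""
    # Each training positive is paired with n_neg negatives per epoch
    train_rows_per_epoch = train_len * (1 + n_neg)
    # Each test positive is expanded into itself plus n_neg_test negatives
    test_full_rows = n_test_positives * (1 + n_neg_test)
    return train_rows_per_epoch, test_full_rows
```

For example, `sketch_sizes(80_000, 1_000)` returns `(400000, 101000)`: raising n_neg grows every training epoch linearly, and n_neg_test grows the full test file the same way.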

Code Reference

Source Location

Repository: recommenders
File: recommenders/models/ncf/dataset.py
Lines: 304-391

Signature

class Dataset(object):
    def __init__(
        self,
        train_file,
        test_file=None,
        test_file_full=None,
        overwrite_test_file_full=False,
        n_neg=4,
        n_neg_test=100,
        col_user=DEFAULT_USER_COL,
        col_item=DEFAULT_ITEM_COL,
        col_rating=DEFAULT_RATING_COL,
        binary=True,
        seed=None,
        sample_with_replacement=False,
        print_warnings=False,
    ):

Import

from recommenders.models.ncf.dataset import Dataset

I/O Contract

Inputs

Name	Type	Required	Description
train_file	str	Yes	Path to the training dataset CSV file containing user-item interactions
test_file	str	No	Path to the test dataset CSV file for leave-one-out evaluation. Defaults to `None`
test_file_full	str	No	Path to the full test file that includes negative samples. If `None` and `test_file` is provided, the name is derived from `test_file` with a `_full` suffix (e.g. `test.csv` becomes `test_full.csv`)
overwrite_test_file_full	bool	No	If `True`, regenerate and overwrite the full test file even if it already exists. Defaults to `False`
n_neg	int	No	Number of negative samples per positive example during training. Defaults to 4
n_neg_test	int	No	Number of negative samples per positive example for evaluation. Defaults to 100
col_user	str	No	Name of the user ID column. Defaults to `DEFAULT_USER_COL` (`"userID"`)
col_item	str	No	Name of the item ID column. Defaults to `DEFAULT_ITEM_COL` (`"itemID"`)
col_rating	str	No	Name of the rating column. Defaults to `DEFAULT_RATING_COL` (`"rating"`)
binary	bool	No	If `True`, convert all positive ratings to 1 (implicit feedback). Defaults to `True`
seed	int	No	Random seed for reproducible negative sampling. Defaults to `None`
sample_with_replacement	bool	No	If `True`, sample negatives with replacement. Defaults to `False`
print_warnings	bool	No	If `True`, print warnings when insufficient items exist for sampling without replacement. Defaults to `False`
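
The interaction between sample_with_replacement and print_warnings can be sketched as follows. This assumes that, when sampling without replacement, a user with fewer unseen items than requested simply receives fewer negatives; the helper name and warning text are hypothetical, not the library's.

```python
import random
import warnings
from typing import List, Sequence


def sketch_sample_negatives(
    candidates: Sequence[int],
    n: int,
    rng: random.Random,
    with_replacement: bool = False,
    print_warnings: bool = False,
) -> List[int]:
    """Draw n negative items from candidates (hypothetical helper)."""
    if with_replacement:
        # random.choices can return the same item more than once
        return rng.choices(candidates, k=n)
    # Without replacement, at most len(candidates) distinct items are available
    k = min(n, len(candidates))
    if k < n and print_warnings:
        warnings.warn(f"Only {len(candidates)} candidate items; sampling {k} instead of {n}.")
    return rng.sample(list(candidates), k)
```

Sampling with replacement always returns exactly n items, at the cost of possible duplicates; sampling without replacement returns distinct items but may return fewer than requested.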

Outputs

Name	Type	Description
dataset	Dataset	An initialized Dataset object with the following key attributes
dataset.n_users	int	Total number of unique users in the training data
dataset.n_items	int	Total number of unique items in the training data
dataset.user2id	dict	Mapping from original user IDs to contiguous integer indices
dataset.item2id	dict	Mapping from original item IDs to contiguous integer indices
dataset.id2user	dict	Reverse mapping from integer indices to original user IDs
dataset.id2item	dict	Reverse mapping from integer indices to original item IDs
dataset.train_len	int	Number of interactions in the training file
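
Because model scores are indexed by the contiguous integer IDs, the reverse mappings translate results back to the original IDs. A minimal sketch, with plain dictionaries standing in for `dataset.item2id` and `dataset.id2item`; `sketch_top_k_raw_ids` is hypothetical.

```python
from typing import Dict, List, Sequence


def sketch_top_k_raw_ids(
    scores: Sequence[float], id2item: Dict[int, str], k: int = 3
) -> List[str]:
    """Return the original item IDs of the k highest-scoring item indices."""
    # Positions in `scores` are contiguous item indices, as produced by item2id
    top = sorted(range(len(scores)), key=lambda idx: scores[idx], reverse=True)[:k]
    return [id2item[idx] for idx in top]


# Stand-ins for dataset.item2id / dataset.id2item
item2id = {"i3": 0, "i9": 1, "i5": 2}
id2item = {v: k for k, v in item2id.items()}
print(sketch_top_k_raw_ids([0.2, 0.9, 0.5], id2item, k=2))  # ['i9', 'i5']
```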

Usage Examples

Basic Usage

from recommenders.models.ncf.dataset import Dataset

# Initialize dataset with training and test files
data = Dataset(
    train_file="train.csv",
    test_file="test.csv",
    n_neg=4,
    n_neg_test=100,
    binary=True,
    seed=42,
)

print(f"Users: {data.n_users}, Items: {data.n_items}")
print(f"Training interactions: {data.train_len}")

# The dataset is now ready to be passed to NCF.fit()

With MovieLens Data

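from recommenders.datasets import movielens
from recommenders.datasets.python_splitters import python_chrono_split
from recommenders.models.ncf.dataset import Dataset

# Load MovieLens 100k and split each user's history chronologically (75% train)
df = movielens.load_pandas_df(
    size="100k", header=["userID", "itemID", "rating", "timestamp"]
)
train, test = python_chrono_split(df, ratio=0.75)

# Keep only test users and items that also appear in the training data
test = test[test["userID"].isin(train["userID"].unique())]
test = test[test["itemID"].isin(train["itemID"].unique())]

# Leave-one-out: keep the last test interaction of each user
leave_one_out_test = test.groupby("userID").last().reset_index()

# Write the splits to CSV files for Dataset
train.to_csv("train.csv", index=False)
leave_one_out_test.to_csv("leave_one_out_test.csv", index=False)

# Create the NCF dataset with negative sampling and regenerate the full test file
data = Dataset(
    train_file="train.csv",
    test_file="leave_one_out_test.csv",
    n_neg=4,
    n_neg_test=100,
    binary=True,
    seed=42,
    overwrite_test_file_full=True,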
)
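
When test_file is supplied, Dataset builds the full test file by pairing every positive test interaction with n_neg_test sampled negatives, so that the model can rank the held-out item against them. The sketch below illustrates that expansion under stated assumptions: negatives exclude every item the user is known to have interacted with, and each group is tagged with an integer group ID. The helper name and row layout are hypothetical and are not the library's on-disk format.

```python
import random
from typing import Dict, List, Set, Tuple


def sketch_full_test_rows(
    test_positives: List[Tuple[int, int]],
    seen: Dict[int, Set[int]],
    n_items: int,
    n_neg_test: int = 100,
    seed: int = 42,
) -> List[Tuple[int, int, float, int]]:
    """Expand each held-out positive into one evaluation group (hypothetical helper).

    Each group holds the positive (label 1) plus up to n_neg_test items the
    user never interacted with (label 0), all tagged with the same group ID.
    """
    rng = random.Random(seed)
    rows: List[Tuple[int, int, float, int]] = []
    for group_id, (user, item) in enumerate(test_positives):
        rows.append((user, item, 1.0, group_id))
        # Exclude known interactions and the held-out item itself
        excluded = seen.get(user, set()) | {item}
        candidates = [j for j in range(n_items) if j not in excluded]
        for j in rng.sample(candidates, min(n_neg_test, len(candidates))):
            rows.append((user, j, 0.0, group_id))
    return rows
```

Under leave-one-out evaluation, the model scores every row in a group and metrics such as hit rate check where the single positive ranks among its sampled negatives.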

Related Pages

Implements Principle

Principle:Recommenders_team_Recommenders_Negative_Sampling_For_Implicit_Feedback

Requires Environment

Environment:Recommenders_team_Recommenders_Python_Core_Dependencies
