Implementation:Recommenders team Recommenders NCF Dataset Init

From Leeroopedia


Knowledge Sources
Domains: Recommender Systems, Implicit Feedback, Data Preparation
Last Updated: 2026-02-10 00:00 GMT

Overview

A concrete tool, provided by the recommenders library, that prepares implicit-feedback data with negative sampling for Neural Collaborative Filtering (NCF).

Description

Dataset.__init__ initializes the NCF dataset by loading training (and optionally test) interaction files, building user/item ID mappings, and configuring negative sampling parameters. When a test file is provided, it automatically generates a full test file that includes n_neg_test negative samples per positive test interaction, enabling the leave-one-out evaluation protocol. The class also provides a train_loader method that yields batches of (user, item, label) tuples with n_neg negatives sampled per positive example during training.
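The per-positive negative sampling described above can be illustrated with a minimal sketch. This is not the library's implementation: the helper name sample_training_triples and the toy data are hypothetical, and the sketch assumes negatives are drawn only from items the user has never interacted with.

```python
import random


def sample_training_triples(positives, all_items, n_neg=4, seed=None):
    """Return (user, item, label) triples: each positive plus up to n_neg negatives.

    Hypothetical helper for illustration; not part of the recommenders API.
    """
    rng = random.Random(seed)

    # Items each user has interacted with; these can never be negatives.
    seen = {}
    for user, item in positives:
        seen.setdefault(user, set()).add(item)

    triples = []
    for user, item in positives:
        triples.append((user, item, 1))
        # Candidate negatives, sorted so a fixed seed gives a fixed result.
        pool = sorted(set(all_items) - seen[user])
        for neg in rng.sample(pool, min(n_neg, len(pool))):
            triples.append((user, neg, 0))
    return triples


positives = [(1, 10), (1, 11), (2, 12)]
all_items = [10, 11, 12, 13, 14, 15, 16]
triples = sample_training_triples(positives, all_items, n_neg=4, seed=42)
```

With n_neg=4, every positive row is followed by four label-0 rows for the same user, so the three positives above expand to fifteen triples.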

The binary flag converts any non-zero rating to 1, which is the standard treatment for implicit feedback. User and item IDs are mapped to contiguous integer indices via user2id and item2id dictionaries, which are later consumed by the NCF model's embedding layers.
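As a rough illustration of the two steps in this paragraph, the sketch below binarizes ratings and assigns contiguous indices in order of first appearance. The function build_id_maps is a hypothetical helper, and the first-appearance ordering is an assumption; the library may order its indices differently.

```python
def build_id_maps(interactions, binary=True):
    """Map raw IDs to contiguous indices and optionally binarize ratings.

    interactions: list of (user, item, rating) tuples.
    Hypothetical helper for illustration; not part of the recommenders API.
    """
    user2id, item2id = {}, {}
    encoded = []
    for user, item, rating in interactions:
        # setdefault assigns the next free index the first time an ID is seen.
        u = user2id.setdefault(user, len(user2id))
        i = item2id.setdefault(item, len(item2id))
        # With binary=True, any non-zero rating becomes 1.0.
        label = float(rating != 0) if binary else float(rating)
        encoded.append((u, i, label))
    return user2id, item2id, encoded


interactions = [(501, 900, 5.0), (17, 42, 3.0), (501, 42, 0.0)]
user2id, item2id, encoded = build_id_maps(interactions)

# Reverse mappings recover the original IDs from the indices.
id2user = {idx: user for user, idx in user2id.items()}
id2item = {idx: item for item, idx in item2id.items()}
```

Contiguous indices matter because an embedding table with n_users rows can only be addressed by integers in the range 0 to n_users - 1.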

Usage

Import and instantiate Dataset after splitting your interaction data into training and test CSV files. This is the required data preparation step before calling NCF.fit(). The training file is mandatory; the test file is needed only for evaluation. Adjust n_neg to set how many negatives are sampled per positive during training, and n_neg_test to set how many are added per positive in the full test file.
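One way to produce the two CSV files is a leave-one-out split that holds out each user's most recent interaction. The sketch below uses only the standard library. The helpers leave_one_out_split and write_csv are hypothetical, the timestamp column is an assumption about the input data, and keeping single-interaction users in the training set is a design choice of this sketch rather than documented library behaviour.

```python
import csv
import os
import tempfile


def leave_one_out_split(rows):
    """Hold out each user's latest interaction as the test row.

    rows: list of dicts with userID, itemID, rating and timestamp keys.
    Hypothetical helper for illustration; not part of the recommenders API.
    """
    by_user = {}
    for row in rows:
        by_user.setdefault(row["userID"], []).append(row)

    train, test = [], []
    for user in sorted(by_user):
        history = sorted(by_user[user], key=lambda r: r["timestamp"])
        if len(history) > 1:
            train.extend(history[:-1])
            test.append(history[-1])
        else:
            # A user with one interaction stays in train only.
            train.extend(history)
    return train, test


def write_csv(path, rows, fieldnames=("userID", "itemID", "rating", "timestamp")):
    """Write rows to a CSV file with a header line."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


rows = [
    {"userID": 1, "itemID": 10, "rating": 4.0, "timestamp": 100},
    {"userID": 1, "itemID": 11, "rating": 5.0, "timestamp": 200},
    {"userID": 2, "itemID": 10, "rating": 3.0, "timestamp": 150},
    {"userID": 2, "itemID": 12, "rating": 2.0, "timestamp": 120},
    {"userID": 3, "itemID": 11, "rating": 1.0, "timestamp": 300},
]
train_rows, test_rows = leave_one_out_split(rows)

out_dir = tempfile.mkdtemp()
train_path = os.path.join(out_dir, "train.csv")
test_path = os.path.join(out_dir, "test.csv")
write_csv(train_path, train_rows)
write_csv(test_path, test_rows)
```

The resulting train_path and test_path are the kind of paths that would be passed as train_file and test_file.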

Code Reference

Source Location

  • Repository: recommenders
  • File: recommenders/models/ncf/dataset.py
  • Lines: 304-391

Signature

class Dataset(object):
    def __init__(
        self,
        train_file,
        test_file=None,
        test_file_full=None,
        overwrite_test_file_full=False,
        n_neg=4,
        n_neg_test=100,
        col_user=DEFAULT_USER_COL,
        col_item=DEFAULT_ITEM_COL,
        col_rating=DEFAULT_RATING_COL,
        binary=True,
        seed=None,
        sample_with_replacement=False,
        print_warnings=False,
    ):

Import

from recommenders.models.ncf.dataset import Dataset

I/O Contract

Inputs

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| train_file | str | Yes | Path to the training dataset CSV file containing user-item interactions |
| test_file | str | No | Path to the test dataset CSV file for leave-one-out evaluation. Defaults to None |
| test_file_full | str | No | Path to the full test file including negative samples. If None and test_file is provided, auto-generated as test_file_full.csv |
| overwrite_test_file_full | bool | No | If True, regenerate and overwrite the full test file even if it already exists. Defaults to False |
| n_neg | int | No | Number of negative samples per positive example during training. Defaults to 4 |
| n_neg_test | int | No | Number of negative samples per positive example for evaluation. Defaults to 100 |
| col_user | str | No | Name of the user ID column. Defaults to DEFAULT_USER_COL ("userID") |
| col_item | str | No | Name of the item ID column. Defaults to DEFAULT_ITEM_COL ("itemID") |
| col_rating | str | No | Name of the rating column. Defaults to DEFAULT_RATING_COL ("rating") |
| binary | bool | No | If True, convert all non-zero ratings to 1 (implicit feedback). Defaults to True |
| seed | int | No | Random seed for reproducible negative sampling. Defaults to None |
| sample_with_replacement | bool | No | If True, sample negatives with replacement. Defaults to False |
| print_warnings | bool | No | If True, print warnings when insufficient items exist for sampling without replacement. Defaults to False |
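The interaction between sample_with_replacement and print_warnings can be pictured with the sketch below. It is an assumption-laden illustration, not the library's code: draw_negatives is a hypothetical helper, and clipping the request to the pool size when sampling without replacement is one plausible fallback that has not been confirmed against the source.

```python
import random
import warnings


def draw_negatives(pool, n_samples, with_replacement=False,
                   print_warnings=False, rng=None):
    """Draw n_samples negatives from pool.

    Hypothetical helper for illustration; not part of the recommenders API.
    """
    rng = rng or random.Random()
    if not pool:
        return []
    if with_replacement:
        # Duplicates are allowed, so any n_samples can be satisfied.
        return rng.choices(pool, k=n_samples)
    if n_samples > len(pool):
        # Assumed fallback: return every available item instead of failing.
        if print_warnings:
            warnings.warn(
                f"Requested {n_samples} negatives but only {len(pool)} are available."
            )
        n_samples = len(pool)
    return rng.sample(pool, n_samples)


pool = [13, 14, 15]
rng = random.Random(0)
without = draw_negatives(pool, 5, with_replacement=False, rng=rng)
with_repl = draw_negatives(pool, 5, with_replacement=True, rng=rng)
```

Here the request for five negatives from a pool of three yields three distinct items without replacement, and five items (with repeats) with replacement.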

Outputs

| Name | Type | Description |
| --- | --- | --- |
| dataset | Dataset | An initialized Dataset object with the following key attributes |
| dataset.n_users | int | Total number of unique users in the training data |
| dataset.n_items | int | Total number of unique items in the training data |
| dataset.user2id | dict | Mapping from original user IDs to contiguous integer indices |
| dataset.item2id | dict | Mapping from original item IDs to contiguous integer indices |
| dataset.id2user | dict | Reverse mapping from integer indices to original user IDs |
| dataset.id2item | dict | Reverse mapping from integer indices to original item IDs |
| dataset.train_len | int | Number of interactions in the training file |

Usage Examples

Basic Usage

from recommenders.models.ncf.dataset import Dataset

# Initialize dataset with training and test files
data = Dataset(
    train_file="train.csv",
    test_file="test.csv",
    n_neg=4,
    n_neg_test=100,
    binary=True,
    seed=42,
)

print(f"Users: {data.n_users}, Items: {data.n_items}")
print(f"Training interactions: {data.train_len}")

# The dataset is now ready to be passed to NCF.fit()

With MovieLens Data

from recommenders.datasets import movielens
from recommenders.datasets.python_splitters import python_chrono_split
from recommenders.models.ncf.dataset import Dataset

# Load and split data
df = movielens.load_pandas_df(size="100k")
train, test = python_chrono_split(df, ratio=0.75)

# Dataset reads from disk, so write both splits to CSV files
train.to_csv("train.csv", index=False)
test.to_csv("test.csv", index=False)

# Create NCF dataset with negative sampling
data = Dataset(
    train_file="train.csv",
    test_file="test.csv",
    n_neg=4,
    n_neg_test=100,
    binary=True,
    seed=42,
)
