Implementation:Online ml River Datasets MovieLens100K

Knowledge Sources	Online_ml_River
Domains	Online_Learning, Datasets, Regression, Recommender_Systems
Last Updated	2026-02-08 16:00 GMT

Overview

A concrete dataset class, provided by the River library, for rating-prediction regression and recommender-system experiments.

Description

MovieLens 100K dataset. MovieLens datasets were collected by the GroupLens Research Project at the University of Minnesota. This dataset consists of 100,000 ratings (1-5) from 943 users on 1682 movies. Each user has rated at least 20 movies. User and movie information are provided. The data was collected through the MovieLens web site (movielens.umn.edu) during the seven-month period from September 19th, 1997 through April 22nd, 1998.

River exposes the dataset as 100,000 samples with 10 features each, framed as a regression task in which the target is the rating.
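
The figures in the description imply a sparse user-movie matrix. The arithmetic can be sketched as follows; the variable names are illustrative, and only the counts and dates come from the description above.

```python
from datetime import date

n_ratings = 100_000
n_users = 943
n_movies = 1_682

# Fraction of all possible (user, movie) pairs that carry a rating.
density = n_ratings / (n_users * n_movies)

# Average number of ratings per user and per movie.
ratings_per_user = n_ratings / n_users
ratings_per_movie = n_ratings / n_movies

# Length of the collection window in days.
collection_days = (date(1998, 4, 22) - date(1997, 9, 19)).days

print(f"density: {density:.2%}")
print(f"ratings per user: {ratings_per_user:.1f}")
print(f"ratings per movie: {ratings_per_movie:.1f}")
print(f"collection window: {collection_days} days")
```

Roughly 6.3% of the possible pairs are rated, which is why techniques designed for sparse interactions are a common fit for this data.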

Usage

This dataset is useful for:

Collaborative filtering and recommender systems
Rating prediction tasks
Personalization algorithms
Matrix factorization techniques
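
As a rough illustration of online rating prediction, the sketch below predicts each rating before learning from it and tracks the mean absolute error. It is a hand-rolled baseline in plain Python, not one of River's models: the names `RunningMean` and `progressive_mae` are hypothetical, the toy samples are made up, and the fallback prediction of 3.0 (the midpoint of the 1-5 scale) is an assumption.

```python
from collections import defaultdict


class RunningMean:
    """Incrementally updated arithmetic mean."""

    def __init__(self):
        self.n = 0
        self.value = 0.0

    def update(self, x):
        self.n += 1
        self.value += (x - self.value) / self.n


def progressive_mae(stream):
    """Predict each rating before learning from it; return the mean absolute error."""
    global_mean = RunningMean()
    user_means = defaultdict(RunningMean)
    error = RunningMean()
    for x, y in stream:
        user_mean = user_means[x["user"]]
        # Prefer the user's own mean, then the global mean, then the assumed midpoint.
        if user_mean.n:
            y_pred = user_mean.value
        elif global_mean.n:
            y_pred = global_mean.value
        else:
            y_pred = 3.0
        error.update(abs(y - y_pred))
        # Only now is the true rating revealed to the model.
        global_mean.update(y)
        user_mean.update(y)
    return error.value


# Made-up samples shaped like the dataset's default (features, rating) pairs.
toy_stream = [
    ({"user": "a", "item": "m1"}, 5.0),
    ({"user": "a", "item": "m2"}, 5.0),
    ({"user": "b", "item": "m1"}, 1.0),
    ({"user": "b", "item": "m3"}, 1.0),
]
mae = progressive_mae(toy_stream)
print(mae)
```

The same function should accept the real dataset in its default mode, since each sample is a (features, rating) pair whose feature dict contains a user key.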

Code Reference

Source Location

Repository: Online_ml_River
File: river/datasets/movielens100k.py

Signature

# Imports needed by the excerpt below
from river import stream
from river.datasets import base


class MovieLens100K(base.RemoteDataset):
    def __init__(self, unpack_user_and_item=False):
        super().__init__(
            n_samples=100_000,
            n_features=10,
            task=base.REG,
            url="https://maxhalford.github.io/files/datasets/ml_100k.zip",
            size=11_057_876,
            filename="ml_100k.csv",
        )
        self.unpack_user_and_item = unpack_user_and_item

    def _iter(self):
        X_y = stream.iter_csv(
            self.path,
            target="rating",
            converters={
                "timestamp": int,
                "release_date": int,
                "age": float,
                "rating": float,
            },
            delimiter="\t",
        )
        if self.unpack_user_and_item:
            for x, y in X_y:
                user = x.pop("user")
                item = x.pop("item")
                yield x, y, {"user": user, "item": item}
        else:
            yield from X_y
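
To make the iteration logic concrete without downloading anything, the following sketch mimics it with the standard library. It is not River's implementation of stream.iter_csv: the helper name `iter_rows`, the two sample rows, and the reduced column set are all made up for illustration.

```python
import csv
import io

# Two invented rows in a tab-delimited layout; only a subset of columns is shown.
SAMPLE = (
    "user\titem\ttimestamp\tage\trating\n"
    "u1\tm10\t100\t21\t4\n"
    "u2\tm11\t200\t35\t2\n"
)

# Columns to cast; every other column stays a string.
CONVERTERS = {"timestamp": int, "age": float, "rating": float}


def iter_rows(buffer, unpack_user_and_item=False):
    """Yield samples in the same shapes as the dataset's two modes."""
    for row in csv.DictReader(buffer, delimiter="\t"):
        for name, convert in CONVERTERS.items():
            row[name] = convert(row[name])
        y = row.pop("rating")  # the target leaves the feature dict
        if unpack_user_and_item:
            user = row.pop("user")
            item = row.pop("item")
            yield row, y, {"user": user, "item": item}
        else:
            yield row, y


default_rows = list(iter_rows(io.StringIO(SAMPLE)))
unpacked_rows = list(iter_rows(io.StringIO(SAMPLE), unpack_user_and_item=True))
print(default_rows[0])
print(unpacked_rows[0])
```

In the unpacked mode the feature dict no longer contains user or item, which matters for models that expect those identifiers as separate arguments.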

Import

from river import datasets
dataset = datasets.MovieLens100K()
# Or with unpacked user/item:
dataset = datasets.MovieLens100K(unpack_user_and_item=True)

I/O Contract

Inputs

Name	Type	Required	Description
unpack_user_and_item	bool	No	If True, user and item are removed from the feature dict and yielded as a separate dict in third position (default: False)

Outputs

Name	Type	Description
Iteration (default)	tuple(dict, float)	Yields (features_dict, rating) pairs; user and item stay inside features_dict
Iteration (unpacked)	tuple(dict, float, dict)	Yields (features_dict, rating, {"user": user, "item": item}); user and item are removed from features_dict
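
Code that has to accept either output shape can normalize the samples first. The helper below is a minimal sketch; `split_sample` is a hypothetical name, and the example tuples are made up to match the two shapes in the table.

```python
def split_sample(sample):
    """Return (features, rating, extra) for a 2-tuple or 3-tuple sample."""
    x, y, *rest = sample
    # The default mode has no third element, so fall back to an empty dict.
    extra = rest[0] if rest else {}
    return x, y, extra


default_sample = ({"user": "u1", "item": "m10", "age": 21.0}, 4.0)
unpacked_sample = ({"age": 21.0}, 4.0, {"user": "u1", "item": "m10"})

print(split_sample(default_sample))
print(split_sample(unpacked_sample))
```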

Dataset Properties

Property	Value
Number of samples	100,000
Number of features	10
Task	Regression (rating prediction)
Format	CSV (tab-delimited)
Size	11,057,876 bytes
Number of users	943
Number of items	1,682
Rating scale	1-5

Features

The features fall into three groups:

User information (user ID, age, demographics)
Movie information (movie ID, release date, genre)
Interaction data (timestamp)

The target variable is rating, the user's rating of the movie (a float from 1 to 5). It is not counted among the features.

Usage Examples

from river import datasets

# Standard usage
dataset = datasets.MovieLens100K()
for x, y in dataset:
    print(x, y)
    break

# With user and item unpacked
dataset = datasets.MovieLens100K(unpack_user_and_item=True)
for x, y, extra in dataset:
    print(f"Features: {x}")
    print(f"Rating: {y}")
    print(f"User: {extra['user']}, Item: {extra['item']}")
    break

References

Harper, F.M. and Konstan, J.A., 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS), 5(4), pp. 1-19.

Related Pages

Environment:Online_ml_River_Python_Runtime_Environment
