Implementation:Online ml River Datasets MovieLens100K
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Datasets, Regression, Recommender_Systems |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
A concrete dataset class in the River library that streams the MovieLens 100K ratings for regression and recommender-system experiments.
Description
MovieLens 100K dataset. The MovieLens datasets were collected by the GroupLens Research Project at the University of Minnesota. This one consists of 100,000 ratings (1-5) from 943 users on 1,682 movies. Each user has rated at least 20 movies, and user and movie information is provided. The data was collected through the MovieLens website (movielens.umn.edu) during the seven-month period from September 19th, 1997 through April 22nd, 1998.
River exposes the data as 100,000 samples with 10 features each, framed as a regression task: predict the rating.
Usage
This dataset is useful for:
- Collaborative filtering and recommender systems
- Rating prediction tasks
- Personalization algorithms
- Matrix factorization techniques
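As a rough illustration of the rating-prediction use case, the sketch below runs a test-then-train loop with a bias baseline (global mean plus per-user and per-item offsets). It uses only the standard library so it runs without downloading anything. The class name `BiasBaseline`, its learning rate, and the four sample ratings are invented for illustration; this is not a River estimator, although the method names follow River's `predict_one` / `learn_one` convention.

```python
from collections import defaultdict


class BiasBaseline:
    """Hypothetical online predictor: global mean + user offset + item offset."""

    def __init__(self, lr=0.05):
        self.lr = lr
        self.n = 0
        self.mean = 0.0
        self.user_bias = defaultdict(float)
        self.item_bias = defaultdict(float)

    def _raw(self, x):
        # Unknown users and items contribute an offset of 0.
        return (
            self.mean
            + self.user_bias.get(x["user"], 0.0)
            + self.item_bias.get(x["item"], 0.0)
        )

    def predict_one(self, x):
        # Clip to the 1-5 rating scale of the dataset.
        return min(5.0, max(1.0, self._raw(x)))

    def learn_one(self, x, y):
        # Update the running global mean, then take one SGD step on the offsets.
        self.n += 1
        self.mean += (y - self.mean) / self.n
        err = y - self._raw(x)
        self.user_bias[x["user"]] += self.lr * err
        self.item_bias[x["item"]] += self.lr * err


# Made-up samples shaped like (features, rating); only user and item are used.
samples = [
    ({"user": "1", "item": "10"}, 4.0),
    ({"user": "2", "item": "10"}, 5.0),
    ({"user": "1", "item": "20"}, 2.0),
    ({"user": "2", "item": "20"}, 3.0),
]

model = BiasBaseline()
abs_errors = []
for x, y in samples * 50:
    # Test-then-train: score the prediction before the model sees the label.
    abs_errors.append(abs(y - model.predict_one(x)))
    model.learn_one(x, y)

mae_first = sum(abs_errors[:4]) / 4
mae_last = sum(abs_errors[-4:]) / 4
print(f"MAE on first pass: {mae_first:.3f}, on last pass: {mae_last:.3f}")
```

The same loop shape should carry over to the real dataset by iterating over `datasets.MovieLens100K()` instead of `samples`, since each sample's feature dict contains `user` and `item` by default.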
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/datasets/movielens100k.py
Signature
from river import stream
from river.datasets import base


class MovieLens100K(base.RemoteDataset):
    def __init__(self, unpack_user_and_item=False):
        super().__init__(
            n_samples=100_000,
            n_features=10,
            task=base.REG,
            url="https://maxhalford.github.io/files/datasets/ml_100k.zip",
            size=11_057_876,
            filename="ml_100k.csv",
        )
        self.unpack_user_and_item = unpack_user_and_item

    def _iter(self):
        # Parse the tab-delimited file; columns without a converter stay strings.
        X_y = stream.iter_csv(
            self.path,
            target="rating",
            converters={
                "timestamp": int,
                "release_date": int,
                "age": float,
                "rating": float,
            },
            delimiter="\t",
        )
        if self.unpack_user_and_item:
            # Move user and item out of the features into a separate dict.
            for x, y in X_y:
                user = x.pop("user")
                item = x.pop("item")
                yield x, y, {"user": user, "item": item}
        else:
            yield from X_y
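To make the parsing and unpacking steps concrete without downloading the archive, the sketch below reproduces the same flow with the standard `csv` module: read tab-delimited rows, apply the converters, split off the target, and optionally pop `user` and `item`. The helper `iter_rows` and the one-row sample are hypothetical stand-ins, not River's `stream.iter_csv`, and the sample values are made up.

```python
import csv
import io

# Made-up one-row excerpt using only columns named in the class above.
RAW = (
    "user\titem\ttimestamp\trelease_date\tage\trating\n"
    "7\t42\t1000\t500\t30\t4\n"
)

CONVERTERS = {"timestamp": int, "release_date": int, "age": float, "rating": float}


def iter_rows(buffer, target, converters, delimiter="\t", unpack_user_and_item=False):
    """Hypothetical helper mirroring the shape of MovieLens100K._iter."""
    for row in csv.DictReader(buffer, delimiter=delimiter):
        # Convert the listed columns; all others remain strings.
        for name, convert in converters.items():
            row[name] = convert(row[name])
        y = row.pop(target)
        if unpack_user_and_item:
            user = row.pop("user")
            item = row.pop("item")
            yield row, y, {"user": user, "item": item}
        else:
            yield row, y


x, y = next(iter_rows(io.StringIO(RAW), "rating", CONVERTERS))
x2, y2, extra = next(
    iter_rows(io.StringIO(RAW), "rating", CONVERTERS, unpack_user_and_item=True)
)
print(x, y)
print(x2, y2, extra)
```

In the default mode `user` and `item` stay inside the feature dict; in the unpacked mode they are removed from it and returned separately. Neither has a converter in the class above, so both remain strings.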
Import
from river import datasets
dataset = datasets.MovieLens100K()
# Or with unpacked user/item:
dataset = datasets.MovieLens100K(unpack_user_and_item=True)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| unpack_user_and_item | bool | No | Whether to extract user and item as extra kwargs (default: False) |
Outputs
| Name | Type | Description |
|---|---|---|
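| Iterating the dataset (default) | tuple(dict, float) | Yields (features_dict, rating) pairs; user and item are keys of features_dict |
| Iterating the dataset (unpack_user_and_item=True) | tuple(dict, float, dict) | Yields (features_dict, rating, {"user": user, "item": item}); user and item are removed from features_dict |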
Dataset Properties
| Property | Value |
|---|---|
| Number of samples | 100,000 |
| Number of features | 10 |
| Task | Regression (rating prediction) |
| Format | CSV (tab-delimited) |
| Size | 11,057,876 bytes |
| Number of users | 943 |
| Number of items | 1,682 |
| Rating scale | 1-5 |
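The counts in this table also show how sparse the user-item rating matrix is, which matters when choosing a collaborative filtering method. The short calculation below is a worked example; it assumes each user rates a given movie at most once, which the source does not state explicitly.

```python
n_ratings = 100_000
n_users = 943
n_items = 1_682

# Number of cells in the full user-item matrix.
n_cells = n_users * n_items
density = n_ratings / n_cells

print(f"possible user-item pairs: {n_cells:,}")                 # 1,586,126
print(f"density: {density:.2%}")                                # 6.30%
print(f"mean ratings per user: {n_ratings / n_users:.1f}")      # 106.0
print(f"mean ratings per item: {n_ratings / n_items:.1f}")      # 59.5
```

Under that assumption, roughly 94% of user-item pairs have no rating.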
Features
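Each sample's feature dict covers:
- User information (user ID, age, demographics)
- Movie information (movie ID, release date, genre)
- Interaction data (timestamp)
The timestamp and release_date fields are parsed as int and age as float. The target, rating, is the user's rating of the movie, parsed as a float from 1 to 5; it is returned separately and is not part of the feature dict.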
Usage Examples
from river import datasets

# Standard usage
dataset = datasets.MovieLens100K()
for x, y in dataset:
    print(x, y)
    break

# With user and item unpacked
dataset = datasets.MovieLens100K(unpack_user_and_item=True)
for x, y, extra in dataset:
    print(f"Features: {x}")
    print(f"Rating: {y}")
    print(f"User: {extra['user']}, Item: {extra['item']}")
    break
References
- Harper, F.M. and Konstan, J.A., 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS), 5(4), pp.1-19.