Implementation:Online ml River Datasets Keystroke
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Datasets, Multi_Class_Classification, Biometrics |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
A concrete multi-class classification dataset provided by the River library. It is a remote dataset: the CSV file is downloaded the first time it is needed.
Description
The CMU keystroke dataset. Each user types the same password repeatedly, and the task is to identify which user is typing from their keystroke dynamics.
The only difference from the original dataset is that the "sessionIndex" and "rep" attributes have been dropped.
The dataset contains 20,400 samples, each with 31 features, spread across 51 classes (one class per user).
Usage
This dataset is useful for:
- Biometric authentication and user identification
- Behavioral biometrics research
- Multi-class classification with a moderate number of classes (51)
- Keystroke dynamics analysis
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/datasets/keystroke.py
Signature
```python
from river import stream

from . import base


class Keystroke(base.RemoteDataset):
    def __init__(self):
        super().__init__(
            n_samples=20_400,
            n_features=31,
            n_classes=51,
            task=base.MULTI_CLF,
            url="http://www.cs.cmu.edu/~keystroke/DSL-StrongPasswordData.csv",
            size=4_669_935,  # size of the downloaded file, in bytes
            filename="DSL-StrongPasswordData.csv",
            unpack=False,  # plain CSV, not an archive
        )

    def _iter(self):
        # Parse every keystroke timing column as a float.
        converters = {
            "H.period": float,
            "DD.period.t": float,
            "UD.period.t": float,
            # ... (31 keystroke timing features)
        }
        # "subject" is the label; "sessionIndex" and "rep" are discarded.
        return stream.iter_csv(
            self.path,
            target="subject",
            converters=converters,
            drop=["sessionIndex", "rep"],
        )
```
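To make the per-row behaviour of _iter concrete, here is a stdlib-only sketch of the same transformation: pop the "subject" column as the label, drop "sessionIndex" and "rep", and convert the remaining timing columns to floats. It is not River's implementation of stream.iter_csv; the two-row CSV excerpt, its values, and the iter_rows helper are hypothetical, and only three of the 31 feature columns are shown.

```python
import csv
import io

# Hypothetical two-row excerpt: the header mimics the CSV layout, but the
# timing values are made up and only three feature columns are included.
RAW = """subject,sessionIndex,rep,H.period,DD.period.t,UD.period.t
s002,1,1,0.1491,0.3979,0.2488
s003,1,1,0.1000,0.2500,0.1500
"""

DROP = {"sessionIndex", "rep"}  # columns discarded, as in Keystroke._iter
TARGET = "subject"              # column used as the label


def iter_rows(text):
    """Yield (x, y) pairs, loosely mirroring iter_csv(target=..., drop=..., converters=...)."""
    for row in csv.DictReader(io.StringIO(text)):
        y = row.pop(TARGET)  # label stays a string
        # Every remaining non-dropped column is a timing, so parse it as a float.
        x = {k: float(v) for k, v in row.items() if k not in DROP}
        yield x, y


pairs = list(iter_rows(RAW))
for x, y in pairs:
    print(y, x)
```

Because the real converters dict lists all 31 timing columns, converting every non-dropped column here plays the same role as those converters.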
Import
```python
from river import datasets

dataset = datasets.Keystroke()
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| (none) | — | — | No parameters needed |
Outputs
| Name | Type | Description |
|---|---|---|
| iteration (__iter__) | tuple(dict, str) | Yields (x, y) pairs: x maps each of the 31 feature names to a float timing, and y is the subject (user) ID as a string |
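When wiring the dataset into a pipeline, it can help to assert the output contract on the first pair. The check_pair helper below is a hypothetical sketch, assuming that every feature value is a float (as the converters suggest) and that the target is a string; the synthetic pair is made up.

```python
def check_pair(x, y, n_features=31):
    """Hypothetical helper: does one (x, y) pair match the documented contract?"""
    if not isinstance(x, dict) or len(x) != n_features:
        return False  # x must map exactly n_features feature names to values
    if not all(isinstance(k, str) and isinstance(v, float) for k, v in x.items()):
        return False  # feature names are strings, timings are floats
    return isinstance(y, str)  # the target is the subject ID


# A synthetic pair shaped like a Keystroke sample (names and values are made up).
x_fake = {f"feature_{i}": 0.1 for i in range(31)}
print(check_pair(x_fake, "s002"))
```

With River installed, the same check could be run on a real sample, e.g. x, y = next(iter(datasets.Keystroke())), which downloads the file first if needed.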
Dataset Properties
| Property | Value |
|---|---|
| Number of samples | 20,400 |
| Number of features | 31 |
| Number of classes | 51 |
| Task | Multi-class classification |
| Format | CSV |
| Download size | 4,669,935 bytes (uncompressed CSV) |
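The sample and class counts imply a balanced dataset. In the original CMU collection each subject typed the password in 8 sessions of 50 repetitions (the dropped "sessionIndex" and "rep" columns index these), which matches 20,400 = 51 × 400. The arithmetic below checks this and derives the accuracy of a constant-prediction baseline, a useful floor when comparing models.

```python
n_samples, n_classes = 20_400, 51
sessions, reps = 8, 50  # per-subject design of the original CMU collection

per_class = n_samples // n_classes  # 400 samples per subject
assert per_class * n_classes == n_samples  # 51 divides 20,400 exactly
assert per_class == sessions * reps        # 8 sessions x 50 repetitions

# Accuracy of always predicting the same subject, given balanced classes.
majority_baseline = per_class / n_samples  # 1/51, about 0.0196
print(per_class, round(majority_baseline, 4))
```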
Features
The dataset includes 31 keystroke timing features representing:
- Hold time (H): Duration a key is pressed
- Down-Down time (DD): Time between pressing two consecutive keys
- Up-Down time (UD): Time between releasing one key and pressing the next
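The three timing definitions can be computed directly from key press and release timestamps. The sketch below assumes a hypothetical list of (key, press_time, release_time) tuples in typing order, with made-up times in seconds. It also illustrates that UD equals DD minus the hold time of the first key, so UD is negative when the next key is pressed before the previous one is released.

```python
def timing_features(events):
    """Compute hold (H), down-down (DD) and up-down (UD) times.

    events: list of (key, press_time, release_time) tuples in typing order.
    """
    feats = {}
    for key, down, up in events:
        feats[f"H.{key}"] = up - down  # how long the key was held
    for (k1, down1, up1), (k2, down2, _) in zip(events, events[1:]):
        feats[f"DD.{k1}.{k2}"] = down2 - down1  # press to next press
        feats[f"UD.{k1}.{k2}"] = down2 - up1    # release to next press (may be negative)
    return feats


# Hypothetical timestamps for the first three keys; "i" is pressed before "t" is released.
events = [("period", 0.00, 0.15), ("t", 0.40, 0.50), ("i", 0.45, 0.60)]
feats = timing_features(events)
for name, value in feats.items():
    print(name, round(value, 3))
```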
The features capture the typing dynamics of the password ".tie5Roanl" (followed by the Return key), including:
- Individual key hold times (H.period, H.t, H.i, etc.)
- Transition times between keys (DD.period.t, UD.period.t, etc.)
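The naming pattern above accounts for the feature count: one H feature per key, plus one DD and one UD feature per pair of consecutive keys. The sketch below rebuilds the column names under the assumption that the 10 password characters plus Return are labelled as shown in KEYS ("period" for ".", "five" for "5", "Shift.r" for "R"); these labels follow the CMU naming convention but are an assumption here, not taken from this page. With 11 keys this gives 11 + 10 + 10 = 31 names.

```python
# Assumed key labels, in typing order, for ".tie5Roanl" followed by Return.
KEYS = ["period", "t", "i", "e", "five", "Shift.r", "o", "a", "n", "l", "Return"]


def feature_names(keys):
    """Build H/DD/UD column names in the order they appear per key."""
    names = []
    for i, key in enumerate(keys):
        names.append(f"H.{key}")  # hold time of this key
        if i + 1 < len(keys):
            nxt = keys[i + 1]
            names.append(f"DD.{key}.{nxt}")  # press-to-press transition
            names.append(f"UD.{key}.{nxt}")  # release-to-press transition
    return names


names = feature_names(KEYS)
print(len(names), names[:3], names[-1])
```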
Usage Examples
```python
from river import datasets

dataset = datasets.Keystroke()

# The CSV is downloaded on first iteration if it is not already cached.
for x, y in dataset:
    print(x, y)
    break
```
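Streams like this one are usually evaluated with progressive (test-then-train) validation: each sample is first used to test the model and then to train it. The sketch below shows that loop with a hypothetical OnlineNearestCentroid classifier and a made-up five-sample stream standing in for Keystroke; it is not a River class, although it follows River's predict_one/learn_one naming. In River itself, evaluate.progressive_val_score runs this kind of loop with a real model and metric.

```python
from collections import defaultdict


class OnlineNearestCentroid:
    """Hypothetical incremental nearest-centroid classifier (not a River class)."""

    def __init__(self):
        self.sums = defaultdict(lambda: defaultdict(float))  # class -> feature -> running sum
        self.counts = defaultdict(int)                       # class -> samples seen

    def predict_one(self, x):
        best, best_dist = None, float("inf")
        for label, n in self.counts.items():
            # Squared Euclidean distance to the running class mean.
            dist = sum((v - self.sums[label].get(k, 0.0) / n) ** 2 for k, v in x.items())
            if dist < best_dist:
                best, best_dist = label, dist
        return best  # None until at least one sample has been learned

    def learn_one(self, x, y):
        self.counts[y] += 1
        for k, v in x.items():
            self.sums[y][k] += v


def progressive_accuracy(stream, model):
    """Test-then-train: predict each sample before learning from it."""
    correct = total = 0
    for x, y in stream:
        if model.predict_one(x) == y:
            correct += 1
        total += 1
        model.learn_one(x, y)
    return correct / total if total else 0.0


# Made-up stream: two users with clearly different hold times.
toy = [
    ({"H.period": 0.10, "H.t": 0.08}, "userA"),
    ({"H.period": 0.30, "H.t": 0.25}, "userB"),
    ({"H.period": 0.11, "H.t": 0.09}, "userA"),
    ({"H.period": 0.29, "H.t": 0.26}, "userB"),
    ({"H.period": 0.12, "H.t": 0.07}, "userA"),
]
acc = progressive_accuracy(toy, OnlineNearestCentroid())
print(acc)
```

The first two predictions are necessarily wrong because each user's class is unseen at that point; the remaining three are correct, giving 3/5.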