Implementation:Online ml River Datasets Keystroke

From Leeroopedia


Knowledge Sources
Domains Online_Learning, Datasets, Multi_Class_Classification, Biometrics
Last Updated 2026-02-08 16:00 GMT

Overview

Concrete dataset for multi-class classification provided by the River library.

Description

CMU keystroke dynamics dataset. Each subject was asked to type the same fixed password, and the task is to identify which subject is typing from their keystroke timings.

The only difference from the original dataset is that the "sessionIndex" and "rep" attributes have been dropped.

This dataset contains 20,400 samples with 31 features across 51 classes for multi-class classification tasks.

Usage

This dataset is useful for:

  • Biometric authentication and user identification
  • Behavioral biometrics research
  • Multi-class classification with a moderate number of classes
  • Keystroke dynamics analysis
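
As a rough illustration of the user-identification use case, the sketch below runs a test-then-train (progressive) evaluation over a stream of (features, target) pairs. The `RunningCentroids` class and the `progressive_accuracy` helper are hypothetical stand-ins written with the standard library only. They are not River models or River APIs, and the toy samples are illustrative values rather than rows from the dataset.

```python
import math
from collections import defaultdict


class RunningCentroids:
    """Toy online classifier: one running mean vector per class (not a River model)."""

    def __init__(self):
        self.counts = defaultdict(int)
        self.means = defaultdict(dict)

    def predict_one(self, x):
        # Return the label whose centroid is closest to x, or None before any training.
        best_label, best_dist = None, math.inf
        for label, mean in self.means.items():
            dist = sum((value - mean.get(name, 0.0)) ** 2 for name, value in x.items())
            if dist < best_dist:
                best_label, best_dist = label, dist
        return best_label

    def learn_one(self, x, y):
        # Incrementally update the running mean of class y with the new sample.
        self.counts[y] += 1
        mean = self.means[y]
        for name, value in x.items():
            previous = mean.get(name, 0.0)
            mean[name] = previous + (value - previous) / self.counts[y]


def progressive_accuracy(stream, model):
    """Predict each sample before learning from it; return the overall accuracy."""
    correct = total = 0
    for x, y in stream:
        correct += model.predict_one(x) == y
        total += 1
        model.learn_one(x, y)
    return correct / total if total else 0.0


# Illustrative samples only; real usage would pass datasets.Keystroke() as the stream.
toy_stream = [
    ({"H.period": 0.10, "H.t": 0.08}, "s002"),
    ({"H.period": 0.20, "H.t": 0.15}, "s003"),
    ({"H.period": 0.11, "H.t": 0.09}, "s002"),
    ({"H.period": 0.19, "H.t": 0.16}, "s003"),
]
accuracy = progressive_accuracy(toy_stream, RunningCentroids())
```

Predicting before learning means every sample is scored by a model that has not yet seen it, which is the usual way to evaluate a model on a data stream.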

Code Reference

Source Location

Signature

from river import stream
from river.datasets import base


class Keystroke(base.RemoteDataset):
    def __init__(self):
        super().__init__(
            n_samples=20_400,
            n_features=31,
            n_classes=51,
            task=base.MULTI_CLF,
            url="http://www.cs.cmu.edu/~keystroke/DSL-StrongPasswordData.csv",
            size=4_669_935,
            filename="DSL-StrongPasswordData.csv",
            unpack=False,
        )

    def _iter(self):
        converters = {
            "H.period": float,
            "DD.period.t": float,
            "UD.period.t": float,
            # ... (31 keystroke timing features)
        }
        return stream.iter_csv(
            self.path,
            target="subject",
            converters=converters,
            drop=["sessionIndex", "rep"],
        )
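
To make the effect of the `target`, `converters`, and `drop` arguments concrete, here is a standard-library approximation of what the `_iter` method above produces. The `iter_keystroke_rows` helper is hypothetical and is not how `stream.iter_csv` is implemented. It only mirrors the observable result: dropped columns disappear, the "subject" column becomes the target, and the remaining columns are cast to float.

```python
import csv
import io


def iter_keystroke_rows(lines, target="subject", drop=("sessionIndex", "rep")):
    """Yield (features, target) pairs from CSV text lines, casting features to float."""
    for row in csv.DictReader(lines):
        for name in drop:
            row.pop(name, None)  # discard bookkeeping columns
        y = row.pop(target)      # the subject ID stays a string
        x = {name: float(value) for name, value in row.items()}
        yield x, y


# Illustrative two-feature excerpt; the real file has 31 timing columns.
sample_csv = io.StringIO(
    "subject,sessionIndex,rep,H.period,DD.period.t\n"
    "s002,1,1,0.15,0.40\n"
)
pairs = list(iter_keystroke_rows(sample_csv))
```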

Import

from river import datasets
dataset = datasets.Keystroke()

I/O Contract

Inputs

Name | Type | Required | Description
(none) | - | - | No parameters needed

Outputs

Name | Type | Description
iter() | tuple(dict, str) | Yields (features_dict, target) pairs where target is the subject/user ID

Dataset Properties

Property | Value
Number of samples | 20,400
Number of features | 31
Number of classes | 51
Task | Multi-class classification
Format | CSV
Size | 4,669,935 bytes
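
The sample, feature, and class counts listed above can be checked empirically with a single pass over the stream. The `summarize_stream` helper below is a hypothetical standard-library sketch that accepts any iterable of (features, target) pairs. It is demonstrated on illustrative values, not on rows from the dataset.

```python
def summarize_stream(stream):
    """Count samples, distinct feature names, and distinct labels in an (x, y) stream."""
    n_samples = 0
    feature_names = set()
    labels = set()
    for x, y in stream:
        n_samples += 1
        feature_names.update(x)  # adds the dict's keys
        labels.add(y)
    return {
        "n_samples": n_samples,
        "n_features": len(feature_names),
        "n_classes": len(labels),
    }


# Illustrative stream; passing datasets.Keystroke() instead would scan the real data.
toy_stream = [
    ({"H.period": 0.10, "H.t": 0.08}, "s002"),
    ({"H.period": 0.20, "H.t": 0.15}, "s003"),
    ({"H.period": 0.11, "H.t": 0.09}, "s002"),
]
summary = summarize_stream(toy_stream)
```

Run against the full dataset, the result would be expected to match the values passed to the constructor (`n_samples`, `n_features`, `n_classes`).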

Features

The dataset includes 31 keystroke timing features representing:

  • Hold time (H): Duration a key is pressed
  • Down-Down time (DD): Time between pressing two consecutive keys
  • Up-Down time (UD): Time between releasing one key and pressing the next
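
The three definitions above imply a simple relation: the down-down time between two keys equals the hold time of the first key plus the up-down time to the second. The sketch below checks that relation on illustrative timings. The numbers are invented for the example and are assumed to be in seconds.

```python
import math


def down_down_time(hold, up_down):
    """Down-down latency implied by a hold time and the following up-down latency."""
    return hold + up_down


# Illustrative timings in seconds, not taken from the dataset.
hold_period = 0.15       # H.period: how long "." is held down
up_down_period_t = 0.25  # UD.period.t: "." released until "t" is pressed
dd_period_t = down_down_time(hold_period, up_down_period_t)

# A negative up-down time means the next key was pressed before the previous
# one was released, so the down-down time is then shorter than the hold time.
overlapping = down_down_time(0.15, -0.05)
```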

Features capture typing dynamics for the password ".tie5Roanl" including:

  • Individual key hold times (H.period, H.t, H.i, etc.)
  • Transition times between keys (DD.period.t, UD.period.t, etc.)
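
The naming pattern can be sketched by generating the column names from the key sequence. The key labels below ("period", "five", "Shift.r", "Return", and so on) are an assumption based on the CMU column naming, with the password followed by the Return key. The `keystroke_feature_names` helper is hypothetical and not part of River.

```python
# Key labels assumed from the CMU column naming for ".tie5Roanl" followed by Return.
KEYS = ["period", "t", "i", "e", "five", "Shift.r", "o", "a", "n", "l", "Return"]


def keystroke_feature_names(keys=KEYS):
    """Build H.<key>, DD.<key>.<next> and UD.<key>.<next> names in typing order."""
    names = []
    for index, key in enumerate(keys):
        names.append(f"H.{key}")
        if index + 1 < len(keys):
            next_key = keys[index + 1]
            names.append(f"DD.{key}.{next_key}")
            names.append(f"UD.{key}.{next_key}")
    return names


feature_names = keystroke_feature_names()
hold_names = [name for name in feature_names if name.startswith("H.")]
```

Under this assumption, 11 keys give 11 hold times plus 10 down-down and 10 up-down transitions, which accounts for the 31 features.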

Usage Examples

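from river import datasets

# The CSV file is downloaded on first use.
dataset = datasets.Keystroke()

# Each item is a (features, target) pair: a dict of timings and the subject ID.
for x, y in dataset:
    print(x, y)
    break  # stop after the first sample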
