Implementation:Online ml River Stats Mode
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Statistics |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Mode tracks the most frequently occurring value in a data stream.
Description
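This statistic identifies the most common value (the mode) in streaming data by keeping a count for each observed value. Mode supports exact computation as well as an approximation that bounds memory. With parameter k, only the first k distinct values seen get a counter, and values that first appear after that point are ignored. Setting k=-1 counts every distinct value and yields the exact mode. The approximation is therefore wrong whenever the true mode first appears after k distinct values have already been seen, as happens with rainy in the k=2 example below. RollingMode computes the mode over a sliding window of the last window_size values. It stores the window in a bounded deque alongside a dictionary of counts, so a value's count must be reduced when that value leaves the window.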
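The counting rule can be illustrated with a small plain-Python sketch. This is not river's source code. The class name `ApproxMode` is hypothetical, and the update condition is an assumption: x is counted only if k is -1, x already has a counter, or fewer than k distinct values are tracked. That condition was chosen because it reproduces the k=2 and k=-1 traces listed under Usage Examples. Ties in `get()` go to the value whose counter was created first, because Python's `max()` returns the first maximal key in dictionary insertion order.

```python
from collections import defaultdict
from typing import Hashable, Optional


class ApproxMode:
    """Hypothetical sketch of k-limited mode counting (not river's class).

    Only the first ``k`` distinct values ever seen get a counter; later new
    values are ignored. ``k == -1`` counts every value (exact mode).
    """

    def __init__(self, k: int = 25) -> None:
        self.k = k
        self.counts = defaultdict(int)

    def update(self, x: Hashable) -> None:
        # Count x if tracking is unlimited, x already has a counter,
        # or there is still room for another distinct value.
        if self.k == -1 or x in self.counts or len(self.counts) < self.k:
            self.counts[x] += 1

    def get(self) -> Optional[Hashable]:
        # max() returns the first maximal key in insertion order, so ties go
        # to the value tracked first; None when nothing has been counted.
        return max(self.counts, key=self.counts.get, default=None)


X = ["sunny", "cloudy", "cloudy", "rainy", "rainy", "rainy"]
approx, exact = ApproxMode(k=2), ApproxMode(k=-1)
approx_trace, exact_trace = [], []
for x in X:
    approx.update(x)
    exact.update(x)
    approx_trace.append(approx.get())
    exact_trace.append(exact.get())

print(approx_trace)  # ['sunny', 'sunny', 'cloudy', 'cloudy', 'cloudy', 'cloudy']
print(exact_trace)   # ['sunny', 'sunny', 'cloudy', 'cloudy', 'cloudy', 'rainy']
```

With k=2, rainy never receives a counter, so the approximate mode stays at cloudy even after rainy becomes the most frequent value.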
Usage
Use Mode when you need the most common value in a stream of categorical or discrete data. Common applications include finding the most frequent class in a classification problem, identifying the typical value of a categorical feature, detecting prevalent patterns, and summarizing categorical distributions, for which the mean and median are not defined.
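In the classification case, the running mode of the labels acts as a majority-class baseline. Such a baseline predicts the current mode before each label is revealed and then updates its counts (test-then-train). The sketch below is a hypothetical helper in plain Python. `MajorityClassBaseline`, `predict`, and `learn` are illustrative names, not part of river, and the counting mirrors exact mode tracking (k=-1).

```python
from collections import defaultdict
from typing import Hashable, Optional


class MajorityClassBaseline:
    """Hypothetical baseline: predicts the most frequent label seen so far."""

    def __init__(self) -> None:
        self.counts = defaultdict(int)

    def predict(self) -> Optional[Hashable]:
        # Current mode of the labels; None before any label has been seen.
        return max(self.counts, key=self.counts.get, default=None)

    def learn(self, y: Hashable) -> None:
        # Exact counting: every label gets a counter.
        self.counts[y] += 1


labels = ["ham", "ham", "spam", "ham", "ham"]
baseline = MajorityClassBaseline()
correct = 0
for y in labels:
    # Test-then-train: predict before the label is revealed, then update.
    if baseline.predict() == y:
        correct += 1
    baseline.learn(y)

print(f"Accuracy: {correct / len(labels):.2f}")  # Accuracy: 0.60
```

The first prediction is None because no label has been counted yet. After that, the baseline predicts ham every time and is right on three of the five labels.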
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/stats/mode.py
Signature
```python
class Mode(stats.base.Univariate):
    def __init__(self, k=25):
        self.k = k
        self.counts = collections.defaultdict(int)


class RollingMode(stats.base.RollingUnivariate):
    def __init__(self, window_size: int):
        self.window: collections.deque[numbers.Number] = collections.deque(maxlen=window_size)
        self.counts: collections.defaultdict[typing.Any, int] = collections.defaultdict(int)
```
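The RollingMode constructor pairs a bounded deque with a count dictionary. One plausible way these fit together is sketched below in plain Python. `WindowMode` is a hypothetical name, and the eviction step is an assumption: before appending to a full window, the sketch decrements the count of the value about to fall out and deletes that count when it reaches zero. This step reproduces the window_size=2 trace under Usage Examples. The sketch assumes window_size >= 1.

```python
from collections import defaultdict, deque
from typing import Hashable, Optional


class WindowMode:
    """Hypothetical sliding-window mode: counts cover only the last window_size values."""

    def __init__(self, window_size: int) -> None:
        self.window = deque(maxlen=window_size)
        self.counts = defaultdict(int)

    def update(self, x: Hashable) -> None:
        if len(self.window) == self.window.maxlen:
            # append() will drop the oldest element, so remove its
            # contribution from the counts first.
            oldest = self.window[0]
            self.counts[oldest] -= 1
            if self.counts[oldest] == 0:
                del self.counts[oldest]  # drop zero counts so get() ignores them
        self.counts[x] += 1
        self.window.append(x)

    def get(self) -> Optional[Hashable]:
        # Ties go to the value whose counter was created first.
        return max(self.counts, key=self.counts.get, default=None)


rolling = WindowMode(window_size=2)
trace = []
for x in ["sunny", "sunny", "sunny", "rainy", "rainy", "rainy", "rainy"]:
    rolling.update(x)
    trace.append(rolling.get())
print(trace)  # ['sunny', 'sunny', 'sunny', 'sunny', 'rainy', 'rainy', 'rainy']
```

Deleting zero counts keeps the dictionary no larger than the window, so memory stays bounded by window_size.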
Import
from river import stats
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| x | Any (hashable) | Yes | Value to update the statistic with |
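| k | int | No (Mode only) | Maximum number of distinct values to count: only the first k distinct values seen get a counter; -1 counts all values (exact mode). Default: 25 |
| window_size | int | Yes (RollingMode only) | Number of most recent values over which the rolling mode is computed |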
Outputs
| Name | Type | Description |
|---|---|---|
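| get() | Any | Most frequently occurring value among those counted, or None if nothing has been counted yet; when counts are tied, the value whose counter was created first is returned, as the Usage Examples show |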
Usage Examples
```python
from river import stats

# Mode with limited unique values
X = ['sunny', 'cloudy', 'cloudy', 'rainy', 'rainy', 'rainy']
mode = stats.Mode(k=2)
for x in X:
    mode.update(x)
    print(f"Value: {x}, Mode: {mode.get()}")
# Output:
# Value: sunny, Mode: sunny
# Value: cloudy, Mode: sunny
# Value: cloudy, Mode: cloudy
# Value: rainy, Mode: cloudy  (rainy is never counted: k=2 distinct values are already tracked)
# Value: rainy, Mode: cloudy
# Value: rainy, Mode: cloudy

# Exact mode computation
mode_exact = stats.Mode(k=-1)
for x in X:
    mode_exact.update(x)
    print(f"Value: {x}, Exact Mode: {mode_exact.get()}")
# Output:
# Value: sunny, Exact Mode: sunny
# Value: cloudy, Exact Mode: sunny
# Value: cloudy, Exact Mode: cloudy
# Value: rainy, Exact Mode: cloudy
# Value: rainy, Exact Mode: cloudy
# Value: rainy, Exact Mode: rainy

# Rolling mode
X = ['sunny', 'sunny', 'sunny', 'rainy', 'rainy', 'rainy', 'rainy']
rolling_mode = stats.RollingMode(window_size=2)
for x in X:
    rolling_mode.update(x)
    print(f"Value: {x}, Rolling Mode: {rolling_mode.get()}")
# Output:
# Value: sunny, Rolling Mode: sunny
# Value: sunny, Rolling Mode: sunny
# Value: sunny, Rolling Mode: sunny
# Value: rainy, Rolling Mode: sunny
# Value: rainy, Rolling Mode: rainy
# Value: rainy, Rolling Mode: rainy
# Value: rainy, Rolling Mode: rainy

# Numeric mode (default k=25)
numeric_mode = stats.Mode()
for x in [1, 2, 2, 3, 3, 3]:
    numeric_mode.update(x)
print(f"Most frequent number: {numeric_mode.get()}")
# Output: Most frequent number: 3
```