Implementation:Online ml River Stats MAD
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Statistics |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
MAD computes the Median Absolute Deviation of a data stream incrementally.
Description
This statistic is the median of the absolute differences between each data point and the median of the data. It is a robust measure of statistical dispersion that is less sensitive to outliers than the standard deviation. The implementation updates both the running median of the data and the median of the absolute deviations online. Each deviation is therefore measured against the median estimate available when the point arrives, so the result approximates the batch MAD rather than reproducing it exactly.
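For reference, the batch definition that the online statistic approximates can be sketched with the standard library alone. This is an illustrative helper, not part of River; the name `batch_mad` is an assumption made for this example.

```python
from statistics import median


def batch_mad(values):
    """Exact (batch) median absolute deviation of a finite sequence.

    Hypothetical helper for illustration only; it is not part of River.
    """
    values = list(values)
    center = median(values)  # median of the raw data
    deviations = [abs(v - center) for v in values]  # absolute deviations from it
    return median(deviations)  # median of those deviations


print(batch_mad([4, 2, 5, 3, 0, 4]))  # 1.0
```

Unlike the online version, this needs the whole dataset in memory and makes two passes over it, which is the cost the incremental approach avoids.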
Usage
Use MAD when you need a robust measure of variability that is resistant to outliers. This is particularly useful in anomaly detection, outlier identification, and data quality assessment where extreme values should not overly influence the measure of spread. MAD is often preferred over standard deviation when working with skewed or heavy-tailed distributions.
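As a rough illustration of the anomaly-detection use case, the sketch below flags points with a MAD-based modified z-score. It is written against plain Python rather than River so that it stays self-contained. The function name `flag_outliers`, the 0.6745 scaling constant, and the 3.5 threshold are assumptions following the commonly cited modified z-score convention, not anything prescribed by the source.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Flag values whose MAD-based modified z-score exceeds `threshold`.

    Hypothetical helper for illustration; constants follow the common
    modified z-score convention and are assumptions, not River defaults.
    """
    values = list(values)
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # Simplification: with zero spread the score is undefined,
        # so nothing is flagged in this sketch.
        return [False] * len(values)
    return [abs(0.6745 * (v - center) / mad) > threshold for v in values]


print(flag_outliers([1, 2, 3, 4, 5, 100]))
# [False, False, False, False, False, True]
```

In a streaming setting the same score could be computed from a running median and a running MAD instead of the batch values used here.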
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/stats/mad.py
Signature
class MAD(quantile.Quantile):
    def __init__(self):
        super().__init__(q=0.5)
        self.median = quantile.Quantile(q=0.5)
Import
from river import stats
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| x | numbers.Number | Yes | Value to update the statistic with |
Outputs
| Name | Type | Description |
|---|---|---|
| get() | float | Current median absolute deviation |
Usage Examples
from river import stats
# Create MAD statistic
mad = stats.MAD()
X = [4, 2, 5, 3, 0, 4]
for x in X:
    mad.update(x)
    print(f"Value: {x}, MAD: {mad.get()}")
# Output:
# Value: 4, MAD: 0.0
# Value: 2, MAD: 2.0
# Value: 5, MAD: 1.0
# Value: 3, MAD: 1.0
# Value: 0, MAD: 1.0
# Value: 4, MAD: 1.0
# Comparing MAD with standard deviation for robustness
import math
from river import stats

# Dataset with outlier
data_with_outlier = [1, 2, 3, 4, 5, 100]
mad_robust = stats.MAD()
var = stats.Var()  # running variance; its square root is the standard deviation
for x in data_with_outlier:
    mad_robust.update(x)
    var.update(x)
print(f"MAD: {mad_robust.get():.2f}")
print(f"Std Dev: {math.sqrt(var.get()):.2f}")
# MAD is much less affected by the outlier (100)