Implementation:Online ml River Preprocessing Imputers
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Preprocessing, Missing_Data |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Streaming imputation methods for handling missing values using previous values or running statistics.
Description
This module provides two complementary imputation strategies for online learning. PreviousImputer replaces a missing value with the most recently observed non-null value of the same feature. StatImputer fills missing values from a running statistic (such as Mean or Mode) or from a user-supplied value. It can be configured with a different statistic per feature and supports both numeric and categorical data. Both imputers learn incrementally as new data arrives and keep their state across observations.
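The previous-value strategy described above can be sketched in a few lines of plain Python. This is an illustrative stand-in under stated assumptions, not River's implementation: the class name `PreviousValueSketch` and its internal `_latest` dictionary are hypothetical, and the sketch returns a copy rather than committing to any in-place behavior.

```python
class PreviousValueSketch:
    """Illustrative stand-in for a previous-value imputer (hypothetical name)."""

    def __init__(self):
        # feature name -> last non-None value seen for that feature
        self._latest = {}

    def learn_one(self, x):
        # Only non-None values update the stored state.
        for name, value in x.items():
            if value is not None:
                self._latest[name] = value

    def transform_one(self, x):
        # Return a copy so the caller's dict is left untouched.
        # A feature that has never been observed stays None.
        return {
            name: self._latest.get(name) if value is None else value
            for name, value in x.items()
        }


sketch = PreviousValueSketch()
sketch.learn_one({"x": 1, "y": 2})
sketch.learn_one({"x": None, "y": 5})  # the None does not overwrite x
filled = sketch.transform_one({"x": None, "y": None})
print(filled)
```

The point of the sketch is that state is kept per feature, so a gap in one feature is filled independently of the others.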
Usage
Use PreviousImputer for time-series or ordered data where the last known value is a reasonable estimate. Use StatImputer when a statistical aggregate (mean for numeric features, mode for categorical ones) gives a better estimate. StatImputer is particularly useful when combined with Bayesian statistics to reduce overfitting. Both are useful preprocessing steps when streaming real-world data that contains missing values.
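To make the "mean for numeric, mode for categorical" guidance concrete, the following plain-Python sketch keeps one running statistic per feature and uses it to fill gaps. It is a minimal illustration, not River's code: `RunningMean`, `RunningMode`, and `StatFillSketch` are hypothetical names introduced here, and the keyword-argument constructor is an assumption made for brevity.

```python
from collections import Counter


class RunningMean:
    """Incremental mean; stores only a count and the current mean."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def update(self, value):
        self.n += 1
        self.mean += (value - self.mean) / self.n

    def get(self):
        return self.mean


class RunningMode:
    """Most frequent value seen so far."""

    def __init__(self):
        self.counts = Counter()

    def update(self, value):
        self.counts[value] += 1

    def get(self):
        return self.counts.most_common(1)[0][0] if self.counts else None


class StatFillSketch:
    """Fills None values from a per-feature running statistic (hypothetical)."""

    def __init__(self, **stats_by_feature):
        self.stats = stats_by_feature

    def learn_one(self, x):
        # Missing values are skipped so they never distort the statistic.
        for name, stat in self.stats.items():
            value = x.get(name)
            if value is not None:
                stat.update(value)

    def transform_one(self, x):
        # Features without a configured statistic pass through unchanged.
        return {
            name: self.stats[name].get() if value is None and name in self.stats else value
            for name, value in x.items()
        }


filler = StatFillSketch(temperature=RunningMean(), weather=RunningMode())
stream = [
    {"temperature": 1, "weather": "sunny"},
    {"temperature": 8, "weather": "rainy"},
    {"temperature": 3, "weather": "sunny"},
]
for x in stream:
    filler.learn_one(x)

filled = filler.transform_one({"temperature": None, "weather": None})
print(filled)
```

Each statistic uses constant memory per feature (the mode uses memory proportional to the number of distinct categories), which is what makes this approach suitable for unbounded streams.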
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/preprocessing/impute.py
Signature
class PreviousImputer(base.Transformer):
    def __init__(self)

class StatImputer(base.Transformer):
    def __init__(self, *imputers)
Import
from river import preprocessing
from river import stats
I/O Contract
| Input | Output |
|---|---|
| Dict[str, Any] - Features with possible None values | Dict[str, Any] - Features with imputed values |
Usage Examples
from river import preprocessing
from river import stats
# PreviousImputer example: learn one observation, then fill a missing 'y'
imputer = preprocessing.PreviousImputer()
imputer.learn_one({'x': 1, 'y': 2})
print(imputer.transform_one({'y': None}))
# {'y': 2}

# StatImputer with numeric data: the missing value is filled with the running mean
X = [
    {'temperature': 1},
    {'temperature': 8},
    {'temperature': 3},
    {'temperature': None},
    {'temperature': 4}
]
imp = preprocessing.StatImputer(('temperature', stats.Mean()))
for x in X:
    imp.learn_one(x)
    print(imp.transform_one(x))
# {'temperature': 1}
# {'temperature': 8}
# {'temperature': 3}
# {'temperature': 4.0}
# {'temperature': 4}

# StatImputer with categorical data: the missing value is filled with the running mode
X = [
    {'weather': 'sunny'},
    {'weather': 'rainy'},
    {'weather': 'sunny'},
    {'weather': None},
]
imp = preprocessing.StatImputer(('weather', stats.Mode()))
for x in X:
    imp.learn_one(x)
    print(imp.transform_one(x))
# {'weather': 'sunny'}
# {'weather': 'rainy'}
# {'weather': 'sunny'}
# {'weather': 'sunny'}