Implementation:Online ml River Preprocessing Imputers

Knowledge Sources	Online_ml_River
Domains	Online_Learning, Preprocessing, Missing_Data
Last Updated	2026-02-08 16:00 GMT

Overview

Streaming imputation methods for handling missing values using previous values or running statistics.

Description

This module provides two complementary imputation strategies for online learning. PreviousImputer replaces each missing value with the most recently observed non-null value of the same feature. StatImputer fills missing values from a running statistic (such as stats.Mean or stats.Mode) or from a custom value. It is configured per feature, one (feature, statistic) pair at a time, so it can handle numeric and categorical data in the same stream. Both imputers learn incrementally as new data arrives and keep their state across observations.
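To make the two strategies concrete, here is a minimal pure-Python sketch. It is not River's implementation: the class names LastValueSketch and RunningMeanSketch are hypothetical, the second class covers only the mean, and both assume that a missing value is encoded as None.

```python
from typing import Any, Dict


class LastValueSketch:
    """Hypothetical stand-in for the previous-value strategy."""

    def __init__(self) -> None:
        self._latest: Dict[str, Any] = {}

    def learn_one(self, x: Dict[str, Any]) -> None:
        # Remember the most recent non-missing value of each feature.
        for name, value in x.items():
            if value is not None:
                self._latest[name] = value

    def transform_one(self, x: Dict[str, Any]) -> Dict[str, Any]:
        # Fill gaps from memory; a feature never seen before stays None.
        return {
            name: self._latest.get(name) if value is None else value
            for name, value in x.items()
        }


class RunningMeanSketch:
    """Hypothetical stand-in for the running-statistic strategy (mean only)."""

    def __init__(self, feature: str) -> None:
        self.feature = feature
        self._n = 0
        self._mean = 0.0

    def learn_one(self, x: Dict[str, Any]) -> None:
        # Missing values carry no information, so they do not update the mean.
        value = x.get(self.feature)
        if value is not None:
            self._n += 1
            self._mean += (value - self._mean) / self._n

    def transform_one(self, x: Dict[str, Any]) -> Dict[str, Any]:
        # Return a copy with the gap replaced by the current running mean.
        if self.feature in x and x[self.feature] is None:
            return {**x, self.feature: self._mean}
        return dict(x)


prev = LastValueSketch()
prev.learn_one({"x": 1, "y": 2})
prev_out = prev.transform_one({"y": None})

mean = RunningMeanSketch("temperature")
for value in (1, 8, 3):
    mean.learn_one({"temperature": value})
mean_out = mean.transform_one({"temperature": None})

print(prev_out)  # {'y': 2}
print(mean_out)  # {'temperature': 4.0}
```

Both sketches keep only a constant amount of state per feature, which is what makes these strategies usable on an unbounded stream.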

Usage

Use PreviousImputer for time-series or otherwise ordered data, where the last known value is a reasonable estimate of the missing one. Use StatImputer when a statistical aggregate (mean for numeric features, mode for categorical ones) is a better estimate. Pairing StatImputer with a Bayesian statistic, which shrinks the estimate toward a prior, can reduce overfitting to the first few observations. Both imputers are practical preprocessing steps for streaming data that contains missing values.
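The point about Bayesian statistics can be illustrated with a small sketch of a mean that is shrunk toward a prior. This is an assumption-laden illustration, not River code: ShrunkMeanSketch is a hypothetical name, and the formula (prior * prior_weight + sum) / (prior_weight + n) is one common form of a Bayesian mean, which may or may not match the statistic the page has in mind.

```python
class ShrunkMeanSketch:
    """Hypothetical running mean pulled toward a prior value.

    The prior acts like `prior_weight` pseudo-observations equal to `prior`.
    """

    def __init__(self, prior: float, prior_weight: float) -> None:
        self.prior = prior
        self.prior_weight = prior_weight
        self._sum = 0.0
        self._n = 0

    def update(self, value: float) -> None:
        # Accumulate the real observations incrementally.
        self._sum += value
        self._n += 1

    def get(self) -> float:
        # Blend the prior with the observed data; the data dominates as n grows.
        return (self.prior * self.prior_weight + self._sum) / (
            self.prior_weight + self._n
        )


stat = ShrunkMeanSketch(prior=20.0, prior_weight=4)
before_data = stat.get()

stat.update(30.0)           # one, possibly unrepresentative, reading
after_one = stat.get()

for _ in range(3):          # three more readings of 30.0
    stat.update(30.0)
after_four = stat.get()

print(before_data, after_one, after_four)  # 20.0 22.0 25.0
```

With a single reading of 30.0, a plain mean would impute 30.0, while the shrunk mean imputes 22.0. The imputed value moves toward the data only as evidence accumulates, which is the sense in which such a statistic guards against overfitting early in a stream.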

Code Reference

Source Location

Repository: Online_ml_River
File: river/preprocessing/impute.py

Signature

class PreviousImputer(base.Transformer):
    def __init__(self)

class StatImputer(base.Transformer):
    def __init__(self, *imputers)

Import

from river import preprocessing
from river import stats

I/O Contract

Input	Output
Dict[str, Any] - Features with possible None values	Dict[str, Any] - Features with imputed values

Usage Examples

from river import preprocessing
from river import stats

# PreviousImputer example
imputer = preprocessing.PreviousImputer()
imputer.learn_one({'x': 1, 'y': 2})
print(imputer.transform_one({'y': None}))
# {'y': 2}

# StatImputer with numeric data
X = [
    {'temperature': 1},
    {'temperature': 8},
    {'temperature': 3},
    {'temperature': None},
    {'temperature': 4}
]

imp = preprocessing.StatImputer(('temperature', stats.Mean()))

for x in X:
    imp.learn_one(x)
    print(imp.transform_one(x))
# {'temperature': 1}
# {'temperature': 8}
# {'temperature': 3}
# {'temperature': 4.0}
# {'temperature': 4}

# StatImputer with categorical data
X = [
    {'weather': 'sunny'},
    {'weather': 'rainy'},
    {'weather': 'sunny'},
    {'weather': None},
]

imp = preprocessing.StatImputer(('weather', stats.Mode()))

for x in X:
    imp.learn_one(x)
    print(imp.transform_one(x))
# {'weather': 'sunny'}
# {'weather': 'rainy'}
# {'weather': 'sunny'}
# {'weather': 'sunny'}

Related Pages

Environment:Online_ml_River_Python_Runtime_Environment
