Implementation:Online ml River Preprocessing Imputers

From Leeroopedia
Revision as of 16:10, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Online_ml_River_Preprocessing_Imputers.md)


Knowledge Sources
Domains Online_Learning, Preprocessing, Missing_Data
Last Updated 2026-02-08 16:00 GMT

Overview

Streaming imputation methods for handling missing values using previous values or running statistics.

Description

This module provides two complementary imputation strategies for online learning. PreviousImputer replaces a missing value with the most recently observed non-null value of the same feature. StatImputer fills missing values from a running statistic, such as a mean or mode, or from a constant value. StatImputer accepts a separate statistic for each feature, so it can handle numeric and categorical features together. Both imputers learn incrementally as new data arrives and keep their state across observations.
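The two strategies described above can be illustrated with a minimal pure-Python sketch. It is not River's implementation: the class names LastValueImputer and RunningMeanImputer are hypothetical stand-ins that only mirror the learn_one / transform_one pattern, and the sketch assumes that a missing value is represented by None.

```python
from typing import Any, Dict


class LastValueImputer:
    """Hypothetical stand-in: remembers the last non-None value per feature."""

    def __init__(self) -> None:
        self._last: Dict[str, Any] = {}

    def learn_one(self, x: Dict[str, Any]) -> None:
        # Update the remembered value only for features that are present and not None.
        for key, value in x.items():
            if value is not None:
                self._last[key] = value

    def transform_one(self, x: Dict[str, Any]) -> Dict[str, Any]:
        # Return a copy in which None values are replaced when a previous value is known.
        out = dict(x)
        for key, value in x.items():
            if value is None and key in self._last:
                out[key] = self._last[key]
        return out


class RunningMeanImputer:
    """Hypothetical stand-in: fills None with the running mean of one feature."""

    def __init__(self, feature: str) -> None:
        self.feature = feature
        self._n = 0
        self._mean = 0.0

    def learn_one(self, x: Dict[str, Any]) -> None:
        value = x.get(self.feature)
        if value is not None:
            # Incremental mean update: no past observations are stored.
            self._n += 1
            self._mean += (value - self._mean) / self._n

    def transform_one(self, x: Dict[str, Any]) -> Dict[str, Any]:
        out = dict(x)
        if self.feature in x and x[self.feature] is None and self._n > 0:
            out[self.feature] = self._mean
        return out


prev = LastValueImputer()
prev.learn_one({"x": 1, "y": 2})
prev_result = prev.transform_one({"y": None})

mean_imp = RunningMeanImputer("temperature")
for value in (1, 8, 3):
    mean_imp.learn_one({"temperature": value})
mean_result = mean_imp.transform_one({"temperature": None})
```

Both classes keep only a constant amount of state per feature, which is what makes this style of imputation suitable for streams.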

Usage

Use PreviousImputer for time-series or otherwise ordered data, where the last known value is a reasonable estimate of the missing one. Use StatImputer when a statistical aggregate (the mean for numeric features, the mode for categorical features) gives a better estimate. Pairing StatImputer with a Bayesian statistic can reduce overfitting when only a few values have been observed. Both imputers are useful preprocessing steps when streaming real-world data contains missing values.
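For categorical features, the mode-based strategy can be sketched in the same way. This is an illustrative pure-Python example, not River code: RunningModeImputer is a hypothetical name, and ties are resolved by collections.Counter in favour of the value seen first, which is an assumption of this sketch and not a statement about how River breaks ties.

```python
from collections import Counter
from typing import Any, Dict


class RunningModeImputer:
    """Hypothetical stand-in: fills None with the most frequent value seen so far."""

    def __init__(self, feature: str) -> None:
        self.feature = feature
        self._counts: Counter = Counter()

    def learn_one(self, x: Dict[str, Any]) -> None:
        value = x.get(self.feature)
        if value is not None:
            self._counts[value] += 1

    def transform_one(self, x: Dict[str, Any]) -> Dict[str, Any]:
        out = dict(x)
        if self.feature in x and x[self.feature] is None and self._counts:
            # most_common(1) returns a list holding one (value, count) pair.
            out[self.feature] = self._counts.most_common(1)[0][0]
        return out


imputer = RunningModeImputer("weather")
stream = [
    {"weather": "sunny"},
    {"weather": "rainy"},
    {"weather": "sunny"},
    {"weather": None},
]

results = []
for x in stream:
    imputer.learn_one(x)  # None values are skipped during learning
    results.append(imputer.transform_one(x))
```

After the loop, the last entry of results has its None replaced by 'sunny', the value observed most often up to that point.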

Code Reference

Source Location

Signature

class PreviousImputer(base.Transformer):
    def __init__(self)

class StatImputer(base.Transformer):
    def __init__(self, *imputers)

Import

from river import preprocessing
from river import stats

I/O Contract

Input: Dict[str, Any] - Features with possible None values
Output: Dict[str, Any] - Features with imputed values

Usage Examples

from river import preprocessing
from river import stats

# PreviousImputer example
imputer = preprocessing.PreviousImputer()
imputer.learn_one({'x': 1, 'y': 2})
print(imputer.transform_one({'y': None}))
# {'y': 2}

# StatImputer with numeric data
X = [
    {'temperature': 1},
    {'temperature': 8},
    {'temperature': 3},
    {'temperature': None},
    {'temperature': 4}
]

imp = preprocessing.StatImputer(('temperature', stats.Mean()))

for x in X:
    imp.learn_one(x)
    print(imp.transform_one(x))
# {'temperature': 1}
# {'temperature': 8}
# {'temperature': 3}
# {'temperature': 4.0}
# {'temperature': 4}

# StatImputer with categorical data
X = [
    {'weather': 'sunny'},
    {'weather': 'rainy'},
    {'weather': 'sunny'},
    {'weather': None},
]

imp = preprocessing.StatImputer(('weather', stats.Mode()))

for x in X:
    imp.learn_one(x)
    print(imp.transform_one(x))
# {'weather': 'sunny'}
# {'weather': 'rainy'}
# {'weather': 'sunny'}
# {'weather': 'sunny'}
