Implementation:Online ml River Preprocessing OneHotEncoder

Knowledge Sources	Online_ml_River
Domains	Online_Learning, Preprocessing, Categorical_Encoding
Last Updated	2026-02-08 16:00 GMT

Overview

Online one-hot encoding for categorical features with support for both streaming and mini-batch processing.

Description

OneHotEncoder converts categorical variables into binary indicator features. Each distinct value observed for a feature becomes its own indicator, named <feature>_<value>, which is 1 when that value is present and 0 otherwise. The encoder learns its vocabulary incrementally as new categories are observed. A feature's value may also be a list or set, in which case each element gets its own indicator. Setting drop_zeros=True omits the zero-valued indicators from the output, and drop_first=True produces k-1 dummies from k categories (useful for avoiding multicollinearity). The encoder works with both single observations (dicts) and pandas DataFrames for mini-batch processing.
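
The incremental vocabulary and the effect of drop_zeros can be illustrated with a minimal pure-Python sketch. TinyOneHot and its attribute names are hypothetical and written only for this illustration; this is not River's implementation, and it handles scalar values only.

```python
from collections import defaultdict


class TinyOneHot:
    """Hypothetical stand-in for illustration; not River's implementation."""

    def __init__(self, drop_zeros=False):
        self.drop_zeros = drop_zeros
        # feature name -> set of category values observed so far
        self.seen = defaultdict(set)

    def learn_one(self, x):
        # Grow the vocabulary with whatever values this observation carries.
        for name, value in x.items():
            self.seen[name].add(value)

    def transform_one(self, x):
        out = {}
        if not self.drop_zeros:
            # Start from a 0 for every category learned so far.
            for name, values in self.seen.items():
                for v in values:
                    out[f"{name}_{v}"] = 0
        # Overwrite (or add) a 1 for each value present in x.
        for name, value in x.items():
            out[f"{name}_{value}"] = 1
        return out


dense = TinyOneHot()
sparse = TinyOneHot(drop_zeros=True)
for x in [{"color": "red"}, {"color": "blue"}]:
    dense.learn_one(x)
    sparse.learn_one(x)

print(sorted(dense.transform_one({"color": "blue"}).items()))
# [('color_blue', 1), ('color_red', 0)]
print(sparse.transform_one({"color": "blue"}))
# {'color_blue': 1}
```

In the dense variant the output grows with the vocabulary, while the sparse variant only ever emits the indicators that are set.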

Usage

Use OneHotEncoder to prepare categorical features for linear models and other algorithms that require numeric inputs, in particular when a pipeline receives string-valued features. Setting drop_zeros=True keeps the output small when a feature has many categories, because only the indicators that are present are emitted. Setting drop_first=True helps prevent perfect collinearity in statistical models. Combine the encoder with compose.Select to encode specific features while preserving others.
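
The collinearity argument behind drop_first can be checked with a small worked example. The category list and helper names below are illustrative assumptions, and dropping the first list element as the baseline is a choice made for this sketch; it does not claim to reproduce which level River drops.

```python
categories = ["a", "b", "c"]      # k = 3 levels (illustrative values)
samples = ["a", "c", "b", "a"]


def full_dummies(value):
    # One indicator per level: k columns.
    return [1 if value == c else 0 for c in categories]


def reduced_dummies(value):
    # Drop the first level; it becomes the implicit baseline (k - 1 columns).
    return full_dummies(value)[1:]


for s in samples:
    full = full_dummies(s)
    reduced = reduced_dummies(s)
    # Every full row sums to 1, which duplicates an intercept column.
    assert sum(full) == 1
    # The dropped indicator is recoverable from the remaining k - 1.
    assert full[0] == 1 - sum(reduced)
    print(s, full, reduced)
# a [1, 0, 0] [0, 0]
# c [0, 0, 1] [0, 1]
# b [0, 1, 0] [1, 0]
# a [1, 0, 0] [0, 0]
```

Because the k indicators always sum to 1, one of them carries no extra information once a model has an intercept, which is why k-1 dummies suffice.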

Code Reference

Source Location

Repository: Online_ml_River
File: river/preprocessing/one_hot.py

Signature

class OneHotEncoder(base.MiniBatchTransformer):
    def __init__(self, drop_zeros=False, drop_first=False)

Import

from river import preprocessing

I/O Contract

Input	Output
Dict[str, Union[str, List, Set]] - Categorical features	Dict[str, int] - Binary indicator features
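
The contract above can be read as the following type-annotated sketch, which also shows how a list- or set-valued feature expands into several indicators. The function name indicators and the sample feature names are hypothetical; the <feature>_<value> key format follows the outputs shown in the usage examples.

```python
from typing import Dict, List, Set, Union

Value = Union[str, List[str], Set[str]]


def indicators(x: Dict[str, Value]) -> Dict[str, int]:
    """Emit one `<feature>_<value>` indicator per value present in x."""
    out: Dict[str, int] = {}
    for name, value in x.items():
        # A bare string is one category; a list or set contributes several.
        values = [value] if isinstance(value, str) else value
        for v in values:
            out[f"{name}_{v}"] = 1
    return out


row = {"genre": ["rock", "jazz"], "country": "fr"}
encoded = indicators(row)
print(sorted(encoded.items()))
# [('country_fr', 1), ('genre_jazz', 1), ('genre_rock', 1)]
```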

Usage Examples

from pprint import pprint
import random
import string

random.seed(42)
alphabet = list(string.ascii_lowercase)
X = [
    {
        'c1': random.choice(alphabet),
        'c2': random.choice(alphabet),
    }
    for _ in range(4)
]

from river import preprocessing

# Default mode: categories learned so far but absent from x are emitted as 0.
oh = preprocessing.OneHotEncoder()
for x in X[:2]:
    oh.learn_one(x)
    pprint(oh.transform_one(x))
# {'c1_u': 1, 'c2_d': 1}
# {'c1_a': 1, 'c1_u': 0, 'c2_d': 0, 'c2_x': 1}

# With drop_zeros=True, only the indicators that are present are emitted.
oh = preprocessing.OneHotEncoder(drop_zeros=True)
for x in X:
    oh.learn_one(x)
    pprint(oh.transform_one(x))
# {'c1_u': 1, 'c2_d': 1}
# {'c1_a': 1, 'c2_x': 1}
# {'c1_i': 1, 'c2_h': 1}
# {'c1_h': 1, 'c2_e': 1}

# Using compose.Select to encode specific features:
# `|` chains Select('c1') into the encoder, `+=` adds c2 as a parallel, unencoded branch.
from river import compose

pp = compose.Select('c1') | preprocessing.OneHotEncoder()
pp += compose.Select('c2')

for x in X:
    pp.learn_one(x)
    pprint(pp.transform_one(x))
# {'c1_u': 1, 'c2': 'd'}
# {'c1_a': 1, 'c1_u': 0, 'c2': 'x'}
# {'c1_a': 0, 'c1_i': 1, 'c1_u': 0, 'c2': 'h'}
# {'c1_a': 0, 'c1_h': 1, 'c1_i': 0, 'c1_u': 0, 'c2': 'e'}

Related Pages

Environment:Online_ml_River_Python_Runtime_Environment
