Implementation:Online ml River Preprocessing OrdinalEncoder
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Preprocessing, Categorical_Encoding |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Encodes categorical features as integers in streaming fashion with configurable handling of unknown and null values.
Description
OrdinalEncoder maps each unique category within a feature to a unique integer code. Codes are assigned incrementally, in order of first appearance, by one auto-incrementing counter per feature. A category that has not been learned yet is mapped to the configurable unknown_value, and None is mapped to its own code, none_value. Learned codes do not reuse unknown_value or none_value, so with the defaults the first category learned for a feature receives code 1. The encoder stores the category-to-code mappings internally and supports both single-observation processing and mini-batch processing of pandas DataFrames.
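The mechanism described above can be sketched in plain Python. This is a minimal illustration, not River's source: the class name SketchOrdinalEncoder is hypothetical, and the sketch assumes (as the example output further down this page suggests) that learned codes skip the values reserved for unknown_value and none_value.

```python
import collections
import itertools


class SketchOrdinalEncoder:
    """Hypothetical stand-in for illustration; not River's actual code."""

    def __init__(self, unknown_value=0, none_value=-1):
        self.unknown_value = unknown_value
        self.none_value = none_value
        reserved = {unknown_value, none_value}
        # One auto-incrementing counter per feature; reserved codes are skipped (assumption).
        self._counters = collections.defaultdict(
            lambda: (i for i in itertools.count() if i not in reserved)
        )
        # Outer dict: feature name -> inner dict of {category: code}.
        self.categories = collections.defaultdict(dict)

    def learn_one(self, x):
        # Assign a fresh code to every non-None category not seen before.
        for feature, value in x.items():
            if value is not None and value not in self.categories[feature]:
                self.categories[feature][value] = next(self._counters[feature])

    def transform_one(self, x):
        # None -> none_value, learned category -> its code, anything else -> unknown_value.
        return {
            feature: self.none_value
            if value is None
            else self.categories[feature].get(value, self.unknown_value)
            for feature, value in x.items()
        }


encoder = SketchOrdinalEncoder()
first = encoder.transform_one({"country": "France"})   # not learned yet
encoder.learn_one({"country": "France"})
second = encoder.transform_one({"country": "France"})  # learned
missing = encoder.transform_one({"country": None})     # null value
print(first, second, missing)
# {'country': 0} {'country': 1} {'country': -1}
```

Keeping the mapping as a dict of dicts makes both operations a constant-time lookup per feature, at the cost of storing every category seen so far.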
Usage
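Use this encoder when you need a simple integer encoding for categorical variables, particularly with tree-based models that can consume ordinal encodings directly. It emits a single integer per feature, which makes it more memory-efficient than one-hot encoding for high-cardinality features. The unknown_value parameter gives categories that have not been learned yet a well-defined code, which allows graceful handling of new categories at inference time. It is a good fit when the order of the categories does not matter but a numeric representation is required; the codes reflect arrival order only, not any intrinsic ranking.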
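The memory argument can be made concrete with a toy comparison in plain Python. The feature name, the category names, and the dense dict-based one-hot row are made up for illustration and do not use River's encoders; a sparse one-hot representation would narrow the gap.

```python
# Made-up high-cardinality feature with 1000 distinct categories.
categories = [f"city_{i}" for i in range(1000)]

# Ordinal: one integer per observation, whatever the number of categories.
ordinal_codes = {c: code for code, c in enumerate(categories, start=1)}
ordinal_row = {"city": ordinal_codes["city_42"]}

# One-hot (toy dense version): one indicator per known category.
one_hot_row = {f"city={c}": int(c == "city_42") for c in categories}

print(len(ordinal_row), len(one_hot_row))
# 1 1000
```

In both cases the set of known categories still has to be stored once; the saving is in the width of each encoded observation.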
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/preprocessing/ordinal.py
Signature
class OrdinalEncoder(base.MiniBatchTransformer):
    def __init__(
        self,
        unknown_value: int | None = 0,
        none_value: int = -1,
    )
Import
from river import preprocessing
I/O Contract
| Input | Output |
|---|---|
| Dict[str, Any] - Categorical features | Dict[str, int] - Integer-encoded features (an unlearned category yields unknown_value, which may be None) |
Usage Examples
from river import preprocessing
X = [
{"country": "France", "place": "Taco Bell"},
{"country": None, "place": None},
{"country": "Sweden", "place": "Burger King"},
{"country": "France", "place": "Burger King"},
{"country": "Russia", "place": "Starbucks"},
{"country": "Russia", "place": "Starbucks"},
{"country": "Sweden", "place": "Taco Bell"},
{"country": None, "place": None},
]
encoder = preprocessing.OrdinalEncoder()
# Transform before learning: a category maps to unknown_value (0) the first time it appears.
for x in X:
    print(encoder.transform_one(x))
    encoder.learn_one(x)
# {'country': 0, 'place': 0}
# {'country': -1, 'place': -1}
# {'country': 0, 'place': 0}
# {'country': 1, 'place': 2}
# {'country': 0, 'place': 0}
# {'country': 3, 'place': 3}
# {'country': 2, 'place': 1}
# {'country': -1, 'place': -1}