Implementation:Online ml River FeatureSelection PoissonInclusion
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Feature_Selection, Sparse_Features |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Randomly selects features through Poisson trials, giving a memory-efficient way to handle sparse feature spaces.
Description
PoissonInclusion performs probabilistic feature selection. Each time a feature that has not yet been included appears in an observation, it is included with probability p. Once a feature is included, it remains in the selected set for all future observations. The number of times a feature must be seen before it is included therefore follows a geometric distribution with expected value 1/p. This approach provides a simple, memory-efficient way to handle extremely high-dimensional sparse feature spaces without evaluating every feature.
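The mechanism can be sketched in plain Python. This is a minimal illustration rather than River's source: the class name `PoissonInclusionSketch` and its attributes are hypothetical, it relies only on the standard-library `random` module, and it assumes the inclusion trial is repeated on every encounter until it succeeds, which is what the geometric distribution implies. The loop at the end estimates the mean number of encounters before inclusion, which should land near 1/p.

```python
import random


class PoissonInclusionSketch:
    """Hypothetical stand-in illustrating the behaviour described here."""

    def __init__(self, p, seed=None):
        self.p = p
        self.rng = random.Random(seed)
        self.included = set()  # features whose trial has already succeeded

    def transform_one(self, x):
        xt = {}
        for name, value in x.items():
            if name in self.included:
                # Already selected: always passed through.
                xt[name] = value
            elif self.rng.random() < self.p:
                # The trial succeeded on this encounter: keep it from now on.
                self.included.add(name)
                xt[name] = value
        return xt


# Empirical check of the 1/p expectation, using 5,000 made-up feature names.
p = 0.2
selector = PoissonInclusionSketch(p=p, seed=42)
counts = []
for i in range(5000):
    name = f"f{i}"
    n = 1
    # Present the same feature repeatedly until the selector keeps it.
    while name not in selector.transform_one({name: 1.0}):
        n += 1
    counts.append(n)

mean_encounters = sum(counts) / len(counts)
print(f"mean encounters: {mean_encounters:.2f}, 1/p: {1 / p:.2f}")
```

Only the set of included feature names is stored, so memory grows with the number of selected features rather than with the number of features seen.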
Usage
Use this when working with extremely large or unbounded sparse feature spaces, such as hashed features, text n-grams, or user-item interactions, where the number of potential features can run into the millions. It is particularly useful for online advertising, recommendation systems, and large-scale text classification, where storing all features is infeasible. The probability p controls the memory-accuracy trade-off: lower values use less memory but delay the inclusion of features, so important features may be missing from early observations.
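The effect of p on memory can be illustrated with a small simulation. This is a sketch under stated assumptions, not a benchmark: the helper `count_retained`, the synthetic stream, and the `feat_*` names are all hypothetical, and the selection rule is reimplemented with the standard library instead of calling River.

```python
import random


def count_retained(p, observations, seed=0):
    """Return how many distinct features are kept after one pass."""
    rng = random.Random(seed)
    included = set()
    for x in observations:
        for name in x:
            # Features not yet included get one trial per encounter.
            if name not in included and rng.random() < p:
                included.add(name)
    return len(included)


# Synthetic sparse stream: 2,000 observations, each with up to 5 active
# features drawn from a vocabulary of 10,000 made-up feature names.
data_rng = random.Random(1)
observations = [
    {f"feat_{data_rng.randrange(10_000)}": 1.0 for _ in range(5)}
    for _ in range(2_000)
]
distinct = len({name for x in observations for name in x})

for p in (0.01, 0.1, 0.5, 1.0):
    kept = count_retained(p, observations)
    print(f"p={p}: kept {kept} of {distinct} distinct features")
```

Features that occur rarely are unlikely to pass a trial when p is small, so the retained set is biased toward frequently occurring features.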
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/feature_selection/random.py
Signature
class PoissonInclusion(base.Transformer):
    def __init__(self, p: float, seed: int | None = None)
Import
from river import feature_selection
I/O Contract
| Input | Output |
|---|---|
| Dict[str, float] - All features | Dict[str, float] - Randomly included features |
Usage Examples
from river import datasets
from river import feature_selection
from river import stream
selector = feature_selection.PoissonInclusion(p=0.1, seed=42)
dataset = iter(datasets.TrumpApproval())
feature_names = next(dataset)[0].keys()
n = 0
# Count how many observations until all features are included
while True:
    x, y = next(dataset)
    xt = selector.transform_one(x)
    if xt.keys() == feature_names:
        break
    n += 1
print(n)
# 12