Implementation:Online ml River FeatureSelection PoissonInclusion

From Leeroopedia


Knowledge Sources
Domains Online_Learning, Feature_Selection, Sparse_Features
Last Updated 2026-02-08 16:00 GMT

Overview

Randomly selects features through independent inclusion trials (Poisson trials), each succeeding with probability p, so that sparse features can be handled with limited memory.

Description

PoissonInclusion performs probabilistic feature selection: each time a feature that has not yet been included is encountered, it is included with probability p. Once a feature is included, it remains in the selected set for all future observations. The number of encounters up to and including the one that triggers inclusion follows a geometric distribution with expected value 1/p. This approach provides a simple, memory-efficient way to handle extremely high-dimensional sparse feature spaces without evaluating every feature.
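The mechanism described above can be sketched in a few lines of plain Python. This is an illustrative re-implementation under the assumptions stated in the description, not River's actual source; the class name `PoissonInclusionSketch` and the attribute `included` are hypothetical names chosen for this example.

```python
import random
from typing import Optional


class PoissonInclusionSketch:
    """Illustrative sketch of the inclusion logic (hypothetical, not River's code)."""

    def __init__(self, p: float, seed: Optional[int] = None):
        self.p = p
        self.rng = random.Random(seed)
        self.included = set()  # names of features that passed an inclusion trial

    def transform_one(self, x: dict) -> dict:
        xt = {}
        for name, value in x.items():
            if name in self.included:
                # Already selected: always passed through.
                xt[name] = value
            elif self.rng.random() < self.p:
                # Not selected yet: run one inclusion trial for this encounter.
                self.included.add(name)
                xt[name] = value
        return xt


sketch = PoissonInclusionSketch(p=0.5, seed=1)
for _ in range(5):
    print(sketch.transform_one({"a": 1.0, "b": 2.0}))
```

Only the set of included feature names is stored, which is why features that are seen rarely tend to cost no memory at all.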

Usage

Use this when working with extremely large or unbounded sparse feature spaces, such as hashed features, text n-grams, or user-item interactions. It is particularly useful for online advertising, recommendation systems, and large-scale text classification, where storing all features is infeasible. The probability p controls the memory-accuracy trade-off: lower values use less memory, but they delay the inclusion of every feature, and features that occur only a few times may never be included. Recommended for scenarios with millions of potential features.
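To make the trade-off concrete, the following sketch computes how likely a feature is to have been included after a given number of encounters. It assumes each encounter is an independent trial with success probability p, as in the description; the function names are hypothetical helpers and are not part of River.

```python
def inclusion_probability(p: float, k: int) -> float:
    """Probability that a feature has been included after k encounters,
    assuming independent trials that each succeed with probability p."""
    return 1.0 - (1.0 - p) ** k


def expected_encounters(p: float) -> float:
    """Mean of the geometric distribution: expected encounters until inclusion."""
    return 1.0 / p


# Frequent features are almost surely kept; rare ones are mostly dropped.
for k in (1, 10, 100):
    print(k, round(inclusion_probability(0.1, k), 3))
print(expected_encounters(0.1))
```

Under these assumptions, a feature seen once with p = 0.1 is stored only 10% of the time, which is where the memory saving comes from in long-tailed feature spaces.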

Code Reference

Source Location

Signature

class PoissonInclusion(base.Transformer):
    def __init__(self, p: float, seed: int | None = None)

Import

from river import feature_selection

I/O Contract

Input: Dict[str, float] - All features
Output: Dict[str, float] - Randomly included features

Usage Examples

from river import datasets
from river import feature_selection

selector = feature_selection.PoissonInclusion(p=0.1, seed=42)

dataset = iter(datasets.TrumpApproval())

# Consume the first observation only to record the full set of feature names
feature_names = next(dataset)[0].keys()
n = 0

# Count how many further observations pass before every feature has been included
while True:
    x, y = next(dataset)
    xt = selector.transform_one(x)
    if xt.keys() == feature_names:
        break
    n += 1

print(n)
# 12
