Implementation:Online ml River Compose Select

Knowledge Sources	Online_ml_River
Domains	Online_Learning, Feature_Selection, Data_Transformation, Pipeline
Last Updated	2026-02-08 16:00 GMT

Overview

Feature selection transformers that filter features by name or type, supporting both inclusion and exclusion patterns.

Description

This module provides three transformers for feature selection: Discard, Select, and SelectType. All are pure transformers that create new dictionaries rather than modifying inputs in-place.

Discard removes specified features from the input, useful for excluding unwanted features from downstream processing. It accepts any number of feature names and filters them out, keeping all others.
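The filtering described above can be sketched in plain Python. This is an illustrative stand-in rather than River's source; the helper name `discard` is hypothetical and exists only for this example. It also shows the non-mutating behavior: the input dictionary is left unchanged.

```python
def discard(x, *keys):
    """Return a copy of x without the named features (hypothetical helper)."""
    drop = set(keys)
    # Build a new dict so the caller's input is left untouched.
    return {name: value for name, value in x.items() if name not in drop}


x = {'a': 42, 'b': 12, 'c': 13}
filtered = discard(x, 'a', 'b')
print(filtered)  # {'c': 13}
print(x)         # {'a': 42, 'b': 12, 'c': 13}
```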

Select does the opposite: it keeps only the specified features and removes all others. This is the most commonly used selector. Because it subclasses base.MiniBatchTransformer, it supports both single-sample processing (transform_one) and mini-batch processing (transform_many).
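A minimal plain-Python sketch of the same idea follows. The helpers `select` and `select_many` are hypothetical names, not River's implementation, and two assumptions are made for illustration: names missing from the input are skipped, and a list of dicts stands in for the DataFrame used in mini-batch mode.

```python
def select(x, *keys):
    """Keep only the named features (hypothetical helper, not River's code)."""
    # Assumption: names missing from x are skipped instead of raising an error.
    return {name: x[name] for name in keys if name in x}


def select_many(rows, *keys):
    """Apply the same selection to each row of a mini-batch.

    A list of dicts stands in for the DataFrame to keep the sketch dependency-free.
    """
    return [select(row, *keys) for row in rows]


x = {'a': 42, 'b': 12, 'c': 13}
single = select(x, 'c')
print(single)  # {'c': 13}

batch = [
    {'x_1': 10.5, 'x_2': 8.1, 'x_3': 5.2},
    {'x_1': 9.1, 'x_2': 8.9, 'x_3': 6.3},
]
selected = select_many(batch, 'x_1', 'x_2')
print(selected)
# [{'x_1': 10.5, 'x_2': 8.1}, {'x_1': 9.1, 'x_2': 8.9}]
```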

SelectType filters features based on their Python type using isinstance checks. This enables type-based routing, such as applying different preprocessing to numeric versus categorical features. It's particularly useful for heterogeneous datasets with mixed feature types.
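The isinstance-based filtering can be illustrated with a short plain-Python sketch. The helper `select_type` is a hypothetical name used only here; it is not River's source, but it relies on the same standard isinstance semantics.

```python
import numbers


def select_type(x, *types):
    """Keep features whose value is an instance of any given type (hypothetical helper)."""
    # isinstance accepts a tuple of types and also matches subclasses.
    return {name: value for name, value in x.items() if isinstance(value, types)}


x = {'age': 25, 'name': 'Alice', 'score': 0.85, 'active': True}

numeric = select_type(x, numbers.Number)
text = select_type(x, str)
print(numeric)  # {'age': 25, 'score': 0.85, 'active': True}
print(text)     # {'name': 'Alice'}
```

Because bool is a subclass of int in Python, boolean features pass both int and numbers.Number checks. If booleans should be treated as categorical, they need to be handled separately before type-based selection.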

Usage

Use Discard to remove unwanted features or sensitive data. Use Select to extract specific features for processing or to apply transformations to feature subsets. Use SelectType to route different feature types through different preprocessing pipelines, a common pattern when dealing with mixed data types.
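The type-based routing pattern can be sketched without any library, assuming each branch receives the full input and the branch outputs are merged into one dictionary. All function names below are hypothetical, and the prefixing and toy one-hot steps are placeholders for real preprocessing.

```python
def numeric_branch(x):
    # Keep int/float features and tag them with a prefix.
    return {f'num_{k}': float(v) for k, v in x.items() if isinstance(v, (int, float))}


def categorical_branch(x):
    # Keep str features and expand each into a toy one-hot indicator.
    return {f'cat_{k}_{v}': 1 for k, v in x.items() if isinstance(v, str)}


def union(x):
    # Each branch sees the full input; their outputs are merged into one dict.
    return {**numeric_branch(x), **categorical_branch(x)}


x = {'age': 30, 'city': 'LA', 'score': 0.85}
routed = union(x)
print(routed)
# {'num_age': 30.0, 'num_score': 0.85, 'cat_city_LA': 1}
```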

Code Reference

Source Location

Repository: Online_ml_River
File: river/compose/select.py

Signature

class Discard(base.Transformer):
    def __init__(self, *keys: base.typing.FeatureName):
        ...

class Select(base.MiniBatchTransformer):
    def __init__(self, *keys: base.typing.FeatureName):
        ...

class SelectType(base.Transformer):
    def __init__(self, *types: type):
        ...

Import

from river import compose

I/O Contract

Input (Discard/Select)
Parameter	Type	Description
keys	variable args	Feature names to discard (Discard) or keep (Select)
x	dict	Feature dictionary for a single sample
X	DataFrame	pandas DataFrame of features for a mini-batch (Select only)

Input (SelectType)
Parameter	Type	Description
types	variable args of type	Python types to filter by
x	dict	Feature dictionary to filter

Output
Method	Return Type	Description
transform_one(x)	dict	Filtered feature dictionary
transform_many(X)	DataFrame	Filtered dataframe (Select only)

Key Methods
Method	Parameters	Description
transform_one(x)	x: dict	Filters features for single sample
transform_many(X)	X: DataFrame	Filters features for batch (Select only)

Usage Examples

from river import compose

# Discard: Remove unwanted features
x = {'a': 42, 'b': 12, 'c': 13}
print(compose.Discard('a', 'b').transform_one(x))
# {'c': 13}

# Use in pipeline to remove features before processing
from river import feature_extraction as fx

x = {'sales': 10, 'shop': 'Ikea', 'country': 'Sweden'}

pipeline = (
    compose.Discard('shop', 'country') |
    fx.PolynomialExtender()
)
print(pipeline.transform_one(x))
# {'sales': 10, 'sales*sales': 100}

# Select: Keep only specific features
x = {'a': 42, 'b': 12, 'c': 13}
print(compose.Select('c').transform_one(x))
# {'c': 13}

# Select with pipeline
x = {'sales': 10, 'shop': 'Ikea', 'country': 'Sweden'}

pipeline = (
    compose.Select('sales') |
    fx.PolynomialExtender()
)
print(pipeline.transform_one(x))
# {'sales': 10, 'sales*sales': 100}

# Select with mini-batch processing
import pandas as pd

X = pd.DataFrame([
    {'x_1': 10.5, 'x_2': 8.1, 'x_3': 5.2},
    {'x_1': 9.1, 'x_2': 8.9, 'x_3': 6.3},
    {'x_1': 10.9, 'x_2': 10.7, 'x_3': 7.1},
])

selector = compose.Select('x_1', 'x_2')
print(selector.transform_many(X))
#     x_1   x_2
# 0  10.5   8.1
# 1   9.1   8.9
# 2  10.9  10.7

# SelectType: Filter by Python type
import numbers
from river import preprocessing
from river import linear_model

x = {'age': 25, 'name': 'Alice', 'salary': 50000, 'city': 'NYC'}

# Apply different preprocessing to different types
num = compose.SelectType(numbers.Number) | preprocessing.StandardScaler()
cat = compose.SelectType(str) | preprocessing.OneHotEncoder()

# Combine both pipelines
model = (num + cat) | linear_model.LogisticRegression()

# SelectType can also be used on its own via transform_one
selector = compose.SelectType(numbers.Number)
print(selector.transform_one(x))
# {'age': 25, 'salary': 50000}

selector = compose.SelectType(str)
print(selector.transform_one(x))
# {'name': 'Alice', 'city': 'NYC'}

# Example: Complex feature routing
data = {
    'age': 30,
    'income': 75000,
    'name': 'Bob',
    'city': 'LA',
    'score': 0.85
}

# Route numeric vs categorical features differently
numeric_pipeline = (
    compose.SelectType(int, float) |
    preprocessing.StandardScaler() |
    compose.Prefixer('num_')
)

categorical_pipeline = (
    compose.SelectType(str) |
    preprocessing.OneHotEncoder() |
    compose.Prefixer('cat_')
)

combined = numeric_pipeline + categorical_pipeline

# Example: Select for feature engineering
# Select two feature subsets
user_features = compose.Select('user_id', 'user_age', 'user_gender')
item_features = compose.Select('item_id', 'item_price', 'item_category')

# Create interactions between selected subsets
interactions = user_features * item_features

# Example: Discard sensitive features
sensitive_features = ['ssn', 'credit_card', 'password']
anonymizer = compose.Discard(*sensitive_features)

# Use before model training
pipeline = (
    anonymizer |
    preprocessing.StandardScaler() |
    linear_model.LogisticRegression()
)
