Implementation:Online ml River Compose Select

From Leeroopedia


Knowledge Sources
Domains Online_Learning, Feature_Selection, Data_Transformation, Pipeline
Last Updated 2026-02-08 16:00 GMT

Overview

Feature selection transformers that filter features by name or type, supporting both inclusion and exclusion patterns.

Description

This module provides three transformers for feature selection: Discard, Select, and SelectType. All three are pure transformers: they return a new dictionary rather than modifying the input in place.

Discard removes specified features from the input, useful for excluding unwanted features from downstream processing. It accepts any number of feature names and filters them out, keeping all others.

Select does the opposite: it keeps only the specified features and removes all others. It is the most commonly used selector and, as the only one of the three that derives from base.MiniBatchTransformer, it supports both single-sample (transform_one) and mini-batch (transform_many) processing.
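
The name-based filtering described for Discard and Select can be pictured as two dictionary comprehensions. The sketch below is plain Python rather than River's source: discard_keys and select_keys are hypothetical helper names used only for illustration, and silently skipping requested names that are missing from the input is an assumption of the sketch, not a statement about River's behavior.

```python
def discard_keys(x: dict, *keys) -> dict:
    """Return a new dict without the named features (hypothetical helper)."""
    dropped = set(keys)
    return {name: value for name, value in x.items() if name not in dropped}


def select_keys(x: dict, *keys) -> dict:
    """Return a new dict with only the named features (hypothetical helper)."""
    kept = set(keys)
    # Assumption: requested names that are absent from x are skipped silently.
    return {name: value for name, value in x.items() if name in kept}


x = {'a': 42, 'b': 12, 'c': 13}
without_ab = discard_keys(x, 'a', 'b')
only_c = select_keys(x, 'c')

print(without_ab)  # {'c': 13}
print(only_c)      # {'c': 13}
print(x)           # {'a': 42, 'b': 12, 'c': 13}
```

Both helpers build a fresh dictionary, so the original sample is left untouched, which matches the "pure transformer" behavior described above.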

SelectType filters features based on their Python type using isinstance checks. This enables type-based routing, such as applying different preprocessing to numeric versus categorical features. It's particularly useful for heterogeneous datasets with mixed feature types.
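
Because the filter relies on isinstance, subclass relationships matter. The following plain-Python sketch mirrors the described check; select_by_type is a hypothetical helper, not River's code. It also shows one consequence worth knowing: Python's bool is a subclass of int, so boolean features pass a numeric filter.

```python
import numbers


def select_by_type(x: dict, *types: type) -> dict:
    """Keep features whose value is an instance of any given type (hypothetical helper)."""
    # isinstance accepts a tuple of types and returns True if any of them matches.
    return {name: value for name, value in x.items() if isinstance(value, types)}


x = {'age': 25, 'name': 'Alice', 'score': 0.85, 'is_member': True}

numeric = select_by_type(x, numbers.Number)
text = select_by_type(x, str)

print(numeric)  # {'age': 25, 'score': 0.85, 'is_member': True}
print(text)     # {'name': 'Alice'}
```

If boolean flags should not be treated as numbers, they would need to be filtered out separately before or after the type check.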

Usage

Use Discard to remove unwanted features or sensitive data. Use Select to extract specific features for processing or to apply transformations to feature subsets. Use SelectType to route different feature types through different preprocessing pipelines, a common pattern when dealing with mixed data types.
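
The routing pattern mentioned above amounts to splitting a sample by type, processing each part separately, and merging the results. Here is a minimal plain-Python sketch of that idea. The helper run_branch and the two toy steps (doubling numbers, lowercasing strings) are hypothetical stand-ins for real preprocessing, and merging the branch outputs into one dictionary is an assumption about how a union of branches can be pictured, not a description of River's internals.

```python
def run_branch(x: dict, types: tuple, step) -> dict:
    """Select features by type, then apply `step` to each value (hypothetical helper)."""
    selected = {name: value for name, value in x.items() if isinstance(value, types)}
    return {name: step(value) for name, value in selected.items()}


x = {'age': 25, 'name': 'Alice', 'salary': 50000, 'city': 'NYC'}

# Toy stand-ins for real preprocessing steps.
numeric_out = run_branch(x, (int, float), lambda v: v * 2)
text_out = run_branch(x, (str,), str.lower)

# Merge the outputs of both branches into a single feature dictionary.
combined = {**numeric_out, **text_out}

print(combined)
# {'age': 50, 'salary': 100000, 'name': 'alice', 'city': 'nyc'}
```

Each branch sees the full sample and keeps only the features it can handle, so adding a new feature of a known type requires no change to the routing code.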

Code Reference

Source Location

Signature

class Discard(base.Transformer):
    def __init__(self, *keys: base.typing.FeatureName):
        ...

class Select(base.MiniBatchTransformer):
    def __init__(self, *keys: base.typing.FeatureName):
        ...

class SelectType(base.Transformer):
    def __init__(self, *types: type):
        ...

Import

from river import compose

I/O Contract

Input (Discard/Select)

Parameter | Type | Description
keys | variable args | Feature names to discard or select
x | dict | Feature dictionary for single-sample
X | DataFrame | Feature dataframe for mini-batch (Select only)

Input (SelectType)

Parameter | Type | Description
types | variable args of type | Python types to filter by
x | dict | Feature dictionary to filter

Output

Method | Return Type | Description
transform_one(x) | dict | Filtered feature dictionary
transform_many(X) | DataFrame | Filtered dataframe (Select only)

Key Methods

Method | Parameters | Description
transform_one(x) | x: dict | Filters features for single sample
transform_many(X) | X: DataFrame | Filters features for batch (Select only)

Usage Examples

from river import compose

# Discard: Remove unwanted features
x = {'a': 42, 'b': 12, 'c': 13}
print(compose.Discard('a', 'b').transform_one(x))
# {'c': 13}

# Use in pipeline to remove features before processing
from river import feature_extraction as fx

x = {'sales': 10, 'shop': 'Ikea', 'country': 'Sweden'}

pipeline = (
    compose.Discard('shop', 'country') |
    fx.PolynomialExtender()
)
print(pipeline.transform_one(x))
# {'sales': 10, 'sales*sales': 100}

# Select: Keep only specific features
x = {'a': 42, 'b': 12, 'c': 13}
print(compose.Select('c').transform_one(x))
# {'c': 13}

# Select with pipeline
x = {'sales': 10, 'shop': 'Ikea', 'country': 'Sweden'}

pipeline = (
    compose.Select('sales') |
    fx.PolynomialExtender()
)
print(pipeline.transform_one(x))
# {'sales': 10, 'sales*sales': 100}

# Select with mini-batch processing
import pandas as pd

X = pd.DataFrame([
    {'x_1': 10.5, 'x_2': 8.1, 'x_3': 5.2},
    {'x_1': 9.1, 'x_2': 8.9, 'x_3': 6.3},
    {'x_1': 10.9, 'x_2': 10.7, 'x_3': 7.1},
])

selector = compose.Select('x_1', 'x_2')
print(selector.transform_many(X))
#     x_1   x_2
# 0  10.5   8.1
# 1   9.1   8.9
# 2  10.9  10.7

# SelectType: Filter by Python type
import numbers
from river import preprocessing
from river import linear_model

x = {'age': 25, 'name': 'Alice', 'salary': 50000, 'city': 'NYC'}

# Apply different preprocessing to different types
num = compose.SelectType(numbers.Number) | preprocessing.StandardScaler()
cat = compose.SelectType(str) | preprocessing.OneHotEncoder()

# Combine both pipelines
model = (num + cat) | linear_model.LogisticRegression()

# SelectType filters in transform
selector = compose.SelectType(numbers.Number)
print(selector.transform_one(x))
# {'age': 25, 'salary': 50000}

selector = compose.SelectType(str)
print(selector.transform_one(x))
# {'name': 'Alice', 'city': 'NYC'}

# Example: Complex feature routing
data = {
    'age': 30,
    'income': 75000,
    'name': 'Bob',
    'city': 'LA',
    'score': 0.85
}

# Route numeric vs categorical features differently
numeric_pipeline = (
    compose.SelectType(int, float) |
    preprocessing.StandardScaler() |
    compose.Prefixer('num_')
)

categorical_pipeline = (
    compose.SelectType(str) |
    preprocessing.OneHotEncoder() |
    compose.Prefixer('cat_')
)

combined = numeric_pipeline + categorical_pipeline

# Example: Select for feature engineering
# Select one subset of features per entity
user_features = compose.Select('user_id', 'user_age', 'user_gender')
item_features = compose.Select('item_id', 'item_price', 'item_category')

# Create interactions between selected subsets
interactions = user_features * item_features

# Example: Discard sensitive features
sensitive_features = ['ssn', 'credit_card', 'password']
anonymizer = compose.Discard(*sensitive_features)

# Use before model training
pipeline = (
    anonymizer |
    preprocessing.StandardScaler() |
    linear_model.LogisticRegression()
)
