Implementation:Online ml River Compose Select
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Feature_Selection, Data_Transformation, Pipeline |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Feature selection transformers that filter features by name or type, supporting both inclusion and exclusion patterns.
Description
This module provides three transformers for feature selection: Discard, Select, and SelectType. All three are pure transformers: they return a new dictionary rather than modifying the input in place.
Discard removes the specified features from the input, which is useful for keeping unwanted features out of downstream processing. It accepts any number of feature names, filters them out, and keeps everything else.
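The filtering behaviour of Discard can be sketched in plain Python. This is only an illustration of the semantics under the description given here, not River's source code; the helper name `discard` is hypothetical.

```python
def discard(x: dict, *keys) -> dict:
    """Hypothetical stand-in for Discard(*keys).transform_one(x)."""
    blocked = set(keys)  # a set gives constant-time membership tests
    # Build a new dict; the caller's dict is left untouched.
    return {name: value for name, value in x.items() if name not in blocked}


x = {'a': 42, 'b': 12, 'c': 13}
filtered = discard(x, 'a', 'b')

print(filtered)  # {'c': 13}
print(x)         # {'a': 42, 'b': 12, 'c': 13}
```

The second print shows the property the description calls "pure": the input dictionary is unchanged after the call.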
Select does the opposite: it keeps only the specified features and drops all others. It is the only one of the three that subclasses base.MiniBatchTransformer, so it supports both single-sample (transform_one) and mini-batch (transform_many) processing and can be used in either kind of pipeline.
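As a rough mental model, Select keeps exactly the features that a Discard with the same names would remove. The plain-Python sketch below illustrates that relationship. The helper `select` is hypothetical, and raising KeyError for a name that is absent from the input is an assumption of this sketch rather than behaviour documented on this page.

```python
def select(x: dict, *keys) -> dict:
    """Hypothetical stand-in for Select(*keys).transform_one(x)."""
    # Assumption: every requested key is present; a missing key raises KeyError here.
    return {name: x[name] for name in keys}


x = {'sales': 10, 'shop': 'Ikea', 'country': 'Sweden'}
kept = select(x, 'sales')
# Everything that was not selected, i.e. what discarding 'sales' would leave.
dropped = {name: value for name, value in x.items() if name not in kept}

print(kept)     # {'sales': 10}
print(dropped)  # {'shop': 'Ikea', 'country': 'Sweden'}
```

Together, `kept` and `dropped` partition the original sample, which is why the two selectors are interchangeable whenever the full feature set is known in advance.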
SelectType filters features by their Python type using isinstance checks. This enables type-based routing, such as applying different preprocessing to numeric and categorical features, which is particularly useful for datasets with mixed feature types.
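Because the filter relies on isinstance, subclass relationships matter. The plain-Python sketch below (with a hypothetical helper `select_type`, not River's code) shows one consequence worth knowing: bool is a subclass of int, so boolean flags pass a numeric filter.

```python
import numbers


def select_type(x: dict, *types: type) -> dict:
    """Hypothetical stand-in for SelectType(*types).transform_one(x)."""
    # isinstance accepts a tuple of types and also matches subclasses.
    return {name: value for name, value in x.items() if isinstance(value, types)}


x = {'age': 25, 'score': 0.85, 'name': 'Alice', 'is_member': True}

numeric = select_type(x, numbers.Number)
text = select_type(x, str)

print(numeric)  # {'age': 25, 'score': 0.85, 'is_member': True}
print(text)     # {'name': 'Alice'}
```

If boolean flags should be treated as categorical rather than numeric, they would need to be converted or removed before the numeric branch sees them.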
Usage
Use Discard to remove unwanted features or sensitive data. Use Select to extract specific features for processing or to apply transformations to feature subsets. Use SelectType to route different feature types through different preprocessing pipelines, a common pattern when dealing with mixed data types.
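The routing pattern amounts to splitting a sample by type, transforming each part independently, and merging the results. Below is a minimal plain-Python sketch of that idea. The function name and the two toy transformations are invented for illustration and do not correspond to River classes.

```python
def route_by_type(x: dict) -> dict:
    """Toy illustration: scale numbers, one-hot encode strings, then merge."""
    numeric = {k: v for k, v in x.items() if isinstance(v, (int, float))}
    text = {k: v for k, v in x.items() if isinstance(v, str)}

    # Toy numeric branch: divide by 100 (a stand-in for a real scaler).
    scaled = {f'num_{k}': v / 100 for k, v in numeric.items()}
    # Toy categorical branch: one indicator feature per (name, value) pair.
    encoded = {f'cat_{k}_{v}': 1 for k, v in text.items()}

    # Merge the two branches into a single feature dict.
    return {**scaled, **encoded}


features = route_by_type({'age': 30, 'city': 'LA', 'income': 75000})
print(features)
# {'num_age': 0.3, 'num_income': 750.0, 'cat_city_LA': 1}
```

Prefixing each branch's output, as the sketch does with `num_` and `cat_`, avoids name collisions when the branches are merged.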
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/compose/select.py
Signature
class Discard(base.Transformer):
    def __init__(self, *keys: base.typing.FeatureName):
        ...

class Select(base.MiniBatchTransformer):
    def __init__(self, *keys: base.typing.FeatureName):
        ...

class SelectType(base.Transformer):
    def __init__(self, *types: type):
        ...
Import
from river import compose
I/O Contract
Inputs (Discard and Select)

| Parameter | Type | Description |
|---|---|---|
| keys | FeatureName (variadic) | Feature names to discard or select |
| x | dict | Feature dictionary for a single sample |
| X | pandas.DataFrame | Feature dataframe for a mini-batch (Select only) |

Inputs (SelectType)

| Parameter | Type | Description |
|---|---|---|
| types | type (variadic) | Python types to keep |
| x | dict | Feature dictionary to filter |

Methods

| Method | Parameters | Return Type | Description |
|---|---|---|---|
| transform_one(x) | x: dict | dict | Filters the features of a single sample and returns the filtered dictionary |
| transform_many(X) | X: pandas.DataFrame | pandas.DataFrame | Filters the features of a mini-batch and returns the filtered dataframe (Select only) |
Usage Examples
from river import compose
# Discard: Remove unwanted features
x = {'a': 42, 'b': 12, 'c': 13}
print(compose.Discard('a', 'b').transform_one(x))
# {'c': 13}
# Use in pipeline to remove features before processing
from river import feature_extraction as fx
x = {'sales': 10, 'shop': 'Ikea', 'country': 'Sweden'}
pipeline = (
    compose.Discard('shop', 'country') |
    fx.PolynomialExtender()
)
print(pipeline.transform_one(x))
# {'sales': 10, 'sales*sales': 100}
# Select: Keep only specific features
x = {'a': 42, 'b': 12, 'c': 13}
print(compose.Select('c').transform_one(x))
# {'c': 13}
# Select with pipeline
x = {'sales': 10, 'shop': 'Ikea', 'country': 'Sweden'}
pipeline = (
    compose.Select('sales') |
    fx.PolynomialExtender()
)
print(pipeline.transform_one(x))
# {'sales': 10, 'sales*sales': 100}
# Select with mini-batch processing
import pandas as pd
X = pd.DataFrame([
    {'x_1': 10.5, 'x_2': 8.1, 'x_3': 5.2},
    {'x_1': 9.1, 'x_2': 8.9, 'x_3': 6.3},
    {'x_1': 10.9, 'x_2': 10.7, 'x_3': 7.1},
])
selector = compose.Select('x_1', 'x_2')
print(selector.transform_many(X))
#     x_1   x_2
# 0  10.5   8.1
# 1   9.1   8.9
# 2  10.9  10.7
# SelectType: Filter by Python type
import numbers
from river import preprocessing
from river import linear_model
x = {'age': 25, 'name': 'Alice', 'salary': 50000, 'city': 'NYC'}
# Apply different preprocessing to different types
num = compose.SelectType(numbers.Number) | preprocessing.StandardScaler()
cat = compose.SelectType(str) | preprocessing.OneHotEncoder()
# Combine both pipelines
model = (num + cat) | linear_model.LogisticRegression()
# SelectType can also be applied directly to a single sample
selector = compose.SelectType(numbers.Number)
print(selector.transform_one(x))
# {'age': 25, 'salary': 50000}
selector = compose.SelectType(str)
print(selector.transform_one(x))
# {'name': 'Alice', 'city': 'NYC'}
# Example: Complex feature routing
data = {
    'age': 30,
    'income': 75000,
    'name': 'Bob',
    'city': 'LA',
    'score': 0.85,
}
# Route numeric vs categorical features differently
numeric_pipeline = (
    compose.SelectType(int, float) |
    preprocessing.StandardScaler() |
    compose.Prefixer('num_')
)
categorical_pipeline = (
    compose.SelectType(str) |
    preprocessing.OneHotEncoder() |
    compose.Prefixer('cat_')
)
combined = numeric_pipeline + categorical_pipeline
# Example: Select for feature engineering
# Select user-level and item-level feature subsets
user_features = compose.Select('user_id', 'user_age', 'user_gender')
item_features = compose.Select('item_id', 'item_price', 'item_category')
# Create interactions between the selected subsets
interactions = user_features * item_features
# Example: Discard sensitive features
sensitive_features = ['ssn', 'credit_card', 'password']
anonymizer = compose.Discard(*sensitive_features)
# Use before model training
pipeline = (
    anonymizer |
    preprocessing.StandardScaler() |
    linear_model.LogisticRegression()
)