Overview
The Transformer classes define the interface for feature transformation components in River, including unsupervised transformers, supervised transformers, and their mini-batch variants.
Description
River provides several transformer base classes, one per transformation scenario. BaseTransformer supplies the operator overloads used to compose transformers: + builds a compose.TransformerUnion, while * builds a compose.TransformerProduct when the other operand is a transformer or pipeline, and a compose.Grouper when it is a feature name or a list of feature names. BaseTransformer also declares the abstract method transform_one, which takes a feature dictionary and returns the transformed dictionary. Transformer builds on this for unsupervised transformations; its learn_one(x) method is optional to override. SupervisedTransformer covers transformations that need the target during learning, so its learn_one takes both x and y. MiniBatchTransformer and MiniBatchSupervisedTransformer add transform_many and learn_many, which operate on pandas DataFrames so that several examples are processed in one call.
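To make the + composition described above concrete, here is a minimal, standard-library-only sketch. The names ToyTransformer, ToyUnion, Double, and Square are hypothetical stand-ins written for illustration; they are not River classes and do not reproduce River's implementation. The sketch assumes a union applies each member to the same input and merges the resulting dictionaries.

```python
from __future__ import annotations

import abc
from typing import Any


class ToyTransformer(abc.ABC):
    """Hypothetical stand-in for a transformer base class (not River's code)."""

    def __add__(self, other: ToyTransformer) -> ToyUnion:
        # `a + b` builds a union: both transformers receive the same input.
        return ToyUnion(self, other)

    @abc.abstractmethod
    def transform_one(self, x: dict[str, Any]) -> dict[str, Any]:
        ...


class ToyUnion(ToyTransformer):
    """Applies every member to the same input and merges the outputs."""

    def __init__(self, *members: ToyTransformer) -> None:
        self.members = members

    def transform_one(self, x: dict[str, Any]) -> dict[str, Any]:
        out: dict[str, Any] = {}
        for member in self.members:
            out.update(member.transform_one(x))
        return out


class Double(ToyTransformer):
    def transform_one(self, x: dict[str, Any]) -> dict[str, Any]:
        return {f"{k}_doubled": 2 * v for k, v in x.items()}


class Square(ToyTransformer):
    def transform_one(self, x: dict[str, Any]) -> dict[str, Any]:
        return {f"{k}_squared": v * v for k, v in x.items()}


union = Double() + Square()
features = union.transform_one({"a": 3})
print(features)  # {'a_doubled': 6, 'a_squared': 9}
```

Because the operator lives on the base class, every subclass gains composition without extra code, which is the same design idea BaseTransformer follows.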
Usage
Use Transformer for unsupervised feature transformations such as scaling or encoding. Use SupervisedTransformer when the transformation needs the target value during learning, as target encoding does. Use the MiniBatch variants when the transformer can process several examples at once more efficiently than one at a time. Every transformer must implement transform_one; mini-batch versions must also implement transform_many, and MiniBatchSupervisedTransformer additionally requires learn_many.
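The target-encoding case mentioned above can be sketched as follows. ToyTargetMeanEncoder is a hypothetical plain-Python class that mirrors the learn_one(x, y) / transform_one(x) contract without depending on River; in a real project it would presumably subclass base.SupervisedTransformer. The output feature name and the 0.0 fallback for unseen categories are arbitrary choices made for this sketch.

```python
from collections import defaultdict


class ToyTargetMeanEncoder:
    """Hypothetical encoder: adds the running target mean of a category."""

    def __init__(self, on: str) -> None:
        self.on = on                      # name of the categorical feature
        self.counts = defaultdict(int)    # examples seen per category
        self.means = defaultdict(float)   # running target mean per category

    def learn_one(self, x: dict, y: float) -> None:
        # Supervised step: the target y is needed to update the statistics.
        key = x[self.on]
        self.counts[key] += 1
        self.means[key] += (y - self.means[key]) / self.counts[key]

    def transform_one(self, x: dict) -> dict:
        # Transformation step: only x is available, never y.
        encoded = self.means.get(x[self.on], 0.0)  # 0.0 for unseen categories
        return {**x, f"{self.on}_target_mean": encoded}


encoder = ToyTargetMeanEncoder(on="city")
stream = [
    ({"city": "Paris"}, 10.0),
    ({"city": "Paris"}, 20.0),
    ({"city": "Rome"}, 4.0),
]
for x, y in stream:
    encoder.learn_one(x, y)

encoded = encoder.transform_one({"city": "Paris"})
print(encoded)  # {'city': 'Paris', 'city_target_mean': 15.0}
```

The asymmetry between the two methods is the point: learning sees the target, transforming does not, which is why this kind of component belongs under SupervisedTransformer rather than Transformer.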
Code Reference
Source Location
Signature
class BaseTransformer:
    """Base functionality for transformers."""

    def __add__(self, other: BaseTransformer) -> compose.TransformerUnion

    def __radd__(self, other: BaseTransformer) -> compose.TransformerUnion

    def __mul__(
        self,
        other: BaseTransformer | compose.Pipeline | FeatureName | list[FeatureName]
    ) -> compose.Grouper | compose.TransformerProduct

    def __rmul__(
        self,
        other: BaseTransformer | compose.Pipeline | FeatureName | list[FeatureName]
    ) -> compose.Grouper | compose.TransformerProduct

    @abc.abstractmethod
    def transform_one(self, x: dict[FeatureName, Any]) -> dict[FeatureName, Any]

class Transformer(base.Estimator, BaseTransformer):
    """A transformer."""

    @property
    def _supervised(self) -> bool

    def learn_one(self, x: dict[FeatureName, Any]) -> None

class SupervisedTransformer(base.Estimator, BaseTransformer):
    """A supervised transformer."""

    @property
    def _supervised(self) -> bool

    def learn_one(self, x: dict[FeatureName, Any], y: base.typing.Target) -> None

class MiniBatchTransformer(Transformer):
    """A transform that can operate on mini-batches."""

    @abc.abstractmethod
    def transform_many(self, X: pd.DataFrame) -> pd.DataFrame

    def learn_many(self, X: pd.DataFrame) -> None

class MiniBatchSupervisedTransformer(Transformer):
    """A supervised transformer that can operate on mini-batches."""

    @property
    def _supervised(self) -> bool

    @abc.abstractmethod
    def learn_many(self, X: pd.DataFrame, y: pd.Series) -> None

    @abc.abstractmethod
    def transform_many(self, X: pd.DataFrame) -> pd.DataFrame
Import
from river.base import Transformer, SupervisedTransformer
from river.base import MiniBatchTransformer, MiniBatchSupervisedTransformer
I/O Contract
transform_one
| Parameter | Type | Description |
|---|---|---|
| x | dict[FeatureName, Any] | Dictionary of features to transform |

| Returns | Type | Description |
|---|---|---|
| transformed | dict[FeatureName, Any] | Dictionary of transformed features |
Transformer.learn_one
| Parameter | Type | Description |
|---|---|---|
| x | dict[FeatureName, Any] | Dictionary of features to learn from (unsupervised) |
SupervisedTransformer.learn_one
| Parameter | Type | Description |
|---|---|---|
| x | dict[FeatureName, Any] | Dictionary of features to learn from |
| y | Target | The target value (supervised) |
MiniBatch Methods
| Method | Input | Output | Description |
|---|---|---|---|
| transform_many | X: DataFrame | DataFrame | Transform multiple examples at once |
| learn_many | X: DataFrame, y: Series (supervised only) | None | Update from multiple examples |
Usage Examples
from river import preprocessing
from river import feature_extraction
from river import compose
from river import datasets
# Create transformers
scaler = preprocessing.StandardScaler()
poly = feature_extraction.PolynomialExtender(degree=2)
# Compose transformers with + operator (TransformerUnion)
union = scaler + poly
# Compose transformers with * operator (TransformerProduct)
product = scaler * poly
# Use a transformer in a pipeline
model = scaler | preprocessing.MinMaxScaler()
# Single instance transformation
for x, y in datasets.TrumpApproval().take(10):
    # Transform features
    x_transformed = scaler.transform_one(x)
    # Learn from features
    scaler.learn_one(x)
# Implementing a custom transformer
from river.base import Transformer
class AddConstant(Transformer):
    def __init__(self, value=1.0):
        self.value = value

    def transform_one(self, x):
        # Add constant to all features
        return {k: v + self.value for k, v in x.items()}

    def learn_one(self, x):
        # Stateless, no learning needed
        pass
# Implementing a supervised transformer
from river.base import SupervisedTransformer
class TargetScaler(SupervisedTransformer):
    def __init__(self):
        self.mean = 0.0
        self.n = 0

    def learn_one(self, x, y):
        # Update the running mean of the target
        self.n += 1
        self.mean += (y - self.mean) / self.n

    def transform_one(self, x):
        # Divide each feature by the learned target mean (unchanged while the mean is 0)
        return {k: v / self.mean if self.mean != 0 else v for k, v in x.items()}