Implementation:Online ml River FeatureSelection VarianceThreshold
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Feature_Selection, Unsupervised_Learning |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Removes low-variance features using variance statistics computed incrementally over the stream.
Description
VarianceThreshold performs unsupervised feature selection by dropping features whose variance does not exceed a specified threshold. It maintains a running variance for each feature using the stats.Var class, so past observations do not need to be stored. Until a feature has been observed at least min_samples times it is always kept, which prevents premature filtering before enough data has been seen. Each feature is evaluated independently, without reference to the target variable.
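The mechanism described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not River's source: the class name `RunningVarianceFilter` and its internals are hypothetical, Welford's algorithm stands in for `stats.Var`, and the sample variance (n - 1 denominator) with a strict `>` comparison is assumed so that constant features are dropped at the default threshold of 0.

```python
from collections import defaultdict


class RunningVarianceFilter:
    """Hypothetical stand-in for VarianceThreshold, for illustration only."""

    def __init__(self, threshold=0.0, min_samples=2):
        self.threshold = threshold
        self.min_samples = min_samples
        # Per-feature state for Welford's algorithm: [count, mean, M2].
        self._state = defaultdict(lambda: [0, 0.0, 0.0])

    def learn_one(self, x):
        # Update each feature's count, mean, and sum of squared deviations.
        for name, value in x.items():
            state = self._state[name]
            state[0] += 1
            delta = value - state[1]
            state[1] += delta / state[0]
            state[2] += delta * (value - state[1])

    def _keep(self, name):
        n, _, m2 = self._state[name]
        # Keep the feature while it has too few observations to judge
        # (or too few to define a sample variance at all).
        if n < self.min_samples or n < 2:
            return True
        # Assumed: sample variance (n - 1 denominator), strict comparison.
        return m2 / (n - 1) > self.threshold

    def transform_one(self, x):
        return {name: value for name, value in x.items() if self._keep(name)}


# Made-up stream: "flat" never changes, "noisy" does.
stream = [
    {"flat": 1.0, "noisy": 0.0},
    {"flat": 1.0, "noisy": 2.0},
    {"flat": 1.0, "noisy": 4.0},
]

selector = RunningVarianceFilter()
outputs = []
for x in stream:
    selector.learn_one(x)
    outputs.append(selector.transform_one(x))

print(outputs)
# [{'flat': 1.0, 'noisy': 0.0}, {'noisy': 2.0}, {'noisy': 4.0}]
```

The first observation passes through unfiltered because only one sample has been seen; from the second observation onward the constant feature is dropped.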
Usage
Use this as a simple first-pass filter to remove constant or near-constant features that carry little information. It is particularly useful as a preprocessing step that reduces dimensionality before supervised selection methods are applied. The threshold parameter can be set from domain knowledge or by experimentation. Typical targets in streaming data are features stuck at a constant value because of a measurement fault and features with negligible variation.
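One way to choose a threshold by experimentation is to inspect per-feature variances on a small warm-up sample before streaming begins. The sketch below uses only the standard library; the feature names, values, and the candidate threshold of 1e-3 are made up for illustration, and the strict "greater than" comparison is an assumption.

```python
import statistics

# Hypothetical warm-up sample: each dict is one observation.
warmup = [
    {"temp": 20.1, "sensor_stuck": 5.0, "humidity": 0.41},
    {"temp": 22.4, "sensor_stuck": 5.0, "humidity": 0.40},
    {"temp": 19.8, "sensor_stuck": 5.0, "humidity": 0.42},
    {"temp": 23.0, "sensor_stuck": 5.0, "humidity": 0.41},
]

# Sample variance of each feature over the warm-up observations.
variances = {
    name: statistics.variance([row[name] for row in warmup])
    for name in warmup[0]
}

# List features from least to most variable to help pick a cutoff.
ranked = sorted(variances.items(), key=lambda item: item[1])
for name, var in ranked:
    print(f"{name}: {var:.6f}")

# A candidate threshold and the features it would retain (strictly above it).
candidate_threshold = 1e-3
kept = [name for name, var in variances.items() if var > candidate_threshold]
print(kept)
```

Because variance depends on each feature's units, a single threshold is only meaningful when the features are on comparable scales; here `humidity` falls below the candidate cutoff simply because its values are small.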
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/feature_selection/variance.py
Signature
class VarianceThreshold(base.Transformer):
    def __init__(self, threshold=0, min_samples=2)
Import
from river import feature_selection
I/O Contract
| Input | Output |
|---|---|
| Dict[str, float] - All features | Dict[str, float] - Features whose variance exceeds the threshold, plus any feature seen fewer than min_samples times |
Usage Examples
from river import feature_selection
from river import stream
X = [
    [0, 2, 0, 3],
    [0, 1, 4, 3],
    [0, 1, 1, 3]
]
selector = feature_selection.VarianceThreshold()
for x, _ in stream.iter_array(X):
    selector.learn_one(x)
    print(selector.transform_one(x))
# {0: 0, 1: 2, 2: 0, 3: 3}  # all features kept: fewer than min_samples=2 observations so far
# {1: 1, 2: 4}              # features 0 and 3 removed (zero variance)
# {1: 1, 2: 1}              # same features kept
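To see why features 0 and 3 disappear from the second output onward, the column variances can be checked by hand with the standard library. This is a side calculation rather than part of River; it assumes the sample variance (n - 1 denominator), although at the default threshold of 0 the choice of denominator does not change which features survive.

```python
import statistics

X = [
    [0, 2, 0, 3],
    [0, 1, 4, 3],
    [0, 1, 1, 3],
]

# Sample variance of each column after the first two rows have been seen.
after_two = {
    i: float(statistics.variance([row[i] for row in X[:2]]))
    for i in range(4)
}
print(after_two)
# {0: 0.0, 1: 0.5, 2: 8.0, 3: 0.0}

# Features whose variance is strictly greater than the default threshold of 0.
surviving = [i for i, var in after_two.items() if var > 0]
print(surviving)
# [1, 2]
```

Columns 0 and 3 hold the same value in every row, so their variance stays at zero after the third row as well and they remain filtered out.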