Implementation:Online ml River Neighbors KNNRegressor
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Regression, Instance_Based_Learning |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
The K-Nearest Neighbors regressor predicts a continuous value by aggregating the target values of the k closest stored instances, using the mean, the median, or a distance-weighted mean.
Description
This implementation stores instances in a configurable search engine (SWINN by default) and aggregates the neighbors' target values to make a prediction. Three aggregation methods are supported: mean (unweighted average), median (robust to outliers), and weighted_mean (inverse-distance weighting). When the closest stored instance is at distance 0 (an exact match), its target value is returned directly. The search engine maintains a sliding window, so the oldest instances are dropped automatically as new ones arrive.
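The three aggregation rules and the exact-match shortcut described above can be sketched in plain Python. This is a minimal illustration, not River's code: the function name `aggregate`, the fallback of 0.0 for an empty neighbor list, and the sample values are all assumptions made for the example.

```python
import statistics


def aggregate(values, distances, method="mean"):
    """Combine neighbor targets; neighbors are assumed sorted by distance."""
    if not values:
        return 0.0  # assumed fallback when nothing is stored yet
    if distances[0] == 0:
        return values[0]  # exact match: return its target directly
    if method == "mean":
        return sum(values) / len(values)
    if method == "median":
        return statistics.median(values)
    if method == "weighted_mean":
        # Inverse-distance weights: closer neighbors count more.
        weights = [1.0 / d for d in distances]
        return sum(w * v for w, v in zip(weights, values)) / sum(weights)
    raise ValueError(f"unknown aggregation method: {method!r}")


# Hypothetical neighbors: the third target is an outlier and is farthest away.
values = [10.0, 12.0, 50.0]
distances = [1.0, 2.0, 4.0]

mean_pred = aggregate(values, distances, "mean")
median_pred = aggregate(values, distances, "median")
weighted_pred = aggregate(values, distances, "weighted_mean")
exact_pred = aggregate([7.0, 12.0], [0.0, 2.0], "mean")
```

With these sample values the outlier pulls the plain mean upward, the median ignores it, and the weighted mean lands in between because the outlier is the most distant neighbor.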
Usage
Use the KNN regressor for non-parametric regression when similar instances can be expected to have similar target values. Choose aggregation_method to suit the data: 'mean' (the default) for general use, 'median' for robustness to outliers, or 'weighted_mean' to emphasize closer neighbors. Configure the search engine to balance memory use against accuracy. Because predictions depend on distances between instances, preprocessing with StandardScaler is recommended to put features on a comparable scale.
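To see why scaling matters for a distance-based model, the sketch below compares the nearest neighbor found on raw and on standardized features. It is an illustration only: the feature names `age` and `income`, the data, and the batch helpers `fit_scaler` and `scale` are hypothetical stand-ins, whereas an online scaler would update its statistics incrementally.

```python
import math
import statistics


def euclidean(a, b):
    """Euclidean distance between two feature dicts with the same keys."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))


def fit_scaler(rows):
    """Per-feature mean and standard deviation (batch version, for illustration)."""
    keys = rows[0].keys()
    means = {k: statistics.fmean([r[k] for r in rows]) for k in keys}
    stds = {k: statistics.pstdev([r[k] for r in rows]) or 1.0 for k in keys}
    return means, stds


def scale(x, means, stds):
    """Standardize one feature dict with previously fitted statistics."""
    return {k: (x[k] - means[k]) / stds[k] for k in x}


def nearest_index(query, rows):
    """Index of the stored row closest to the query."""
    return min(range(len(rows)), key=lambda i: euclidean(query, rows[i]))


stored = [
    {"age": 31, "income": 53000},  # similar age, income off by 3000
    {"age": 65, "income": 50500},  # very different age, income off by 500
    {"age": 45, "income": 90000},
    {"age": 22, "income": 20000},
]
query = {"age": 30, "income": 50000}

raw_nearest = nearest_index(query, stored)

means, stds = fit_scaler(stored)
scaled_rows = [scale(r, means, stds) for r in stored]
scaled_nearest = nearest_index(scale(query, means, stds), scaled_rows)
```

On the raw features the income column dominates the distance, so the 65-year-old is picked as nearest; after standardization both features contribute and the 31-year-old is picked instead.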
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/neighbors/knn_regressor.py
Signature
class KNNRegressor(base.Regressor):
    def __init__(
        self,
        n_neighbors: int = 5,
        engine: BaseNN | None = None,
        aggregation_method: str = "mean",
    ):
        self.n_neighbors = n_neighbors
        if engine is None:
            engine = SWINN(dist_func=functools.partial(utils.math.minkowski_distance, p=2))
        self.engine = engine
        self._nn: BaseNN = self.engine.clone(include_attributes=True)
        self._check_aggregation_method(aggregation_method)
        self.aggregation_method = aggregation_method
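The default engine above fixes p=2 with functools.partial, which turns a three-argument Minkowski distance into a two-argument Euclidean distance. A minimal sketch of that pattern follows; the `minkowski_distance` defined here is a textbook stand-in written for this example, and River's own helper may differ in details.

```python
import functools


def minkowski_distance(a, b, p):
    """Textbook Minkowski distance between two sparse feature dicts.

    Keys missing from one dict are treated as 0.0 (an assumption of this sketch).
    """
    keys = set(a) | set(b)
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) ** p for k in keys) ** (1 / p)


# Binding p yields a plain two-argument distance function.
euclidean = functools.partial(minkowski_distance, p=2)
manhattan = functools.partial(minkowski_distance, p=1)

a = {"x": 0.0, "y": 0.0}
b = {"x": 3.0, "y": 4.0}

d2 = euclidean(a, b)  # 3-4-5 triangle
d1 = manhattan(a, b)  # |3| + |4|
```

Passing a different partial (for example p=1) as dist_func is one way to change the metric without touching the regressor itself.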
Import
from river import neighbors
I/O Contract
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| n_neighbors | int | 5 | Number of nearest neighbors |
| engine | BaseNN or None | None | Search engine for instance storage; when None, a SWINN engine with Euclidean distance (Minkowski, p=2) is created |
| aggregation_method | str | "mean" | Aggregation: 'mean', 'median', 'weighted_mean' |
Attributes
| Attribute | Type | Description |
|---|---|---|
| _nn | BaseNN | Internal search engine instance |
Input/Output
| Method | Input | Output |
|---|---|---|
| learn_one | x: dict, y: float | None |
| predict_one | x: dict | float |
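The learn_one / predict_one contract and the sliding-window behavior can be illustrated with a simplified stand-in. The class below is hypothetical and is not River's implementation: it uses brute-force search over a deque instead of a search engine, supports only mean aggregation, omits the exact-match shortcut, and assumes a 0.0 prediction before any data is seen.

```python
from collections import deque
import math


class WindowedKNNRegressor:
    """Simplified stand-in: brute-force search over a fixed-size window."""

    def __init__(self, n_neighbors=5, window_size=100):
        self.n_neighbors = n_neighbors
        # deque(maxlen=...) discards the oldest item once the window is full.
        self.window = deque(maxlen=window_size)

    def learn_one(self, x, y):
        """Store one (features, target) pair; returns nothing."""
        self.window.append((x, y))

    def predict_one(self, x):
        """Return the mean target of the k closest stored instances."""
        if not self.window:
            return 0.0  # assumed fallback before any data is seen
        ranked = sorted(self.window, key=lambda item: self._dist(x, item[0]))
        targets = [y for _, y in ranked[: self.n_neighbors]]
        return sum(targets) / len(targets)

    @staticmethod
    def _dist(a, b):
        """Euclidean distance; missing keys count as 0.0."""
        keys = set(a) | set(b)
        return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


model = WindowedKNNRegressor(n_neighbors=2, window_size=3)
for i in range(5):
    model.learn_one({"x": float(i)}, float(10 * i))

stored_targets = [y for _, y in model.window]  # only the 3 newest remain
prediction = model.predict_one({"x": 4.0})  # mean of the 2 closest targets
```

After five updates with a window of three, the two oldest instances have been dropped, so the prediction is based only on recent data.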
Usage Examples
import functools

from river import datasets
from river import evaluate
from river import metrics
from river import neighbors
from river import preprocessing
from river import utils

dataset = datasets.TrumpApproval()

# Default configuration (mean aggregation)
model = neighbors.KNNRegressor()
evaluate.progressive_val_score(dataset, model, metrics.RMSE())
# RMSE: 1.427743

# Using median aggregation (robust to outliers)
model = (
    preprocessing.StandardScaler() |
    neighbors.KNNRegressor(
        n_neighbors=7,
        aggregation_method="median"
    )
)

# Weighted mean (emphasize closer neighbors)
model = neighbors.KNNRegressor(
    n_neighbors=5,
    aggregation_method="weighted_mean"
)

# Custom search engine
model = neighbors.KNNRegressor(
    n_neighbors=3,
    engine=neighbors.LazySearch(
        window_size=50,
        dist_func=functools.partial(utils.math.minkowski_distance, p=2)
    ),
    aggregation_method="mean"
)