Implementation:Online ml River Neighbors KNNRegressor

From Leeroopedia


Knowledge Sources
Domains Online_Learning, Regression, Instance_Based_Learning
Last Updated 2026-02-08 16:00 GMT

Overview

The K-Nearest Neighbors regressor predicts a continuous value by aggregating the target values of the k closest stored instances, using the mean, median, or weighted mean.

Description

This implementation stores instances in a configurable search engine (SWINN by default) and aggregates the target values of the nearest neighbors to form a prediction. Three aggregation methods are supported: mean (unweighted average), median (robust to outliers), and weighted_mean (inverse distance weighting). When the closest instance has distance 0 (an exact match), the model returns that instance's target value directly. The search engine applies a sliding window strategy, so old instances are dropped automatically.
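The three aggregation rules and the exact-match shortcut can be sketched in plain Python. This is a minimal illustration, not River's source code: the helper name `aggregate_neighbors`, the `(distance, target)` pair layout, the 0.0 fallback for an empty neighbor list, and the choice of 1/d as the inverse-distance weight are all assumptions made for the example.

```python
import statistics


def aggregate_neighbors(neighbors, method="mean"):
    """Combine (distance, target) pairs into one prediction.

    Hypothetical helper for illustration; not part of River's API.
    """
    if not neighbors:
        return 0.0  # assumption: fall back to 0.0 when nothing is stored

    # Sort so that the closest neighbor comes first.
    neighbors = sorted(neighbors, key=lambda pair: pair[0])
    closest_dist, closest_target = neighbors[0]

    # Exact match: return the stored target directly.
    if closest_dist == 0:
        return closest_target

    targets = [y for _, y in neighbors]
    if method == "mean":
        return sum(targets) / len(targets)
    if method == "median":
        return statistics.median(targets)
    if method == "weighted_mean":
        # Assumed weighting: each neighbor counts in proportion to 1 / distance.
        weights = [1.0 / d for d, _ in neighbors]
        return sum(w * y for w, y in zip(weights, targets)) / sum(weights)
    raise ValueError(f"unknown aggregation method: {method}")


# One far-away neighbor with an outlying target value.
pairs = [(1.0, 10.0), (2.0, 20.0), (4.0, 100.0)]
mean_pred = aggregate_neighbors(pairs, "mean")
median_pred = aggregate_neighbors(pairs, "median")
weighted_pred = aggregate_neighbors(pairs, "weighted_mean")
```

With these made-up pairs, the distant outlier pulls the plain mean the most, while the median and the weighted mean stay closer to the two nearby targets.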

Usage

Use the KNN regressor for non-parametric regression when you expect similar instances to have similar target values. Choose aggregation_method based on your data: 'mean' (the default) for general use, 'median' for robustness to outliers, or 'weighted_mean' to emphasize closer neighbors. Configure the search engine to suit your memory/accuracy tradeoff. Preprocessing with StandardScaler is recommended because distance computations are sensitive to feature scales.
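To see why scaling matters, consider Euclidean distance over two features with very different ranges. The sketch below uses made-up feature names and values, and the `scales` dictionary is a hypothetical stand-in for the per-feature spread a scaler would estimate (subtracting a mean is omitted because it does not change distances).

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature dicts with the same keys."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))


def rescale(x, scales):
    """Divide each feature by a fixed scale (hypothetical stand-in for a scaler)."""
    return {k: v / scales[k] for k, v in x.items()}


query = {"income": 50_000.0, "age": 30.0}
a = {"income": 50_500.0, "age": 70.0}  # similar income, very different age
b = {"income": 52_000.0, "age": 31.0}  # slightly different income, similar age

# Unscaled: income dominates the distance, so `a` looks closer than `b`.
raw_closest = min(("a", a), ("b", b), key=lambda item: euclidean(query, item[1]))[0]

# Scaled: both features contribute comparably, so `b` becomes the closer one.
scales = {"income": 10_000.0, "age": 10.0}  # assumed per-feature spreads
query_scaled = rescale(query, scales)
scaled_closest = min(
    ("a", rescale(a, scales)),
    ("b", rescale(b, scales)),
    key=lambda item: euclidean(query_scaled, item[1]),
)[0]
```

The nearest neighbor flips from `a` to `b` once the features are put on comparable scales, which is the effect a scaling step in front of the regressor is meant to achieve.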

Code Reference

Source Location

Signature

class KNNRegressor(base.Regressor):
    def __init__(
        self,
        n_neighbors: int = 5,
        engine: BaseNN | None = None,
        aggregation_method: str = "mean",
    ):
        self.n_neighbors = n_neighbors
        if engine is None:
            engine = SWINN(dist_func=functools.partial(utils.math.minkowski_distance, p=2))
        self.engine = engine
        self._nn: BaseNN = self.engine.clone(include_attributes=True)
        self._check_aggregation_method(aggregation_method)
        self.aggregation_method = aggregation_method

Import

from river import neighbors

I/O Contract

Parameters

Parameter | Type | Default | Description
n_neighbors | int | 5 | Number of nearest neighbors
engine | BaseNN or None | None (falls back to SWINN with Euclidean distance) | Search engine for instance storage
aggregation_method | str | "mean" | Aggregation: 'mean', 'median', 'weighted_mean'

Attributes

Attribute | Type | Description
_nn | BaseNN | Internal search engine instance

Input/Output

Method | Input | Output
learn_one | x: dict, y: float | None
predict_one | x: dict | float
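The learn_one / predict_one contract, together with the sliding-window behavior, can be mimicked by a toy class. This is only a sketch under stated assumptions: `TinyWindowKNN` is a hypothetical name, it scans a bounded deque linearly rather than using a search engine such as SWINN, it always averages with the plain mean, and it returns 0.0 before any data has been seen.

```python
from collections import deque
import math


class TinyWindowKNN:
    """Toy regressor mirroring the learn_one / predict_one contract.

    Illustrative only; not River's KNNRegressor.
    """

    def __init__(self, n_neighbors=3, window_size=4):
        self.n_neighbors = n_neighbors
        # A bounded deque drops the oldest instance automatically.
        self.window = deque(maxlen=window_size)

    def learn_one(self, x, y):
        # Store a copy of the features with the target; returns None.
        self.window.append((dict(x), y))

    def predict_one(self, x):
        if not self.window:
            return 0.0  # assumption: default prediction before any data

        def dist(item):
            stored_x = item[0]
            keys = set(x) | set(stored_x)
            return math.sqrt(
                sum((x.get(k, 0.0) - stored_x.get(k, 0.0)) ** 2 for k in keys)
            )

        # Keep the n_neighbors closest instances and average their targets.
        nearest = sorted(self.window, key=dist)[: self.n_neighbors]
        return sum(y for _, y in nearest) / len(nearest)


model = TinyWindowKNN(n_neighbors=2, window_size=3)
for i in range(5):
    model.learn_one({"x": float(i)}, y=10.0 * i)

# Only the three most recent targets remain in the window.
stored = [y for _, y in model.window]
prediction = model.predict_one({"x": 3.4})
```

After five updates with a window of three, the two oldest instances have been evicted, and the prediction for `x = 3.4` is the mean of the targets stored at `x = 3` and `x = 4`.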

Usage Examples

from river import datasets
from river import evaluate
from river import metrics
from river import neighbors
from river import preprocessing

dataset = datasets.TrumpApproval()

# Default configuration (mean aggregation)
model = neighbors.KNNRegressor()
evaluate.progressive_val_score(dataset, model, metrics.RMSE())
# RMSE: 1.427743

# Using median aggregation (robust to outliers)
model = (
    preprocessing.StandardScaler() |
    neighbors.KNNRegressor(
        n_neighbors=7,
        aggregation_method="median"
    )
)

# Weighted mean (emphasize closer neighbors)
model = neighbors.KNNRegressor(
    n_neighbors=5,
    aggregation_method="weighted_mean"
)

# Custom search engine
import functools
from river import utils

model = neighbors.KNNRegressor(
    n_neighbors=3,
    engine=neighbors.LazySearch(
        window_size=50,
        dist_func=functools.partial(utils.math.minkowski_distance, p=2)
    ),
    aggregation_method="mean"
)
