Implementation:Online ml River Neighbors KNNClassifier
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Classification, Instance_Based_Learning |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
The K-Nearest Neighbors (KNN) classifier stores recent instances in a sliding window and predicts a class from the k stored instances closest to the query.
Description
This implementation delegates storage and querying of instances to a configurable search engine. When no engine is given, it defaults to SWINN, which performs approximate search. During prediction, the classifier finds the k nearest neighbors, lets each one vote for its class (weighted by inverse distance if weighted=True), and aggregates the votes into class probabilities. If a neighbor lies at distance 0 from the query, that neighbor's class is returned with probability 1. Votes are normalized either with softmax (softmax=True) or by dividing by the vote total (the default). The classifier also maintains the set of classes seen so far and can optionally drop classes that no longer appear in the window.
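The voting step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not River's actual code: the function name `aggregate_votes` and the `(label, distance)` pair layout are hypothetical and chosen only for readability.

```python
import math


def aggregate_votes(nearest, classes, weighted=True, softmax=False):
    """Turn (label, distance) pairs for the k nearest neighbors into class probabilities.

    Hypothetical helper for illustration; not part of the River API.
    """
    if not nearest:
        return {}

    # An exact match (distance 0) decides the prediction outright.
    for label, dist in nearest:
        if dist == 0:
            return {c: 1.0 if c == label else 0.0 for c in classes}

    # One vote per neighbor, optionally scaled by 1 / distance.
    votes = {c: 0.0 for c in classes}
    for label, dist in nearest:
        votes[label] = votes.get(label, 0.0) + (1.0 / dist if weighted else 1.0)

    if softmax:
        # Subtract the peak before exponentiating for numerical stability.
        peak = max(votes.values())
        exps = {c: math.exp(v - peak) for c, v in votes.items()}
        total = sum(exps.values())
        return {c: e / total for c, e in exps.items()}

    # Simple division: each class gets its share of the vote total.
    total = sum(votes.values())
    return {c: v / total for c, v in votes.items()} if total else votes


probs = aggregate_votes([("a", 1.0), ("b", 2.0), ("a", 4.0)], classes={"a", "b"})
```

With inverse-distance weighting, the neighbor at distance 1.0 contributes four times the vote of the neighbor at distance 4.0, so nearby instances dominate the result.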
Usage
Use KNN when similar instances are likely to share the same class and you want a simple, interpretable model. Choose the search engine to match your needs: SWINN trades exactness for speed through approximate search, while LazySearch performs exact search and is better suited to smaller windows. Keep weighted=True (the default) to give closer neighbors more influence. Set cleanup_every to a positive value to periodically remove classes that are no longer present in the window. Because the distance computation is sensitive to feature scale, the classifier generally works better when a StandardScaler precedes it in the pipeline.
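To make the trade-off concrete, exact search over a sliding window can be pictured as a linear scan of a bounded buffer. The sketch below is an assumption-laden illustration: `WindowedExactSearch` and `euclidean` are hypothetical names, and River's LazySearch may differ in its details.

```python
import math
from collections import deque


def euclidean(a, b):
    """Euclidean distance between two feature dicts; missing keys count as 0."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


class WindowedExactSearch:
    """Hypothetical helper: keeps the latest `window_size` instances and scans them all."""

    def __init__(self, window_size=3):
        # A deque with maxlen drops the oldest item automatically when full.
        self.window = deque(maxlen=window_size)

    def append(self, x, y):
        self.window.append((x, y))

    def search(self, x, n_neighbors):
        # Exact search: compute the distance to every stored instance, then sort.
        scored = [(euclidean(x, xi), yi) for xi, yi in self.window]
        scored.sort(key=lambda pair: pair[0])
        return scored[:n_neighbors]


index = WindowedExactSearch(window_size=2)
index.append({"x": 0.0}, "a")
index.append({"x": 1.0}, "b")
index.append({"x": 5.0}, "c")  # pushes the oldest instance ("a") out of the window
nearest = index.search({"x": 0.0}, n_neighbors=1)
```

Every query costs one distance computation per stored instance, which is why exact search pairs naturally with small windows.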
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/neighbors/knn_classifier.py
Signature
class KNNClassifier(base.Classifier):
    def __init__(
        self,
        n_neighbors: int = 5,
        engine: BaseNN | None = None,
        weighted: bool = True,
        cleanup_every: int = 0,
        softmax: bool = False,
    ):
        self.n_neighbors = n_neighbors
        if engine is None:
            engine = SWINN(dist_func=functools.partial(utils.math.minkowski_distance, p=2))
        self.engine = engine
        self.weighted = weighted
        self.cleanup_every = cleanup_every
        self.classes: set[base.typing.ClfTarget] = set()
        self.softmax = softmax
Import
from river import neighbors
I/O Contract
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| n_neighbors | int | 5 | Number of nearest neighbors to query |
| engine | BaseNN or None | None (falls back to SWINN with Minkowski distance, p=2) | Search engine for storing/querying instances |
| weighted | bool | True | Weight votes by inverse distance |
| cleanup_every | int | 0 | Cleanup stale classes every N steps (0=never) |
| softmax | bool | False | Use softmax for probability normalization |
Attributes
| Attribute | Type | Description |
|---|---|---|
| classes | set | Set of known classes |
| _nn | BaseNN | Internal search engine instance |
Input/Output
| Method | Input | Output |
|---|---|---|
| learn_one | x: dict, y: Any | None |
| predict_proba_one | x: dict | dict[Any, float] |
| predict_one | x: dict | Any |
| clean_up_classes | (none) | None |
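The interplay between cleanup_every and clean_up_classes can be sketched as follows. This is a simplified model under assumptions, not the library's implementation: `ClassTracker`, its step counter, and its window layout are hypothetical and only mirror the behavior described on this page.

```python
from collections import deque


class ClassTracker:
    """Hypothetical helper: tracks known classes over a bounded window of instances."""

    def __init__(self, window_size, cleanup_every=0):
        self.window = deque(maxlen=window_size)
        self.classes = set()
        self.cleanup_every = cleanup_every  # 0 means never clean up automatically
        self._steps = 0

    def learn_one(self, x, y):
        self.window.append((x, y))
        self.classes.add(y)
        self._steps += 1
        if self.cleanup_every > 0 and self._steps % self.cleanup_every == 0:
            self.clean_up_classes()

    def clean_up_classes(self):
        # Keep only the classes that still have an instance in the window.
        self.classes = {y for _, y in self.window}


tracker = ClassTracker(window_size=2, cleanup_every=3)
tracker.learn_one({"x": 0.0}, "a")
tracker.learn_one({"x": 1.0}, "b")
known_before = set(tracker.classes)  # "a" is still listed as a known class
tracker.learn_one({"x": 2.0}, "b")   # third step triggers the cleanup
known_after = set(tracker.classes)   # "a" left the window, so it is dropped
```

Without the cleanup, a class whose instances have all left the window would stay in the known set indefinitely.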
Usage Examples
import functools
from river import datasets
from river import evaluate
from river import metrics
from river import neighbors
from river import preprocessing
from river import utils
dataset = datasets.Phishing()
# Custom distance metric
l1_dist = functools.partial(utils.math.minkowski_distance, p=1)
model = (
    preprocessing.StandardScaler() |
    neighbors.KNNClassifier(
        engine=neighbors.SWINN(
            dist_func=l1_dist,
            seed=42
        )
    )
)
evaluate.progressive_val_score(dataset, model, metrics.Accuracy())
# Accuracy: 89.59%
# Using LazySearch engine with exact search
model = (
    preprocessing.StandardScaler() |
    neighbors.KNNClassifier(
        n_neighbors=3,
        engine=neighbors.LazySearch(
            window_size=30,
            dist_func=functools.partial(utils.math.minkowski_distance, p=2)
        )
    )
)
# Clean up classes periodically
model = neighbors.KNNClassifier(
    n_neighbors=5,
    cleanup_every=100,  # Clean every 100 instances
    weighted=True,
    softmax=False
)