Implementation:Online ml River Neighbors KNNClassifier

From Leeroopedia


Knowledge Sources
Domains: Online_Learning, Classification, Instance_Based_Learning
Last Updated: 2026-02-08 16:00 GMT

Overview

The K-Nearest Neighbors classifier stores instances in a sliding window and predicts a class from the labels of the k stored instances closest to the query.

Description

This implementation delegates storage and querying of instances to a configurable search engine; when none is given it uses SWINN, which performs approximate nearest-neighbor search. During prediction it retrieves the k nearest neighbors, lets each one vote for its class (weighted by inverse distance if weighted=True, otherwise one vote each), and aggregates the votes into class probabilities. A neighbor at distance exactly 0 short-circuits the vote: its class is returned with probability 1. The classifier maintains a set of known classes and can optionally drop classes that no longer appear in the window. Votes are normalized into probabilities either with a softmax (softmax=True) or by dividing by their sum (the default).
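The voting and normalization steps described above can be sketched in plain Python. This is a minimal illustration rather than River's source: the helper name `aggregate_votes` and its `(label, distance)` input layout are assumptions made for the example, every neighbor label is assumed to be in `classes`, and the exact-match shortcut is placed in the weighted branch, where it also avoids a division by zero.

```python
import math


def aggregate_votes(neighbors, classes, weighted=True, softmax=False):
    """Turn (label, distance) pairs into class probabilities.

    Hypothetical helper for illustration only; not part of River.
    """
    votes = {c: 0.0 for c in classes}
    if not neighbors:
        return votes

    for label, dist in neighbors:
        if weighted:
            if dist == 0.0:
                # An exact match decides the prediction on its own.
                return {c: (1.0 if c == label else 0.0) for c in classes}
            votes[label] += 1.0 / dist  # closer neighbors count more
        else:
            votes[label] += 1.0  # one vote per neighbor

    if softmax:
        # Numerically stable softmax over the raw vote totals.
        top = max(votes.values())
        exps = {c: math.exp(v - top) for c, v in votes.items()}
        total = sum(exps.values())
        return {c: e / total for c, e in exps.items()}

    # Simple division: each class gets its share of the total vote mass.
    total = sum(votes.values())
    return {c: v / total for c, v in votes.items()}


neighbors = [("spam", 0.5), ("ham", 1.0), ("ham", 2.0)]
probs = aggregate_votes(neighbors, classes={"spam", "ham"})
```

With inverse-distance weighting the single close "spam" neighbor (weight 2.0) outvotes the two farther "ham" neighbors (combined weight 1.5); with `weighted=False` the majority class "ham" wins instead.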

Usage

Use KNN when similar instances are likely to share the same class and you want a simple, interpretable model. Choose the search engine to match your needs: SWINN trades exactness for speed through approximate search, while LazySearch performs exact search and suits smaller windows. Keep weighted=True (the default) to give closer neighbors more influence. Set cleanup_every to a positive number of steps to periodically remove stale classes. Because predictions depend on distances, the classifier works best when features are on comparable scales, for example after StandardScaler preprocessing.
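A small worked example may help show why feature scaling matters for a distance-based model. The feature names, values, and per-feature scales below are invented for illustration, and dividing by a fixed scale stands in for a fitted scaler (distances are unaffected by the mean shift a real standardizer would also apply).

```python
import math


def euclidean(a, b):
    """Euclidean distance between two dicts that share the same keys."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))


def rescale(x, scales):
    """Divide each feature by an assumed typical spread (hypothetical values)."""
    return {k: v / scales[k] for k, v in x.items()}


query = {"age": 30.0, "income": 52_000.0}
stored = {
    "similar_age": {"age": 31.0, "income": 58_000.0},
    "similar_income": {"age": 65.0, "income": 52_500.0},
}

# On raw values, income differences swamp age differences.
raw_nearest = min(stored, key=lambda name: euclidean(query, stored[name]))

# Assumed spreads: roughly 10 years of age, 20,000 units of income.
scales = {"age": 10.0, "income": 20_000.0}
scaled_nearest = min(
    stored,
    key=lambda name: euclidean(rescale(query, scales), rescale(stored[name], scales)),
)
```

On the raw values the nearest neighbor is "similar_income", because a 35-year age gap is tiny next to income differences measured in thousands. After rescaling, "similar_age" becomes the nearest neighbor.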

Code Reference

Source Location

Signature

class KNNClassifier(base.Classifier):
    def __init__(
        self,
        n_neighbors: int = 5,
        engine: BaseNN | None = None,
        weighted: bool = True,
        cleanup_every: int = 0,
        softmax: bool = False,
    ):
        self.n_neighbors = n_neighbors
        if engine is None:
            engine = SWINN(dist_func=functools.partial(utils.math.minkowski_distance, p=2))
        self.engine = engine
        self.weighted = weighted
        self.cleanup_every = cleanup_every
        self.classes: set[base.typing.ClfTarget] = set()
        self.softmax = softmax

Import

from river import neighbors

I/O Contract

Parameters

Parameter | Type | Default | Description
n_neighbors | int | 5 | Number of nearest neighbors to query
engine | BaseNN or None | None (falls back to SWINN with Euclidean distance) | Search engine for storing/querying instances
weighted | bool | True | Weight votes by inverse distance
cleanup_every | int | 0 | Clean up stale classes every N steps (0 = never)
softmax | bool | False | Use softmax for probability normalization

Attributes

Attribute | Type | Description
classes | set | Set of known classes
_nn | BaseNN | Internal search engine instance

Input/Output

Method | Input | Output
learn_one | x: dict, y: Any | None
predict_proba_one | x: dict | dict[Any, float]
predict_one | x: dict | Any
clean_up_classes | (none) | None
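To make the method contract above concrete, here is a toy exact-search classifier that exposes the same four method names. It is a simplified stand-in written for this page, not River's KNNClassifier: the class name `ToyWindowKNN`, the `minkowski` helper, the `window_size` argument, and the unweighted voting are all assumptions chosen to keep the sketch short.

```python
import collections


def minkowski(a, b, p=2):
    """Minkowski distance between two feature dicts (missing keys count as 0)."""
    keys = set(a) | set(b)
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) ** p for k in keys) ** (1 / p)


class ToyWindowKNN:
    """Illustrative sliding-window k-NN; not River's implementation."""

    def __init__(self, n_neighbors=5, window_size=50):
        self.n_neighbors = n_neighbors
        # A bounded deque drops the oldest instance once the window is full.
        self.window = collections.deque(maxlen=window_size)
        self.classes = set()

    def learn_one(self, x, y):
        self.classes.add(y)
        self.window.append((x, y))

    def predict_proba_one(self, x):
        scored = [(minkowski(x, xi), yi) for xi, yi in self.window]
        nearest = sorted(scored, key=lambda pair: pair[0])[: self.n_neighbors]
        votes = {c: 0.0 for c in self.classes}
        for _, label in nearest:
            votes[label] += 1.0  # unweighted vote, kept simple on purpose
        total = sum(votes.values())
        return {c: v / total for c, v in votes.items()} if total else votes

    def predict_one(self, x):
        proba = self.predict_proba_one(x)
        return max(proba, key=proba.get) if proba else None

    def clean_up_classes(self):
        # Forget classes whose last instance has left the window.
        self.classes = {y for _, y in self.window}


model = ToyWindowKNN(n_neighbors=3, window_size=4)
stream = [
    ({"a": 0.0, "b": 0.1}, "neg"),
    ({"a": 0.1, "b": 0.0}, "neg"),
    ({"a": 1.0, "b": 0.9}, "pos"),
    ({"a": 0.9, "b": 1.0}, "pos"),
    ({"a": 1.1, "b": 1.0}, "pos"),
]
for x, y in stream:
    model.learn_one(x, y)

label = model.predict_one({"a": 1.0, "b": 1.0})
```

Because the window holds only four instances, the first "neg" example has already been evicted by the time of the prediction. Once every "neg" instance has left the window, calling `clean_up_classes()` removes "neg" from `classes` so it no longer appears in the returned probabilities.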

Usage Examples

import functools
from river import datasets
from river import evaluate
from river import metrics
from river import neighbors
from river import preprocessing
from river import utils

dataset = datasets.Phishing()

# Custom distance metric
l1_dist = functools.partial(utils.math.minkowski_distance, p=1)

model = (
    preprocessing.StandardScaler() |
    neighbors.KNNClassifier(
        engine=neighbors.SWINN(
            dist_func=l1_dist,
            seed=42
        )
    )
)

evaluate.progressive_val_score(dataset, model, metrics.Accuracy())
# Accuracy: 89.59%

# Using LazySearch engine with exact search
model = (
    preprocessing.StandardScaler() |
    neighbors.KNNClassifier(
        n_neighbors=3,
        engine=neighbors.LazySearch(
            window_size=30,
            dist_func=functools.partial(utils.math.minkowski_distance, p=2)
        )
    )
)

# Clean up classes periodically
model = neighbors.KNNClassifier(
    n_neighbors=5,
    cleanup_every=100,  # Clean every 100 instances
    weighted=True,
    softmax=False
)
