Principle:Online ml River Online Nearest Neighbors

Knowledge Sources	Machine Learning The Elements of Statistical Learning
Domains	Online_Learning, Instance_Based_Learning
Last Updated	2026-02-08 18:00 GMT

Overview

K-nearest neighbors (KNN) is a non-parametric, instance-based learning method that classifies or regresses by finding the K training instances closest to a query point and aggregating their labels or values. Unlike parametric models, KNN stores training data directly and defers computation to prediction time (lazy learning).

In the online/streaming setting, KNN faces a fundamental challenge: the training set grows unboundedly. Online KNN methods address this through sliding windows, reservoir sampling, or spatial indexing structures that support efficient insertion and deletion.

Theoretical Basis

Classification

For classification, the predicted label is the majority vote among the K nearest neighbors:

y_hat = mode{ y_i : x_i in KNN(x, K) }

Optionally, votes can be weighted by the inverse of distance to give closer neighbors more influence.

Regression

For regression, the prediction is the (weighted) average of the neighbors' target values:

y_hat = (1/K) * sum{ y_i : x_i in KNN(x, K) }

Distance Metrics

The choice of distance metric d(x, x') is critical. Common choices include Euclidean, Manhattan, and Minkowski distances. In streaming settings, features may need normalization as their scales drift over time.

Online Window Management

To bound memory, online KNN typically maintains a fixed-size window of the most recent W examples. Older examples are discarded, which naturally adapts to concept drift. The search structure must support:

Insert: Add a new example in sub-linear time.
Delete: Remove the oldest example efficiently.
Query: Find the K nearest neighbors in sub-linear time.

Spatial data structures such as KD-trees, ball trees, or locality-sensitive hashing (LSH) are adapted for this purpose. A brute-force (lazy) search is also viable when the window size is moderate.

Computational Considerations

Prediction cost: O(W) for brute-force, O(log W) average for tree-based search.
Memory: O(W * d) where d is the feature dimensionality.
Curse of dimensionality: KNN performance degrades in very high dimensions where all points become equidistant.

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment