Principle:Online ml River Online Nearest Neighbors
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Instance_Based_Learning |
| Last Updated | 2026-02-08 18:00 GMT |
Overview
K-nearest neighbors (KNN) is a non-parametric, instance-based learning method that classifies or regresses by finding the K training instances closest to a query point and aggregating their labels or values. Unlike parametric models, KNN stores training data directly and defers computation to prediction time (lazy learning).
In the online/streaming setting, KNN faces a fundamental challenge: the training set grows unboundedly. Online KNN methods address this through sliding windows, reservoir sampling, or spatial indexing structures that support efficient insertion and deletion.
Theoretical Basis
Classification
For classification, the predicted label is the majority vote among the K nearest neighbors:
y_hat = mode{ y_i : x_i in KNN(x, K) }
Optionally, votes can be weighted by the inverse of distance to give closer neighbors more influence.
Regression
For regression, the prediction is the (weighted) average of the neighbors' target values:
y_hat = (1/K) * sum{ y_i : x_i in KNN(x, K) }
Distance Metrics
The choice of distance metric d(x, x') is critical. Common choices include Euclidean, Manhattan, and Minkowski distances. In streaming settings, features may need normalization as their scales drift over time.
Online Window Management
To bound memory, online KNN typically maintains a fixed-size window of the most recent W examples. Older examples are discarded, which naturally adapts to concept drift. The search structure must support:
- Insert: Add a new example in sub-linear time.
- Delete: Remove the oldest example efficiently.
- Query: Find the K nearest neighbors in sub-linear time.
Spatial data structures such as KD-trees, ball trees, or locality-sensitive hashing (LSH) are adapted for this purpose. A brute-force (lazy) search is also viable when the window size is moderate.
Computational Considerations
- Prediction cost: O(W) for brute-force, O(log W) average for tree-based search.
- Memory: O(W * d) where d is the feature dimensionality.
- Curse of dimensionality: KNN performance degrades in very high dimensions where all points become equidistant.