Implementation:Online ml River Neighbors LazySearch
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Nearest_Neighbors, Search_Engines |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
LazySearch provides exact nearest neighbor search by linearly scanning a sliding window of recent instances, with an optional distance threshold that filters out near-duplicate instances.
Description
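This search engine maintains a fixed-size sliding window (FIFO queue) of instances: once window_size items are stored, each new item evicts the oldest one. A search performs a linear scan, computing the distance from the query to every stored item and returning the n_neighbors closest items together with their distances. The min_distance_keep parameter controls admission through update: when it is greater than 0, an item is added only if its distance to the nearest stored item exceeds the threshold, which keeps near-duplicates out and promotes window diversity. append always adds the item. Each window entry is a tuple (item, extra), where extra is optional metadata; only item is passed to the distance function. This "lazy" approach guarantees exact results, but the cost of each query grows linearly with the window size because every stored item is compared against the query.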
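The window-plus-linear-scan behavior described above can be sketched in plain Python. This is a simplified, hypothetical stand-in: `SlidingWindowLinearSearch` and `euclidean` are illustrative names, not river APIs. The sketch assumes entries are stored as `(item, extra)` tuples and uses `heapq.nsmallest` for the top-k step, which may differ from how the library actually selects neighbors.

```python
from __future__ import annotations

import collections
import heapq
from typing import Any, Callable


class SlidingWindowLinearSearch:
    """Hypothetical, simplified stand-in for a lazy (linear-scan) search engine."""

    def __init__(self, window_size: int, dist_func: Callable[[Any, Any], float]):
        self.dist_func = dist_func
        # A deque with maxlen drops the oldest entry once the window is full (FIFO).
        self.window: collections.deque = collections.deque(maxlen=window_size)

    def append(self, item: Any, extra: Any = None) -> None:
        # Entries are (item, extra); only `item` is ever passed to dist_func.
        self.window.append((item, extra))

    def search(self, item: Any, n_neighbors: int) -> tuple[list, list]:
        # Linear scan: one distance evaluation per stored entry.
        scored = [(self.dist_func(item, entry[0]), i) for i, entry in enumerate(self.window)]
        # Keep the n_neighbors smallest distances (ties broken by position in the window).
        best = heapq.nsmallest(n_neighbors, scored)
        entries = [self.window[i] for _, i in best]
        distances = [d for d, _ in best]
        return entries, distances


def euclidean(a: dict, b: dict) -> float:
    # Missing keys are treated as 0.0.
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in a.keys() | b.keys()) ** 0.5


engine = SlidingWindowLinearSearch(window_size=2, dist_func=euclidean)
engine.append({"x": 0.0}, "a")
engine.append({"x": 1.0}, "b")
engine.append({"x": 5.0}, "c")  # window is full, so ({"x": 0.0}, "a") is evicted
entries, distances = engine.search({"x": 4.0}, n_neighbors=2)
print([extra for _, extra in entries], distances)  # ['c', 'b'] [1.0, 3.0]
```

Because the scan touches every entry, doubling the window doubles the number of distance evaluations per query.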
Usage
Use LazySearch when you need exact nearest neighbors and window_size is small enough for a linear scan to be affordable (as a rough guide, fewer than about 1000 items). Set min_distance_keep > 0 to keep the window diverse by rejecting items that are close to ones already stored; use min_distance_keep=0 (the default) to add every instance, duplicates included. The distance function can be customized; the default is the Minkowski distance with p=2. LazySearch is best suited to settings that prioritize accuracy over speed and where the window size stays manageable.
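The admission rule controlled by min_distance_keep can be sketched as a standalone function. `filtered_update` and `manhattan` are hypothetical names; the sketch assumes the check compares a new item against its single nearest stored item and that a threshold of exactly 0.0 skips the check entirely, as described on this page.

```python
from __future__ import annotations

import collections
from typing import Any, Callable


def filtered_update(
    window: collections.deque,
    item: Any,
    extra: Any,
    min_distance_keep: float,
    dist_func: Callable[[Any, Any], float],
) -> bool:
    """Hypothetical helper mirroring the diversity-filtered update rule."""
    # With a threshold of 0.0, skip the scan and always add (duplicates included).
    if min_distance_keep == 0.0:
        window.append((item, extra))
        return True
    # Otherwise find the distance to the closest stored item (linear scan).
    nearest = min((dist_func(item, stored) for stored, _ in window), default=None)
    # Add only if the window is empty or the nearest item is farther than the threshold.
    if nearest is None or nearest > min_distance_keep:
        window.append((item, extra))
        return True
    return False


def manhattan(a: dict, b: dict) -> float:
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in a.keys() | b.keys())


stream = [{"x": 1.0}, {"x": 1.05}, {"x": 2.0}, {"x": 1.0}]

window = collections.deque(maxlen=100)
added = [filtered_update(window, x, None, 0.1, manhattan) for x in stream]
print(added)  # [True, False, True, False]

dup_window = collections.deque(maxlen=100)
print([filtered_update(dup_window, x, None, 0.0, manhattan) for x in stream])  # [True, True, True, True]
```

With the 0.1 threshold, the near-duplicate 1.05 and the exact repeat of 1.0 are rejected; with 0.0, all four items are kept.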
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/neighbors/lazy.py
Signature
class LazySearch(BaseNN):
    def __init__(
        self,
        window_size: int = 50,
        min_distance_keep: float = 0.0,
        dist_func: DistanceFunc | FunctionWrapper | None = None,
    ):
        self.window_size = window_size
        self.min_distance_keep = min_distance_keep
        # Fall back to the Minkowski distance with p=2 when no function is given
        if dist_func is None:
            dist_func = functools.partial(utils.math.minkowski_distance, p=2)
        self.dist_func = dist_func
        # Bounded deque: appending to a full window drops the oldest entry
        self.window: collections.deque = collections.deque(maxlen=self.window_size)
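The default dist_func is built by fixing p=2 on a Minkowski distance with functools.partial. The sketch below shows a Minkowski distance over sparse feature dicts; `minkowski` is a hypothetical name, and the sketch assumes missing keys count as 0 and takes the p-th root (the textbook definition). Whether river's utils.math.minkowski_distance also takes the root is an assumption worth checking, since it changes what a given min_distance_keep value means.

```python
import functools


def minkowski(a: dict, b: dict, p: int) -> float:
    """Hypothetical Minkowski distance between two sparse feature dicts."""
    # Iterate over the union of keys; a missing key is treated as 0.0.
    total = sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) ** p for k in a.keys() | b.keys())
    # The textbook definition takes the p-th root of the summed powers.
    return total ** (1 / p)


# Mirrors how the default is constructed: bind p with functools.partial.
euclidean = functools.partial(minkowski, p=2)
manhattan = functools.partial(minkowski, p=1)

print(euclidean({0: 0.0, 1: 0.0}, {0: 3.0, 1: 4.0}))  # 5.0
print(manhattan({0: 1.0}, {1: 2.0}))  # 3.0
```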
Import
from river import neighbors
I/O Contract
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
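| window_size | int | 50 | Maximum number of instances kept in the sliding window; the oldest is evicted when the window is full |
| min_distance_keep | float | 0.0 | Minimum distance to the nearest stored item required for update to add a new item (diversity control); 0.0 adds every item |
| dist_func | DistanceFunc, FunctionWrapper, or None | Minkowski, p=2 | Distance function used to compare items; None selects the default |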
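When items are stored as (features, label) tuples, the distance function should compare only the features. The signature accepts a FunctionWrapper for this kind of case; the sketch below is a hypothetical adapter in the same spirit (`features_only` and `manhattan` are illustrative names, not river's FunctionWrapper), assuming each item is a two-element tuple whose first element is a feature dict.

```python
from typing import Any, Callable


def features_only(dist: Callable[[dict, dict], float]) -> Callable[[Any, Any], float]:
    """Hypothetical adapter: compare only the feature dicts of (features, label) items."""

    def wrapped(a: tuple, b: tuple) -> float:
        x_a, _ = a  # the label (or None for a query) is ignored
        x_b, _ = b
        return dist(x_a, x_b)

    return wrapped


def manhattan(a: dict, b: dict) -> float:
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in a.keys() | b.keys())


item_dist = features_only(manhattan)
print(item_dist(({0: 1.0}, "class_a"), ({0: 4.0}, None)))  # 3.0
```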
Attributes
| Attribute | Type | Description |
|---|---|---|
| window | collections.deque | Sliding window (deque with maxlen=window_size) storing (item, extra) tuples |
Input/Output
| Method | Input | Output |
|---|---|---|
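| append | item: Any, extra: Any (optional) | None (always adds the item) |
| update | item: Any, n_neighbors: int, extra: Any (optional) | bool (True if the item was added) |
| search | item: Any, n_neighbors: int | tuple[list, list] (nearest neighbors and their distances) |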
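The method table can be restated as a typing.Protocol, which makes the contract checkable. `NNEngine` and `ListEngine` are hypothetical names, not river types; the toy engine assumes the first list returned by search holds the stored (item, extra) entries, and it has no diversity filter, so its update always returns True.

```python
from __future__ import annotations

from typing import Any, Callable, Protocol, runtime_checkable


@runtime_checkable
class NNEngine(Protocol):
    """Hypothetical Protocol restating the append/update/search contract."""

    def append(self, item: Any, extra: Any | None = None) -> None: ...

    def update(self, item: Any, n_neighbors: int, extra: Any | None = None) -> bool: ...

    def search(self, item: Any, n_neighbors: int) -> tuple[list, list]: ...


class ListEngine:
    """Toy engine satisfying the contract; no eviction and no diversity filter."""

    def __init__(self, dist_func: Callable[[Any, Any], float]):
        self.dist_func = dist_func
        self.window: list = []

    def append(self, item: Any, extra: Any | None = None) -> None:
        self.window.append((item, extra))

    def update(self, item: Any, n_neighbors: int, extra: Any | None = None) -> bool:
        self.append(item, extra)
        return True  # this toy version never rejects an item

    def search(self, item: Any, n_neighbors: int) -> tuple[list, list]:
        # Parallel lists: entries[i] lies at distances[i] from the query.
        entries = sorted(self.window, key=lambda e: self.dist_func(item, e[0]))[:n_neighbors]
        return entries, [self.dist_func(item, e[0]) for e in entries]


engine = ListEngine(lambda a, b: abs(a - b))
print(isinstance(engine, NNEngine))  # True
engine.update(3.0, n_neighbors=1, extra="x")
entries, dists = engine.search(2.0, n_neighbors=1)
print(entries, dists)  # [(3.0, 'x')] [1.0]
```

Note that runtime_checkable only checks that the methods exist, not their signatures.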
Usage Examples
import functools
from river import neighbors
from river import utils
from river import datasets
from river import evaluate
from river import metrics
from river import preprocessing
# Basic usage with KNNClassifier
dataset = datasets.Phishing()
model = (
    preprocessing.StandardScaler() |
    neighbors.KNNClassifier(
        n_neighbors=3,
        engine=neighbors.LazySearch(
            window_size=30,
            dist_func=functools.partial(utils.math.minkowski_distance, p=2)
        )
    )
)
evaluate.progressive_val_score(dataset, model, metrics.Accuracy())
# Using diversity control
search_engine = neighbors.LazySearch(
    window_size=100,
    min_distance_keep=0.1,  # update() only adds items more than 0.1 from their nearest stored item
    dist_func=functools.partial(utils.math.minkowski_distance, p=1)  # Minkowski distance with p=1
)
# Direct usage: the item is the feature dict; the label goes in `extra`
# (append always adds the item, bypassing min_distance_keep)
search_engine.append({0: 1.0, 1: 2.0}, extra='class_a')
search_engine.append({0: 1.5, 1: 2.5}, extra='class_b')
# Search for the 2 nearest neighbors; returns (neighbors, distances)
neighbors_list, distances = search_engine.search({0: 1.2, 1: 2.1}, n_neighbors=2)
# Update with diversity check
added = search_engine.update({0: 5.0, 1: 5.0}, n_neighbors=1, extra='class_c')
# Returns True if added, False if too close to an existing item
# Custom distance function: compares two feature dicts, like the
# Minkowski distance passed to KNNClassifier in the first example
def custom_distance(a, b):
    # Manhattan distance over the union of keys; missing keys count as 0
    return sum(abs(a.get(k, 0) - b.get(k, 0)) for k in set(a) | set(b))
search_engine = neighbors.LazySearch(
    window_size=50,
    dist_func=custom_distance
)