

Implementation:Online ml River Neighbors LazySearch

From Leeroopedia
Revision as of 16:09, 16 February 2026 by Admin (auto-imported from implementations/Online_ml_River_Neighbors_LazySearch.md)


Knowledge Sources
Domains: Online_Learning, Nearest_Neighbors, Search_Engines
Last Updated: 2026-02-08 16:00 GMT

Overview

LazySearch provides exact nearest-neighbor search by linearly scanning a sliding window of instances, with an optional distance threshold that filters out near-duplicate instances.

Description

This search engine maintains a fixed-size sliding window (a FIFO queue) of instances; once the window holds window_size items, appending a new one evicts the oldest. A search performs a linear scan, computing the distance from the query to every stored item and returning the k smallest. The min_distance_keep parameter controls which instances are added: only items whose distance to the existing items exceeds the threshold are kept, which promotes diversity in the window. Items are stored as tuples (data, optional_metadata), and only data is passed to the distance function. This "lazy" approach guarantees exact results, but query cost grows linearly with window size.
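The linear scan described above can be sketched in plain Python. This is an illustrative stand-in, not River's implementation: the names `euclidean` and `linear_scan_search` and the `(data, label)` layout are hypothetical, chosen only for this sketch.

```python
import collections
import heapq
import math


def euclidean(a: dict, b: dict) -> float:
    """Euclidean distance between two sparse feature dicts (missing keys count as 0)."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def linear_scan_search(window, query: dict, k: int):
    """Return the k stored (data, label) items closest to `query`, plus their distances."""
    # Only the data part (index 0) of each stored tuple is given to the distance function.
    scored = ((item, euclidean(query, item[0])) for item in window)
    nearest = heapq.nsmallest(k, scored, key=lambda pair: pair[1])
    items = [item for item, _ in nearest]
    distances = [dist for _, dist in nearest]
    return items, distances


# Fixed-size FIFO window: appending to a full deque drops the oldest entry.
window = collections.deque(maxlen=3)
for x, label in [(0.0, "a"), (1.0, "b"), (2.0, "c"), (3.0, "d")]:
    window.append(({"x": x}, label))

items, distances = linear_scan_search(window, {"x": 2.2}, k=2)
```

Every query touches each stored item once, which is why the cost grows with the window size rather than with the length of the stream.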

Usage

Use LazySearch when you need exact nearest neighbors and window_size is small enough for a linear scan (as a rule of thumb, fewer than about 1000 items). Set min_distance_keep > 0 to keep the window diverse by rejecting instances that are too similar to those already stored; set min_distance_keep=0 to add every instance, duplicates included. The distance function can be customized via dist_func (default: Minkowski distance with p=2, i.e. Euclidean). It suits scenarios that prioritize accuracy over speed and have manageable window sizes.
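To make the effect of min_distance_keep concrete, here is a minimal sketch of the rule as this page describes it: an item is kept only if it lies farther than the threshold from everything already stored. The helper `update_window` and the one-dimensional `abs_distance` are hypothetical, and River's own implementation may differ in detail.

```python
import collections


def abs_distance(a: float, b: float) -> float:
    """Toy 1-D distance used only for this illustration."""
    return abs(a - b)


def update_window(window, item: float, min_distance_keep: float) -> bool:
    """Append `item` if it is far enough from every stored item; report whether it was added."""
    if min_distance_keep == 0:
        # No filtering: every item is added, duplicates included.
        window.append(item)
        return True
    nearest = min((abs_distance(item, other) for other in window), default=None)
    if nearest is None or nearest > min_distance_keep:
        window.append(item)
        return True
    return False


window = collections.deque(maxlen=5)
results = [
    update_window(window, x, min_distance_keep=0.5)
    for x in (1.0, 1.2, 2.0, 2.4, 3.0)
]
```

In this run the values 1.2 and 2.4 are rejected because each sits within 0.5 of an item already in the window, so the window ends up holding 1.0, 2.0, and 3.0.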

Code Reference

Source Location

Signature

class LazySearch(BaseNN):
    def __init__(
        self,
        window_size: int = 50,
        min_distance_keep: float = 0.0,
        dist_func: DistanceFunc | FunctionWrapper | None = None,
    ):
        self.window_size = window_size
        self.min_distance_keep = min_distance_keep
        if dist_func is None:
            dist_func = functools.partial(utils.math.minkowski_distance, p=2)
        self.dist_func = dist_func
        self.window: collections.deque = collections.deque(maxlen=self.window_size)

Import

from river import neighbors

I/O Contract

Parameters

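Parameter | Type | Default | Description
window_size | int | 50 | Maximum number of instances held in the sliding window
min_distance_keep | float | 0.0 | Minimum distance from existing items required to add a new item (diversity control); 0.0 adds every item
dist_func | DistanceFunc, FunctionWrapper, or None | None (Minkowski distance with p=2) | Distance function for comparing items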

Attributes

Attribute | Type | Description
window | collections.deque | Sliding window (maxlen=window_size) storing (item, optional metadata) tuples; distances are computed at search time

Input/Output

Method | Input | Output
append | item: Any, extra: Any | None
update | item: Any, n_neighbors: int, extra: Any | bool
search | item: Any, n_neighbors: int | tuple[list, list]
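The tuple[list, list] output of search pairs a list of neighbors with a parallel list of distances. As a rough sketch of how scored pairs can be turned into that shape, the snippet below sorts made-up (neighbor, distance) pairs and transposes the k closest. The data and variable names are hypothetical and are not taken from River's source.

```python
import operator

# Hypothetical (neighbor, distance) pairs, as a linear scan might produce them.
scored = [("b", 0.7), ("a", 0.1), ("c", 0.4)]

k = 2
# Sort by the distance (last element of each pair) and keep the k closest.
closest = sorted(scored, key=operator.itemgetter(-1))[:k]

# zip(*...) transposes the list of pairs into two parallel sequences.
neighbors_list, distances = map(list, zip(*closest))
```

With an empty input the transposition yields nothing to unpack, so code written this way needs a guard for an empty window.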

Usage Examples

import functools
from river import neighbors
from river import utils
from river import datasets
from river import evaluate
from river import metrics
from river import preprocessing

# Basic usage with KNNClassifier
dataset = datasets.Phishing()

model = (
    preprocessing.StandardScaler() |
    neighbors.KNNClassifier(
        n_neighbors=3,
        engine=neighbors.LazySearch(
            window_size=30,
            dist_func=functools.partial(utils.math.minkowski_distance, p=2)
        )
    )
)

evaluate.progressive_val_score(dataset, model, metrics.Accuracy())

# Using diversity control
search_engine = neighbors.LazySearch(
    window_size=100,
    min_distance_keep=0.1,  # Only add items > 0.1 distance from existing
    dist_func=functools.partial(utils.math.minkowski_distance, p=1)
)

# Direct usage
search_engine.append(({0: 1.0, 1: 2.0}, 'class_a'))
search_engine.append(({0: 1.5, 1: 2.5}, 'class_b'))

# Search for 2 nearest neighbors
neighbors_list, distances = search_engine.search(({0: 1.2, 1: 2.1}, None), n_neighbors=2)

# Update with diversity check
added = search_engine.update(({0: 5.0, 1: 5.0}, 'class_c'), n_neighbors=1)
# Returns True if added, False if too similar to existing

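# Custom distance function
def custom_distance(x, y):
    # x and y are (features, label) tuples; compare only the feature dicts at index 0
    a, b = x[0], y[0]
    # Manhattan (L1) distance over the union of keys, treating missing keys as 0
    return sum(abs(a.get(k, 0) - b.get(k, 0)) for k in set(a) | set(b))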

search_engine = neighbors.LazySearch(
    window_size=50,
    dist_func=custom_distance
)
