Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Haifengl Smile KNN RNN Search API

From Leeroopedia


Overview

This API Doc covers the KNNSearch and RNNSearch interfaces, which define the query API for all spatial index structures in Smile's smile.neighbor package. These interfaces provide the search() and nearest() methods that execute nearest neighbor queries against pre-built indexes.

Import

import smile.neighbor.KNNSearch;
import smile.neighbor.RNNSearch;
import smile.neighbor.Neighbor;
import java.util.List;
import java.util.ArrayList;

KNNSearch Interface

Source: KNNSearch.java (Lines 30-54)

Signature

public interface KNNSearch<K, V> {

    /**
     * Returns the nearest neighbor. Excludes the query object itself
     * (by reference equality) from the results.
     *
     * @param q the query key.
     * @return the nearest neighbor.
     */
    default Neighbor<K, V> nearest(K q) {
        return search(q, 1)[0];
    }

    /**
     * Retrieves the k nearest neighbors to the query key.
     *
     * @param q the query key.
     * @param k the number of nearest neighbors to search for.
     * @return the k nearest neighbors, sorted by distance (ascending).
     * @throws IllegalArgumentException if k <= 0 or k > dataset size.
     */
    Neighbor<K, V>[] search(K q, int k);
}

Parameters

Parameter Type Description
q K The query point. For Euclidean structures, this is double[]. For generic structures, it matches the key type.
k int The number of nearest neighbors to return. Must be > 0 and <= dataset size.

Return Value

Neighbor<K, V>[] -- An array of Neighbor records sorted by distance in ascending order (nearest first). Each element contains:

  • key -- the neighbor's key (e.g., its coordinate vector)
  • value -- the neighbor's associated data object
  • index -- the neighbor's position in the original dataset
  • distance -- the distance from the query to this neighbor

Example: k-Nearest Neighbor Search

import smile.neighbor.KDTree;
import smile.neighbor.Neighbor;
import smile.math.MathEx;

// Build index
double[][] data = MathEx.randn(10000, 5);
KDTree<double[]> tree = KDTree.of(data);

// Query: find 10 nearest neighbors
double[] query = {1.0, 2.0, 3.0, 4.0, 5.0};
Neighbor<double[], double[]>[] neighbors = tree.search(query, 10);

// Results are sorted by distance (ascending)
for (Neighbor<double[], double[]> n : neighbors) {
    System.out.printf("Index: %d, Distance: %.4f%n", n.index(), n.distance());
}

Example: Single Nearest Neighbor

// Find the single closest point
Neighbor<double[], double[]> nearest = tree.nearest(query);
System.out.printf("Nearest: index=%d, distance=%.4f%n",
    nearest.index(), nearest.distance());

RNNSearch Interface

Source: RNNSearch.java (Lines 31-41)

Signature

public interface RNNSearch<K, V> {

    /**
     * Retrieves the neighbors in a fixed radius of query object,
     * i.e. d(q, v) <= radius.
     *
     * @param q the query key.
     * @param radius the radius of search range from target. Must be > 0.
     * @param neighbors the list to store found neighbors on output.
     * @throws IllegalArgumentException if radius <= 0.
     */
    void search(K q, double radius, List<Neighbor<K, V>> neighbors);
}

Parameters

Parameter Type Description
q K The query point.
radius double The search radius. All points with distance <= radius from q are returned. Must be > 0. For BKTree, must be a positive integer.
neighbors List<Neighbor<K, V>> An output list that the method appends results to. The caller must provide this list (typically new ArrayList<>()). The list is not cleared by the method -- results are appended.

Return Value

void -- Results are written to the neighbors list parameter. The list is not sorted; the caller should sort if order is needed.

Example: Range Search

import smile.neighbor.KDTree;
import smile.neighbor.Neighbor;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Build index
double[][] data = loadData();
KDTree<double[]> tree = KDTree.of(data);

// Range search: find all points within distance 2.5
double[] query = {1.0, 2.0, 3.0};
List<Neighbor<double[], double[]>> results = new ArrayList<>();
tree.search(query, 2.5, results);

// Sort by distance if needed (results are unsorted by default)
Collections.sort(results);

System.out.printf("Found %d neighbors within radius 2.5%n", results.size());
for (Neighbor<double[], double[]> n : results) {
    System.out.printf("  Index: %d, Distance: %.4f%n", n.index(), n.distance());
}

Example: Reusing the Result List

// Important: clear the list between queries since search() appends
List<Neighbor<double[], double[]>> neighbors = new ArrayList<>();

for (double[] q : queryPoints) {
    tree.search(q, 1.0, neighbors);
    processNeighbors(neighbors);
    neighbors.clear();  // Must clear before next query
}

Combined KNN + RNN Example

Many structures implement both interfaces, allowing both query types on the same index:

import smile.neighbor.KDTree;
import smile.neighbor.Neighbor;
import java.util.ArrayList;
import java.util.List;

double[][] data = loadData();
KDTree<double[]> tree = KDTree.of(data);
double[] query = data[42]; // query with a point from the dataset

// KNN search -- the query point (data[42]) is automatically excluded
Neighbor<double[], double[]>[] knnResults = tree.search(query, 5);
// knnResults will NOT contain data[42] itself

// RNN search -- same exclusion
List<Neighbor<double[], double[]>> rnnResults = new ArrayList<>();
tree.search(query, 3.0, rnnResults);
// rnnResults will NOT contain data[42] itself

MPLSH Extended Search API

MPLSH provides additional search methods with recall and probe parameters:

import smile.neighbor.MPLSH;
import smile.neighbor.Neighbor;
import java.util.ArrayList;
import java.util.List;

// After building and fitting MPLSH...
MPLSH<double[]> mplsh = buildAndFitMPLSH(data);

// Search with recall target and max probes
double recall = 0.95;  // target 95% recall
int maxProbes = 100;    // maximum number of bucket probes

// KNN with recall/probe parameters
Neighbor<double[], double[]>[] results = mplsh.search(query, 10, recall, maxProbes);

// Nearest with recall/probe parameters
Neighbor<double[], double[]> nearest = mplsh.nearest(query, recall, maxProbes);

// RNN with recall/probe parameters
List<Neighbor<double[], double[]>> rangeResults = new ArrayList<>();
mplsh.search(query, 5.0, rangeResults, recall, maxProbes);

Interface Implementation by Structure

Structure nearest(K q) search(K q, int k) search(K q, double r, List)
KDTree Overridden (optimized single-neighbor) Heap-based branch-and-bound Recursive branch-and-bound
CoverTree Default (calls search(q,1)) Top-down traversal with bound pruning Top-down traversal with radius bound
LSH Overridden (candidate set scan) Candidate set + heap selection Candidate set + radius filter
MPLSH Overridden (multi-probe + recall) Multi-probe candidate set + heap Multi-probe candidate set + radius filter
RPForest Default (calls search(q,1)) Union leaf candidates + heap selection Not implemented
BKTree Not implemented Not implemented Triangle inequality pruning on discrete metric
LinearSearch Overridden (parallel distance + argmin) Parallel distance + heap selection Parallel distance + threshold filter

Source

Related

Categories

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment