Implementation:Haifengl Smile Neighbor Record API

From Leeroopedia


Overview

This API doc covers the Neighbor record -- the immutable data class that encapsulates an individual nearest neighbor search result in Smile's smile.neighbor package. Every k-nearest-neighbor (KNN) and range (RNN) query returns its results as Neighbor instances, which makes this record the common output type of the neighbor search workflow.

Neighbor is implemented as a Java record (a language feature finalized in Java 16). The compiler generates equals(), hashCode(), and an accessor method for each component; Neighbor supplies its own toString() and adds compareTo(), as shown in the signature below.
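
The record semantics described above can be sketched without Smile on the classpath. The Neighbor type below is a local stand-in that copies only the component list from the signature on this page, and RecordSemanticsDemo and sameResult are hypothetical names introduced for illustration. The sketch shows that the generated equals() compares components one by one, and that an array component such as a double[] key is compared by reference rather than by content.

```java
import java.util.Arrays;

// Local stand-in mirroring the component list of smile.neighbor.Neighbor.
// It is NOT the Smile class, only an illustration of record semantics.
record Neighbor<K, V>(K key, V value, int index, double distance) {}

class RecordSemanticsDemo {
    /** True when the generated equals() and hashCode() agree for both neighbors. */
    static boolean sameResult(Neighbor<?, ?> a, Neighbor<?, ?> b) {
        return a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // Two separately constructed records with equal components are equal.
        Neighbor<String, Integer> a = new Neighbor<>("p7", 1, 7, 0.5);
        Neighbor<String, Integer> b = new Neighbor<>("p7", 1, 7, 0.5);
        System.out.println(sameResult(a, b));

        // Array components are compared by reference, so a cloned key
        // makes the records unequal even though the contents match.
        double[] x = {1.0, 2.0};
        Neighbor<double[], Integer> c = new Neighbor<>(x, 1, 0, 0.5);
        Neighbor<double[], Integer> d = new Neighbor<>(x.clone(), 1, 0, 0.5);
        System.out.println(c.equals(d));
        System.out.println(Arrays.equals(c.key(), d.key()));
    }
}
```

When comparing results from array-keyed structures, comparing index() and distance() is usually a more reliable check than equals() on the whole record.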

Import

import smile.neighbor.Neighbor;

Signature

Source: Neighbor.java (Lines 38-63)

/**
 * The immutable object encapsulating nearest neighbor search results.
 *
 * @param key the key of the neighbor (e.g., coordinate vector).
 * @param value the value of the neighbor (e.g., label, full data object).
 * @param index the index of the neighbor in the original dataset.
 * @param distance the distance between the query and this neighbor.
 * @param <K> the type of keys.
 * @param <V> the type of associated objects.
 */
public record Neighbor<K, V>(K key, V value, int index, double distance)
        implements Comparable<Neighbor<K, V>> {

    /**
     * Compares neighbors by distance (ascending), then by index (ascending).
     * @param o the other neighbor to compare to.
     * @return comparison result.
     */
    @Override
    public int compareTo(Neighbor<K, V> o)

    /**
     * Returns a string representation: "Neighbor(key[index]: distance)".
     */
    @Override
    public String toString()

    /**
     * Creates a Neighbor where key and value are the same object.
     * @param key the query key (also used as value).
     * @param index the index of the object in the dataset.
     * @param distance the distance between query and neighbor.
     * @param <T> the data type of key and object.
     * @return the neighbor object.
     */
    public static <T> Neighbor<T, T> of(T key, int index, double distance)
}

Record Components (Accessor Methods)

Since Neighbor is a Java record, accessor methods are generated automatically with the same names as the components (not getKey(), but key()):

Accessor | Return Type | Description
key() | K | The spatial key of the neighbor. For Euclidean structures, this is double[] (the coordinate vector). For generic structures, it matches the key type parameter.
value() | V | The data payload associated with the neighbor. It is the same object as key when the neighbor was created with the of() factory. Otherwise it may be a label, object ID, or full data record.
index() | int | Zero-based position of this neighbor in the original dataset passed to the index constructor. Enables cross-referencing with external arrays (labels, metadata, raw data).
distance() | double | The distance from the query point to this neighbor. For approximate structures such as LSH, this is still the exact distance to the returned candidate; the approximation lies in which candidates are examined, so a closer point may exist that was never considered.

Comparable Implementation

@Override
public int compareTo(Neighbor<K, V> o) {
    int d = Double.compare(distance, o.distance);
    // If distances are the same (e.g., duplicate samples),
    // sort by sample index for deterministic ordering.
    return d == 0 ? Integer.compare(index, o.index) : d;
}

Sorting order:

  1. Primary: Distance ascending (closest first)
  2. Secondary: Index ascending (lower index first when distances tie)

This makes Neighbor directly usable with Collections.sort() and Arrays.sort().
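
The ordering above can be checked with a small self-contained sketch. The Neighbor record here is a local stand-in that reproduces only the compareTo body shown on this page, and NeighborOrderingDemo, sortedIndices, and sample are hypothetical names with made-up data. The sample contains a tie at distance 0.5 so that the index tie-break is visible.

```java
import java.util.Arrays;

// Local stand-in reproducing the compareTo shown above; not the Smile class.
record Neighbor<K, V>(K key, V value, int index, double distance)
        implements Comparable<Neighbor<K, V>> {
    @Override
    public int compareTo(Neighbor<K, V> o) {
        int d = Double.compare(distance, o.distance);
        return d == 0 ? Integer.compare(index, o.index) : d;
    }
}

class NeighborOrderingDemo {
    /** Sorts a copy of the input and returns the dataset indices in sorted order. */
    static int[] sortedIndices(Neighbor<String, String>[] neighbors) {
        Neighbor<String, String>[] copy = neighbors.clone();
        Arrays.sort(copy);  // natural ordering: distance first, then index
        return Arrays.stream(copy).mapToInt(Neighbor::index).toArray();
    }

    /** Made-up sample with a tie at distance 0.5 between indices 9 and 2. */
    @SuppressWarnings("unchecked")
    static Neighbor<String, String>[] sample() {
        return new Neighbor[] {
            new Neighbor<>("c", "c", 9, 0.5),
            new Neighbor<>("a", "a", 4, 1.5),
            new Neighbor<>("b", "b", 2, 0.5),
        };
    }

    public static void main(String[] args) {
        // prints [2, 9, 4]: the tie at 0.5 is resolved by the lower index
        System.out.println(Arrays.toString(sortedIndices(sample())));
    }
}
```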

Factory Method

public static <T> Neighbor<T, T> of(T key, int index, double distance) {
    return new Neighbor<>(key, key, index, distance);
}

The of() factory creates a Neighbor where the key and value are the same object. This is the common case when data points serve as both spatial keys and payload (e.g., KDTree.of(double[][] data) stores each double[] as both key and value).
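
One consequence of the factory is that key() and value() return the same reference, not two equal copies. The sketch below uses a local stand-in record with the of() body shown above; FactoryDemo and sharesReference are hypothetical names. Because a record is only shallowly immutable, a change to the underlying array is visible through both accessors.

```java
// Local stand-in with the of() factory shown above; not the Smile class.
record Neighbor<K, V>(K key, V value, int index, double distance) {
    static <T> Neighbor<T, T> of(T key, int index, double distance) {
        return new Neighbor<>(key, key, index, distance);
    }
}

class FactoryDemo {
    /** True when key and value refer to the very same object. */
    static boolean sharesReference(Neighbor<?, ?> n) {
        Object k = n.key();
        Object v = n.value();
        return k == v;
    }

    public static void main(String[] args) {
        double[] point = {1.0, 2.0};
        Neighbor<double[], double[]> n = Neighbor.of(point, 3, 0.25);
        System.out.println(sharesReference(n));

        // The record does not copy the array, so this write is visible
        // through both n.key() and n.value().
        point[0] = 9.0;
        System.out.println(n.key()[0] + " " + n.value()[0]);
    }
}
```

Callers that need a stable snapshot of a neighbor's coordinates would have to clone the array themselves.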

toString Format

@Override
public String toString() {
    return String.format("Neighbor(%s[%d]: %s)", key, index, Strings.format(distance));
}

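Output example for a double[] key: Neighbor([D@1a2b3c4d[42]: 3.1416). The [D@1a2b3c4d part is Java's default rendering of an array (type tag and identity hash), not the coordinates themselves.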

The distance is formatted using Smile's Strings.format() utility for consistent numeric display.
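
For array keys, the identity-hash rendering in the output example is rarely what a log reader wants. A minimal sketch of a caller-side helper that prints the coordinates instead is given below. NeighborFormatDemo and describe are hypothetical names, the Neighbor record is a local stand-in, and the fixed four-decimal format is an assumption standing in for Strings.format(), whose exact rules are not reproduced here.

```java
import java.util.Arrays;
import java.util.Locale;

// Local stand-in mirroring the component list; not the Smile class.
record Neighbor<K, V>(K key, V value, int index, double distance) {}

class NeighborFormatDemo {
    /** Renders an array-keyed neighbor with the key's contents instead of its identity hash. */
    static String describe(Neighbor<double[], ?> n) {
        // Locale.ROOT keeps the decimal separator stable across machines.
        return String.format(Locale.ROOT, "Neighbor(%s[%d]: %.4f)",
                Arrays.toString(n.key()), n.index(), n.distance());
    }

    public static void main(String[] args) {
        Neighbor<double[], String> n =
                new Neighbor<>(new double[] {1.0, 2.0}, "a", 42, 3.14159265);
        // prints Neighbor([1.0, 2.0][42]: 3.1416)
        System.out.println(describe(n));
    }
}
```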

Usage Examples

Processing KNN Results

import smile.neighbor.KDTree;
import smile.neighbor.Neighbor;

double[][] data = loadData();
KDTree<double[]> tree = KDTree.of(data);

double[] query = {1.0, 2.0, 3.0};
Neighbor<double[], double[]>[] results = tree.search(query, 5);

// Access individual neighbor fields
for (Neighbor<double[], double[]> n : results) {
    double[] coords = n.key();       // neighbor's coordinates
    double[] value  = n.value();     // same as key for KDTree.of()
    int      idx    = n.index();     // position in original data array
    double   dist   = n.distance();  // distance from query

    System.out.printf("Neighbor #%d at distance %.4f%n", idx, dist);
}

k-NN Classification Pattern

import smile.neighbor.KDTree;
import smile.neighbor.Neighbor;
import java.util.HashMap;
import java.util.Map;

// Build index with separate keys (features) and values (labels)
double[][] features = loadFeatures();  // n x d
int[] labels = loadLabels();           // n labels

// Use Integer[] instead of int[] for generic type parameter
Integer[] boxedLabels = new Integer[labels.length];
for (int i = 0; i < labels.length; i++) boxedLabels[i] = labels[i];

KDTree<Integer> tree = new KDTree<>(features, boxedLabels);

// Classify a new point
double[] query = newSample();
Neighbor<double[], Integer>[] neighbors = tree.search(query, 5);

// Majority vote
Map<Integer, Integer> votes = new HashMap<>();
for (Neighbor<double[], Integer> n : neighbors) {
    int label = n.value();  // the class label of this neighbor
    votes.merge(label, 1, Integer::sum);
}

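// Ties between classes are resolved by HashMap iteration order, which is unspecified.
int predictedClass = votes.entrySet().stream()
    .max(Map.Entry.comparingByValue())
    .orElseThrow()  // empty only if the search returned no neighbors
    .getKey();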

Weighted k-NN Regression Pattern

import smile.neighbor.KDTree;
import smile.neighbor.Neighbor;

double[][] features = loadFeatures();
Double[] targets = loadTargets(); // continuous values

KDTree<Double> tree = new KDTree<>(features, targets);

double[] query = newSample();
Neighbor<double[], Double>[] neighbors = tree.search(query, 10);

// Inverse-distance-weighted average
double weightedSum = 0.0;
double weightTotal = 0.0;
for (Neighbor<double[], Double> n : neighbors) {
    double weight = 1.0 / (n.distance() + 1e-10); // epsilon to avoid /0
    weightedSum += weight * n.value();
    weightTotal += weight;
}
double prediction = weightedSum / weightTotal;

Processing RNN Results

import smile.neighbor.KDTree;
import smile.neighbor.Neighbor;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

double[][] data = loadData();
KDTree<double[]> tree = KDTree.of(data);

double[] query = {1.0, 2.0, 3.0};
List<Neighbor<double[], double[]>> results = new ArrayList<>();
tree.search(query, 5.0, results);

// Sort results by distance (Neighbor implements Comparable)
Collections.sort(results);

// Local density estimate
int count = results.size();
double maxDistance = results.isEmpty() ? 0.0 : results.get(results.size() - 1).distance();
System.out.printf("Found %d neighbors within radius 5.0, farthest at %.4f%n",
    count, maxDistance);

Cross-Referencing with External Data

import smile.neighbor.KDTree;
import smile.neighbor.Neighbor;

double[][] coordinates = loadCoordinates();
String[] names = loadNames();     // external metadata array
double[] scores = loadScores();   // another external array

KDTree<double[]> tree = KDTree.of(coordinates);

double[] query = {1.0, 2.0, 3.0};  // example query; its dimension must match coordinates
Neighbor<double[], double[]>[] results = tree.search(query, 3);

// Use index() to cross-reference with external arrays
for (Neighbor<double[], double[]> n : results) {
    int i = n.index();
    System.out.printf("  %s (score=%.2f, dist=%.4f)%n",
        names[i], scores[i], n.distance());
}

Handling Approximate Search Results

import smile.neighbor.LSH;
import smile.neighbor.Neighbor;

double[][] data = loadHighDimData();
LSH<double[]> lsh = new LSH<>(data, data, 4.0);

double[] query = newSample();  // placeholder, as in the earlier examples

// nearest() may return null for LSH if no candidates found
Neighbor<double[], double[]> nearest = lsh.nearest(query);
if (nearest != null) {
    System.out.printf("Nearest: index=%d, distance=%.4f%n",
        nearest.index(), nearest.distance());
} else {
    System.out.println("No candidates found -- consider increasing L or w");
}

Internal: NeighborBuilder

The NeighborBuilder class is a mutable internal counterpart to the immutable Neighbor record. It is used during search operations to efficiently update candidate neighbors in heap structures without creating many temporary Neighbor objects. Once the search completes, NeighborBuilder.toNeighbor() converts each mutable builder into an immutable Neighbor record.

// Internal usage -- not part of the public API
class NeighborBuilder<K, V> implements Comparable<NeighborBuilder<K, V>> {
    K key;
    V value;
    int index;
    double distance;

    public Neighbor<K, V> toNeighbor() {
        return new Neighbor<>(key, value, index, distance);
    }
}

Users do not interact with NeighborBuilder directly; it is package-private and only used internally by search implementations.
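
The division of labour described above, mutable candidates during the search and immutable records at the end, can be sketched as follows. This is not Smile's implementation: it is a brute-force k-NN that uses java.util.PriorityQueue as the heap, and Candidate, BruteForceKnn, search, and euclidean are hypothetical names. The Neighbor record is a local stand-in with the compareTo shown earlier on this page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Local stand-in with the ordering shown earlier; not the Smile class.
record Neighbor<K, V>(K key, V value, int index, double distance)
        implements Comparable<Neighbor<K, V>> {
    @Override
    public int compareTo(Neighbor<K, V> o) {
        int d = Double.compare(distance, o.distance);
        return d == 0 ? Integer.compare(index, o.index) : d;
    }
}

/** Hypothetical mutable candidate, loosely modelled on the description above. */
final class Candidate {
    double[] key;
    int index;
    double distance;

    Neighbor<double[], double[]> toNeighbor() {
        return new Neighbor<>(key, key, index, distance);
    }
}

class BruteForceKnn {
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Returns the k closest rows of data to q, sorted by distance then index. */
    static List<Neighbor<double[], double[]>> search(double[][] data, double[] q, int k) {
        List<Neighbor<double[], double[]>> out = new ArrayList<>();
        if (k <= 0) return out;
        // Max-heap on distance: the root is the worst candidate kept so far.
        PriorityQueue<Candidate> heap =
                new PriorityQueue<>((a, b) -> Double.compare(b.distance, a.distance));
        for (int i = 0; i < data.length; i++) {
            double d = euclidean(data[i], q);
            Candidate c;
            if (heap.size() < k) {
                c = new Candidate();
            } else if (d < heap.peek().distance) {
                c = heap.poll();  // reuse the evicted object instead of allocating
            } else {
                continue;
            }
            c.key = data[i];
            c.index = i;
            c.distance = d;
            heap.add(c);  // re-inserted only after mutation, so the heap stays valid
        }
        // Freeze the surviving candidates into immutable records.
        for (Candidate c : heap) out.add(c.toNeighbor());
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        double[][] data = {{0, 0}, {3, 4}, {1, 0}, {0, 2}};
        for (Neighbor<double[], double[]> n : search(data, new double[] {0, 0}, 2)) {
            System.out.println(n.index() + " " + n.distance());
        }
    }
}
```

With this design at most k Candidate objects are allocated per query regardless of dataset size. A candidate is always removed from the heap before its fields change, because mutating an element in place would break the heap ordering.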
