Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Haifengl Smile NearestNeighborGraph

From Leeroopedia
Revision as of 15:04, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Haifengl_Smile_NearestNeighborGraph.md)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)


Knowledge Sources
Domains Graph Theory, Nearest Neighbor, Machine Learning
Last Updated 2026-02-08 22:00 GMT

Overview

NearestNeighborGraph is a record class that builds and represents k-nearest neighbor graphs from data, supporting exact, random, and approximate (NN-Descent) construction methods.

Description

NearestNeighborGraph is a Java record that stores k-nearest neighbor relationships as parallel arrays of neighbor indices and distances. It provides multiple static factory methods for constructing k-NN graphs:

  • of() - Exact brute-force k-NN computation with any distance function.
  • random() - Random neighbor initialization with reverse-neighbor extension.
  • descent() - Approximate NN-Descent algorithm using random projection forests for initialization, followed by iterative refinement. This is efficient for large datasets.

The record stores the neighbor indices, distances, the value of k, and an optional index mapping vertices back to original dataset positions. It can convert to an AdjacencyList graph and extract the largest connected component.

Usage

Use NearestNeighborGraph for building k-NN graphs required by manifold learning algorithms (e.g., UMAP, t-SNE, Isomap), spectral clustering, or any algorithm that operates on neighborhood structures. The descent() method is recommended for large datasets where exact computation is prohibitive.

Code Reference

Source Location

Signature

public record NearestNeighborGraph(int k, int[][] neighbors, double[][] distances, int[] index) {
    // Constructors
    public NearestNeighborGraph(int k, int[][] neighbors, double[][] distances);
    public NearestNeighborGraph(int k, int[][] neighbors, double[][] distances, int[] index);

    // Instance methods
    public int size();
    public AdjacencyList graph(boolean digraph);
    public NearestNeighborGraph largest(boolean digraph);

    // Exact k-NN (Euclidean)
    public static NearestNeighborGraph of(double[][] data, int k);

    // Exact k-NN (custom distance)
    public static <T> NearestNeighborGraph of(T[] data, Distance<T> distance, int k);

    // Random neighbor graph
    public static <T> NearestNeighborGraph random(T[] data, Distance<T> distance, int k);

    // Approximate NN-Descent (Euclidean)
    public static NearestNeighborGraph descent(double[][] data, int k);
    public static NearestNeighborGraph descent(double[][] data, int k, int numTrees,
        int leafSize, int maxCandidates, int maxIter, double delta);

    // Approximate NN-Descent (custom metric)
    public static <T> NearestNeighborGraph descent(T[] data, Metric<T> distance, int k);
    public static <T> NearestNeighborGraph descent(T[] data, Metric<T> distance, int k,
        int maxCandidates, int maxIter, double delta);
}

Import

import smile.graph.NearestNeighborGraph;

I/O Contract

Inputs

Name Type Required Description
data double[][] or T[] Yes The dataset (rows are samples).
k int Yes The number of nearest neighbors (must be > 1).
distance Distance<T> or Metric<T> No The distance function. Defaults to Euclidean for double[][] overloads.
numTrees int No Number of random projection trees for descent initialization (default 5).
leafSize int No Maximum leaf node size in random projection trees (default k).
maxCandidates int No Maximum candidates in NN-Descent search (default 50).
maxIter int No Maximum iterations for NN-Descent (default 50 for Euclidean, 10 for generic).
delta double No Early stopping threshold for NN-Descent (default 0.001).

Outputs

Name Type Description
NearestNeighborGraph NearestNeighborGraph The k-NN graph record with neighbor indices and distances.
neighbors() int[][] 2D array where neighbors[i] contains the k nearest neighbor indices of vertex i.
distances() double[][] 2D array where distances[i] contains the distances to the k nearest neighbors of vertex i.
graph(digraph) AdjacencyList Converts to an AdjacencyList graph representation.
largest(digraph) NearestNeighborGraph The largest connected component of the k-NN graph.

Usage Examples

Basic Usage

import smile.graph.NearestNeighborGraph;
import smile.graph.AdjacencyList;

// Sample data
double[][] data = {
    {0.0, 0.0}, {1.0, 0.0}, {0.0, 1.0},
    {1.0, 1.0}, {2.0, 2.0}, {3.0, 3.0}
};

// Exact k-NN with Euclidean distance
NearestNeighborGraph knn = NearestNeighborGraph.of(data, 3);

// Access neighbor information
int[] neighbors0 = knn.neighbors()[0];    // Nearest neighbors of point 0
double[] dists0 = knn.distances()[0];     // Corresponding distances

// Convert to graph for further algorithms
AdjacencyList graph = knn.graph(false);   // Undirected

Approximate NN-Descent

import smile.graph.NearestNeighborGraph;

// Large dataset - use approximate NN-Descent
double[][] largeData = new double[10000][100];
// ... populate data ...

// Fast approximate k-NN graph
NearestNeighborGraph knn = NearestNeighborGraph.descent(largeData, 15);

// Get largest connected component
NearestNeighborGraph lcc = knn.largest(false);

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment