Implementation:Haifengl Smile NearestNeighborGraph
| Knowledge Sources | |
|---|---|
| Domains | Graph Theory, Nearest Neighbor, Machine Learning |
| Last Updated | 2026-02-08 22:00 GMT |
Overview
NearestNeighborGraph is a record class that builds and represents k-nearest neighbor graphs from data, supporting exact, random, and approximate (NN-Descent) construction methods.
Description
NearestNeighborGraph is a Java record that stores k-nearest neighbor relationships as parallel arrays of neighbor indices and distances. It provides multiple static factory methods for constructing k-NN graphs:
- of() - Exact brute-force k-NN computation with any distance function.
- random() - Random neighbor initialization with reverse-neighbor extension.
- descent() - Approximate NN-Descent algorithm using random projection forests for initialization, followed by iterative refinement. This is efficient for large datasets.
The record stores the neighbor indices, distances, the value of k, and an optional index mapping vertices back to original dataset positions. It can convert to an AdjacencyList graph and extract the largest connected component.
Usage
Use NearestNeighborGraph for building k-NN graphs required by manifold learning algorithms (e.g., UMAP, t-SNE, Isomap), spectral clustering, or any algorithm that operates on neighborhood structures. The descent() method is recommended for large datasets where exact computation is prohibitive.
Code Reference
Source Location
- Repository: Haifengl_Smile
- File: base/src/main/java/smile/graph/NearestNeighborGraph.java
- Lines: 1-488
Signature
public record NearestNeighborGraph(int k, int[][] neighbors, double[][] distances, int[] index) {
// Constructors
public NearestNeighborGraph(int k, int[][] neighbors, double[][] distances);
public NearestNeighborGraph(int k, int[][] neighbors, double[][] distances, int[] index);
// Instance methods
public int size();
public AdjacencyList graph(boolean digraph);
public NearestNeighborGraph largest(boolean digraph);
// Exact k-NN (Euclidean)
public static NearestNeighborGraph of(double[][] data, int k);
// Exact k-NN (custom distance)
public static <T> NearestNeighborGraph of(T[] data, Distance<T> distance, int k);
// Random neighbor graph
public static <T> NearestNeighborGraph random(T[] data, Distance<T> distance, int k);
// Approximate NN-Descent (Euclidean)
public static NearestNeighborGraph descent(double[][] data, int k);
public static NearestNeighborGraph descent(double[][] data, int k, int numTrees,
int leafSize, int maxCandidates, int maxIter, double delta);
// Approximate NN-Descent (custom metric)
public static <T> NearestNeighborGraph descent(T[] data, Metric<T> distance, int k);
public static <T> NearestNeighborGraph descent(T[] data, Metric<T> distance, int k,
int maxCandidates, int maxIter, double delta);
}
Import
import smile.graph.NearestNeighborGraph;
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| data | double[][] or T[] | Yes | The dataset (rows are samples). |
| k | int | Yes | The number of nearest neighbors (must be > 1). |
| distance | Distance<T> or Metric<T> | No | The distance function. Defaults to Euclidean for double[][] overloads. |
| numTrees | int | No | Number of random projection trees for descent initialization (default 5). |
| leafSize | int | No | Maximum leaf node size in random projection trees (default k). |
| maxCandidates | int | No | Maximum candidates in NN-Descent search (default 50). |
| maxIter | int | No | Maximum iterations for NN-Descent (default 50 for Euclidean, 10 for generic). |
| delta | double | No | Early stopping threshold for NN-Descent (default 0.001). |
Outputs
| Name | Type | Description |
|---|---|---|
| NearestNeighborGraph | NearestNeighborGraph | The k-NN graph record with neighbor indices and distances. |
| neighbors() | int[][] | 2D array where neighbors[i] contains the k nearest neighbor indices of vertex i. |
| distances() | double[][] | 2D array where distances[i] contains the distances to the k nearest neighbors of vertex i. |
| graph(digraph) | AdjacencyList | Converts to an AdjacencyList graph representation. |
| largest(digraph) | NearestNeighborGraph | The largest connected component of the k-NN graph. |
Usage Examples
Basic Usage
import smile.graph.NearestNeighborGraph;
import smile.graph.AdjacencyList;
// Sample data
double[][] data = {
{0.0, 0.0}, {1.0, 0.0}, {0.0, 1.0},
{1.0, 1.0}, {2.0, 2.0}, {3.0, 3.0}
};
// Exact k-NN with Euclidean distance
NearestNeighborGraph knn = NearestNeighborGraph.of(data, 3);
// Access neighbor information
int[] neighbors0 = knn.neighbors()[0]; // Nearest neighbors of point 0
double[] dists0 = knn.distances()[0]; // Corresponding distances
// Convert to graph for further algorithms
AdjacencyList graph = knn.graph(false); // Undirected
Approximate NN-Descent
import smile.graph.NearestNeighborGraph;
// Large dataset - use approximate NN-Descent
double[][] largeData = new double[10000][100];
// ... populate data ...
// Fast approximate k-NN graph
NearestNeighborGraph knn = NearestNeighborGraph.descent(largeData, 15);
// Get largest connected component
NearestNeighborGraph lcc = knn.largest(false);