Implementation:Rapidsai Cuml KNN API

From Leeroopedia


Knowledge Sources
Domains: Machine_Learning, Nearest_Neighbors
Last Updated: 2026-02-08 12:00 GMT

Overview

Provides the C++ API for GPU-accelerated k-nearest neighbors operations in cuML, including brute-force KNN search, approximate KNN index building and querying, as well as KNN-based classification, regression, and class probability estimation.

Description

The knn.hpp header declares a comprehensive set of functions for performing k-nearest neighbors operations on NVIDIA GPUs. The API is organized into several functional groups:

Brute-Force KNN:

  • brute_force_knn: Performs exact KNN search across multiple input arrays, combining results into unified output index and distance arrays. Supports configurable distance metrics, row/column-major layouts, and partition translation indices.
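To make the "multiple input arrays, unified output" idea concrete, here is a minimal CPU reference sketch of exact KNN over several partitions. It is not cuML's implementation: the names `KnnResult` and `brute_force_knn_cpu` are hypothetical, the data is assumed row-major, the metric is squared L2, and global indices are assumed to be formed by offsetting each partition by the number of rows before it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical result holder: both arrays are [n_queries x k], row-major.
struct KnnResult {
  std::vector<int64_t> indices;
  std::vector<float> distances;
};

// CPU reference only: exact k-nearest neighbors over several row-major
// partitions using squared L2 distance.
KnnResult brute_force_knn_cpu(const std::vector<std::vector<float>>& parts,
                              int D,
                              const std::vector<float>& queries,
                              int k)
{
  assert(D > 0 && k > 0 && queries.size() % D == 0);
  const std::size_t n_queries = queries.size() / D;

  KnnResult res;
  for (std::size_t q = 0; q < n_queries; ++q) {
    std::vector<std::pair<float, int64_t>> cand;  // (distance, global index)
    int64_t offset = 0;                           // first global row of this partition
    for (const auto& part : parts) {
      assert(part.size() % D == 0);
      const int64_t rows = static_cast<int64_t>(part.size() / D);
      for (int64_t r = 0; r < rows; ++r) {
        float dist = 0.0f;
        for (int d = 0; d < D; ++d) {
          const float diff = part[r * D + d] - queries[q * D + d];
          dist += diff * diff;
        }
        cand.emplace_back(dist, offset + r);
      }
      offset += rows;
    }
    assert(static_cast<std::size_t>(k) <= cand.size());
    // Keep the k smallest distances, ordered nearest first.
    std::partial_sort(cand.begin(), cand.begin() + k, cand.end());
    for (int j = 0; j < k; ++j) {
      res.distances.push_back(cand[j].first);
      res.indices.push_back(cand[j].second);
    }
  }
  return res;
}
```

A reference like this can be handy for checking GPU results on a small sample, since every candidate is scanned and nothing is approximated.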

Random Ball Cover (RBC) Index:

  • rbc_build_index: Builds a Random Ball Cover spatial index for efficient approximate nearest neighbor queries.
  • rbc_knn_query: Queries the RBC index for nearest neighbors.
  • rbc_free_index: Frees the device memory associated with an RBC index.

Approximate KNN (IVF-based):

  • approx_knn_build_index: Builds an approximate KNN index using IVF-Flat or IVF-PQ parameters via FAISS.
  • approx_knn_search: Searches an approximate KNN index for nearest neighbors.
  • knnIndex, knnIndexParam, IVFParam, IVFFlatParam, IVFPQParam: Structs for configuring and storing index state.
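The roles of `nlist` and `nprobe` are easier to see in a toy inverted-file index. The sketch below is a CPU illustration of the general IVF-Flat idea, not the FAISS-backed code behind `approx_knn_build_index`. It assumes the centroids are supplied by the caller (a real index would train them, typically with k-means), and `IvfFlatSketch`, `ivf_build`, and `ivf_search` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Toy inverted-file index: one list of row ids per centroid (nlist lists).
struct IvfFlatSketch {
  int D = 0;
  std::vector<float> data;                  // [n x D], row-major
  std::vector<float> centroids;             // [nlist x D], row-major
  std::vector<std::vector<int64_t>> lists;  // row ids assigned to each centroid
};

// Squared L2 distance between two D-dimensional points.
static float sq_l2(const float* a, const float* b, int D)
{
  float s = 0.0f;
  for (int d = 0; d < D; ++d) {
    const float diff = a[d] - b[d];
    s += diff * diff;
  }
  return s;
}

// Centroids ordered by distance to p, nearest first.
static std::vector<std::pair<float, int>> rank_centroids(const IvfFlatSketch& ix,
                                                         const float* p)
{
  std::vector<std::pair<float, int>> ranked;
  const int nlist = static_cast<int>(ix.lists.size());
  for (int c = 0; c < nlist; ++c) {
    const float* centroid = &ix.centroids[static_cast<std::size_t>(c) * ix.D];
    ranked.emplace_back(sq_l2(p, centroid, ix.D), c);
  }
  std::sort(ranked.begin(), ranked.end());
  return ranked;
}

// Build step: assign every row to the list of its nearest centroid.
IvfFlatSketch ivf_build(std::vector<float> data, std::vector<float> centroids, int D)
{
  assert(D > 0 && !centroids.empty());
  assert(data.size() % D == 0 && centroids.size() % D == 0);
  IvfFlatSketch ix;
  ix.D         = D;
  ix.data      = std::move(data);
  ix.centroids = std::move(centroids);
  ix.lists.resize(ix.centroids.size() / D);
  const int64_t n = static_cast<int64_t>(ix.data.size() / D);
  for (int64_t r = 0; r < n; ++r) {
    const int nearest = rank_centroids(ix, &ix.data[r * D]).front().second;
    ix.lists[nearest].push_back(r);
  }
  return ix;
}

// Search step: scan only the nprobe lists whose centroids are closest to the query.
std::vector<int64_t> ivf_search(const IvfFlatSketch& ix, const float* query, int k, int nprobe)
{
  assert(k > 0 && nprobe > 0);
  const auto ranked        = rank_centroids(ix, query);
  const std::size_t probes = std::min<std::size_t>(nprobe, ranked.size());
  std::vector<std::pair<float, int64_t>> cand;  // (distance, row id)
  for (std::size_t i = 0; i < probes; ++i)
    for (int64_t r : ix.lists[ranked[i].second])
      cand.emplace_back(sq_l2(query, &ix.data[r * ix.D], ix.D), r);
  const std::size_t kk = std::min<std::size_t>(k, cand.size());
  std::partial_sort(cand.begin(), cand.begin() + kk, cand.end());
  std::vector<int64_t> ids;
  for (std::size_t j = 0; j < kk; ++j) ids.push_back(cand[j].second);
  return ids;
}
```

With a small `nprobe` the search can miss a true neighbor that was assigned to a list it did not scan; raising `nprobe` toward `nlist` trades speed for accuracy until the result matches an exhaustive scan.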

KNN-based Supervised Learning:

  • knn_classify: Performs KNN classification using precomputed KNN indices and label arrays. It accepts a vector of label arrays, one per output, which supports multilabel classification.
  • knn_regress: Performs KNN regression using precomputed KNN indices and target value arrays.
  • knn_class_proba: Computes class probabilities from precomputed KNN indices.
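The three functions above share one pattern: look up the label or target of each precomputed neighbor index, then aggregate per query. The CPU sketch below illustrates that pattern under simplifying assumptions that are not taken from cuML: a single output, labels in the range 0..n_classes-1, uniform weights, and ties resolved toward the lowest label. The `*_cpu` function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fraction of each query's k neighbors that carry each class.
// Returns [n_query_rows x n_classes], row-major.
std::vector<float> knn_class_proba_cpu(const std::vector<int64_t>& knn_indices,
                                       const std::vector<int>& y,
                                       std::size_t n_query_rows, int k, int n_classes)
{
  assert(k > 0 && n_classes > 0);
  assert(knn_indices.size() == n_query_rows * static_cast<std::size_t>(k));
  std::vector<float> proba(n_query_rows * n_classes, 0.0f);
  for (std::size_t q = 0; q < n_query_rows; ++q) {
    for (int j = 0; j < k; ++j) {
      const int64_t idx = knn_indices[q * k + j];
      assert(idx >= 0 && static_cast<std::size_t>(idx) < y.size());
      const int label = y[idx];
      assert(label >= 0 && label < n_classes);
      proba[q * n_classes + label] += 1.0f / k;  // each neighbor casts one equal vote
    }
  }
  return proba;
}

// Majority vote: the class with the highest neighbor fraction (lowest label on ties).
std::vector<int> knn_classify_cpu(const std::vector<int64_t>& knn_indices,
                                  const std::vector<int>& y,
                                  std::size_t n_query_rows, int k, int n_classes)
{
  const auto proba = knn_class_proba_cpu(knn_indices, y, n_query_rows, k, n_classes);
  std::vector<int> out(n_query_rows, 0);
  for (std::size_t q = 0; q < n_query_rows; ++q)
    for (int c = 1; c < n_classes; ++c)
      if (proba[q * n_classes + c] > proba[q * n_classes + out[q]]) out[q] = c;
  return out;
}

// Regression: the mean target value of each query's k neighbors.
std::vector<float> knn_regress_cpu(const std::vector<int64_t>& knn_indices,
                                   const std::vector<float>& y,
                                   std::size_t n_query_rows, int k)
{
  assert(k > 0);
  assert(knn_indices.size() == n_query_rows * static_cast<std::size_t>(k));
  std::vector<float> out(n_query_rows, 0.0f);
  for (std::size_t q = 0; q < n_query_rows; ++q) {
    float sum = 0.0f;
    for (int j = 0; j < k; ++j) {
      const int64_t idx = knn_indices[q * k + j];
      assert(idx >= 0 && static_cast<std::size_t>(idx) < y.size());
      sum += y[idx];
    }
    out[q] = sum / k;
  }
  return out;
}
```

Because all three consume the same `knn_indices` array, a single neighbor search can feed classification, probability estimation, and regression without repeating the distance computation.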

All functions operate on device memory. Every function except rbc_free_index takes a RAFT handle for GPU resource management.

Usage

Use this API for nearest-neighbor search tasks on GPU. Choose brute-force KNN for exact results on smaller datasets, approximate KNN (IVF-Flat/IVF-PQ) for large-scale approximate search, or the RBC index for moderate-scale approximate search. The classification, regression, and probability functions are used after a KNN query to perform supervised learning based on neighbor labels.
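When weighing exact against approximate search, one common check is recall@k: the fraction of approximate neighbors that also appear in the exact result for the same query. The helper below is a hypothetical CPU sketch, not part of knn.hpp; it assumes both index arrays have been copied to the host and are laid out as [n_queries x k], row-major.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fraction of approximate neighbor ids that also occur in the exact
// neighbor list of the same query. The order within a row does not matter.
double recall_at_k(const std::vector<int64_t>& exact,
                   const std::vector<int64_t>& approx,
                   std::size_t n_queries, int k)
{
  assert(k > 0 && n_queries > 0);
  const std::size_t kk = static_cast<std::size_t>(k);
  assert(exact.size() == n_queries * kk && approx.size() == n_queries * kk);

  std::size_t hits = 0;
  for (std::size_t q = 0; q < n_queries; ++q) {
    const auto first = exact.begin() + q * kk;  // exact row for query q
    const auto last  = first + kk;
    for (std::size_t j = 0; j < kk; ++j)
      if (std::find(first, last, approx[q * kk + j]) != last) ++hits;
  }
  return static_cast<double>(hits) / static_cast<double>(n_queries * kk);
}
```

Running the exact search on a small query sample and comparing it against the approximate index in this way gives a rough sense of how much accuracy a given index configuration gives up.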

Code Reference

Source Location

  • Repository: Rapidsai_Cuml
  • File: cpp/include/cuml/neighbors/knn.hpp

Signature

namespace ML {

void brute_force_knn(const raft::handle_t& handle,
                     std::vector<float*>& input,
                     std::vector<int>& sizes,
                     int D,
                     float* search_items,
                     int n,
                     int64_t* res_I,
                     float* res_D,
                     int k,
                     bool rowMajorIndex = false,
                     bool rowMajorQuery = false,
                     ML::distance::DistanceType metric = ML::distance::DistanceType::L2Expanded,
                     float metric_arg = 2.0f,
                     std::vector<int64_t>* translations = nullptr);

void rbc_build_index(const raft::handle_t& handle,
                     std::uintptr_t& rbc_index,
                     float* X, int64_t n_rows, int64_t n_cols,
                     ML::distance::DistanceType metric);

void rbc_knn_query(const raft::handle_t& handle,
                   const std::uintptr_t& rbc_index,
                   uint32_t k, const float* search_items,
                   uint32_t n_search_items, int64_t dim,
                   int64_t* out_inds, float* out_dists);

void rbc_free_index(std::uintptr_t rbc_index);

struct knnIndex {
  knnIndex();
  ~knnIndex();
  ML::distance::DistanceType metric;
  float metricArg;
  int nprobe;
  int device;
  std::unique_ptr<knnIndexImpl> pimpl;
};

struct knnIndexParam { virtual ~knnIndexParam() {} };
struct IVFParam : knnIndexParam { int nlist; int nprobe; };
struct IVFFlatParam : IVFParam {};
struct IVFPQParam : IVFParam { int M; int n_bits; bool usePrecomputedTables; };

void approx_knn_build_index(raft::handle_t& handle, knnIndex* index,
                            knnIndexParam* params,
                            ML::distance::DistanceType metric, float metricArg,
                            float* index_array, int n, int D);

void approx_knn_search(raft::handle_t& handle, float* distances,
                       int64_t* indices, knnIndex* index, int k,
                       float* query_array, int n);

void knn_classify(raft::handle_t& handle, int* out, int64_t* knn_indices,
                  std::vector<int*>& y, size_t n_index_rows,
                  size_t n_query_rows, int k, float* sample_weight = nullptr);

void knn_regress(raft::handle_t& handle, float* out, int64_t* knn_indices,
                 std::vector<float*>& y, size_t n_index_rows,
                 size_t n_query_rows, int k, float* sample_weight = nullptr);

void knn_class_proba(raft::handle_t& handle, std::vector<float*>& out,
                     int64_t* knn_indices, std::vector<int*>& y,
                     size_t n_index_rows, size_t n_query_rows, int k,
                     float* sample_weight = nullptr);

} // namespace ML

Import

#include <cuml/neighbors/knn.hpp>

I/O Contract

Inputs

brute_force_knn

| Name | Type | Required | Description |
|------|------|----------|-------------|
| handle | const raft::handle_t& | Yes | RAFT handle for GPU resources |
| input | std::vector<float*>& | Yes | Vector of device pointers to index arrays |
| sizes | std::vector<int>& | Yes | Vector of row counts for each input array |
| D | int | Yes | Dimensionality of the data |
| search_items | float* | Yes | Device pointer to query array [n x D] |
| n | int | Yes | Number of query rows |
| res_I | int64_t* | Yes | Output device pointer for result indices [n x k] |
| res_D | float* | Yes | Output device pointer for result distances [n x k] |
| k | int | Yes | Number of nearest neighbors |
| rowMajorIndex | bool | No | Whether index arrays are row-major (default: false) |
| rowMajorQuery | bool | No | Whether query array is row-major (default: false) |
| metric | ML::distance::DistanceType | No | Distance metric (default: L2Expanded) |
| metric_arg | float | No | Metric argument for Minkowski distances (default: 2.0) |
| translations | std::vector<int64_t>* | No | Translation IDs for non-contiguous partitions (default: nullptr) |
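Since rowMajorIndex and rowMajorQuery both default to false, callers whose data is row-major either set the flags or convert the layout first. The sketch below shows how the two layouts map a (row, col) pair to a flat offset, assuming that "not row-major" means column-major storage. The helper names `flat_index` and `to_col_major` are hypothetical, and the code runs on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Flat offset of element (row, col) in an n_rows x n_cols matrix.
inline std::size_t flat_index(std::size_t row, std::size_t col,
                              std::size_t n_rows, std::size_t n_cols,
                              bool row_major)
{
  assert(row < n_rows && col < n_cols);
  // Row-major keeps a row contiguous; column-major keeps a column contiguous.
  return row_major ? row * n_cols + col : col * n_rows + row;
}

// Copy a row-major host matrix into column-major order.
std::vector<float> to_col_major(const std::vector<float>& src,
                                std::size_t n_rows, std::size_t n_cols)
{
  assert(src.size() == n_rows * n_cols);
  std::vector<float> dst(src.size());
  for (std::size_t r = 0; r < n_rows; ++r)
    for (std::size_t c = 0; c < n_cols; ++c)
      dst[flat_index(r, c, n_rows, n_cols, false)] =
        src[flat_index(r, c, n_rows, n_cols, true)];
  return dst;
}
```

Passing a buffer in one layout while the flag describes the other would not raise an error in a sketch like this; it would silently mix features across rows, so the flag and the buffer layout need to agree.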

knn_classify

| Name | Type | Required | Description |
|------|------|----------|-------------|
| handle | raft::handle_t& | Yes | RAFT handle |
| out | int* | Yes | Output device array for predicted labels [n_query_rows] |
| knn_indices | int64_t* | Yes | Device array of KNN indices [n_query_rows x k] |
| y | std::vector<int*>& | Yes | Vector of label arrays on device, one per output |
| n_index_rows | size_t | Yes | Number of rows in the index (size of each y array) |
| n_query_rows | size_t | Yes | Number of query samples |
| k | int | Yes | Number of nearest neighbors |
| sample_weight | float* | No | Optional sample weights [n_query_rows x k] (default: nullptr) |

Outputs

| Name | Type | Description |
|------|------|-------------|
| res_I | int64_t* | KNN result index array of size [n x k] |
| res_D | float* | KNN result distance array of size [n x k] |
| out (classify) | int* | Predicted class labels |
| out (regress) | float* | Predicted regression values |
| out (class_proba) | std::vector<float*>& | Class probability arrays per output |

Usage Examples

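#include <cuml/neighbors/knn.hpp>

#include <cstdint>
#include <vector>

raft::handle_t handle;

// All d_* names are device pointers that the caller is assumed to have
// allocated and filled; allocation and host-to-device copies are omitted.
// rowMajorIndex and rowMajorQuery are left at their default (false), so the
// index and query arrays are treated as not row-major.

// Brute force KNN search over two index partitions
std::vector<float*> index_arrays = {d_index_part1, d_index_part2};
std::vector<int> sizes = {5000, 3000};  // row count of each partition
int D = 128;          // dimensionality of the data
int n_queries = 100;  // number of rows in d_query
int k = 10;           // neighbors returned per query

int64_t* d_indices;  // pre-allocated [n_queries x k]
float* d_distances;  // pre-allocated [n_queries x k]

ML::brute_force_knn(handle, index_arrays, sizes, D,
                    d_query, n_queries, d_indices, d_distances, k);

// KNN classification using the results
std::vector<int*> labels = {d_labels};  // one label array, one entry per index row
int* d_predictions;  // pre-allocated [n_queries]

size_t n_index_rows = 5000 + 3000;  // total rows across both partitions
ML::knn_classify(handle, d_predictions, d_indices, labels,
                 n_index_rows, n_queries, k);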
