Implementation:Rapidsai Cuml KMeans DBSCAN HDBSCAN Init
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Clustering |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete tool for configuring KMeans, DBSCAN, and HDBSCAN GPU-accelerated clustering algorithms via their constructor parameters.
Description
These constructors initialize the three primary clustering estimators in cuML:
- KMeans.__init__ configures the number of clusters, initialization method (scalable-k-means++, random), convergence tolerance, and batch processing parameters.
- DBSCAN.__init__ configures the neighborhood distance epsilon, minimum samples for core points, distance metric, and memory batch budget.
- HDBSCAN.__init__ configures the minimum cluster size, cluster selection method (EOM vs leaf), KNN build algorithm, and prediction data generation.
Usage
Import and instantiate these classes to create clustering estimators. Configure hyperparameters based on the dataset size, expected cluster count, and noise characteristics.
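As a rough starting point, the choice between the three estimators can be sketched as a small decision helper. `suggest_estimator` is a hypothetical function written for this page, not part of cuML; its rules simply mirror the rule of thumb used in the usage examples on this page (KMeans for a known cluster count, HDBSCAN for variable-density data, DBSCAN otherwise), and the keyword values it returns are the documented constructor defaults.

```python
def suggest_estimator(n_clusters=None, variable_density=False):
    """Return (class_name, constructor_kwargs) as a starting point.

    Hypothetical helper for illustration only; it is not part of cuML.
    """
    if n_clusters is not None:
        # The cluster count is known in advance: KMeans fits that case.
        return "KMeans", {"n_clusters": n_clusters}
    if variable_density:
        # Clusters of differing density: HDBSCAN with its default settings.
        return "HDBSCAN", {"min_cluster_size": 5, "cluster_selection_method": "eom"}
    # Otherwise fall back to plain density-based discovery.
    return "DBSCAN", {"eps": 0.5, "min_samples": 5}


known_k = suggest_estimator(n_clusters=5)
mixed_density = suggest_estimator(variable_density=True)
fallback = suggest_estimator()
```

The returned keyword dictionary can be unpacked into the matching constructor, for example `KMeans(**kwargs)`, and then tuned for the dataset at hand.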
Code Reference
KMeans.__init__
Source Location
- Repository: cuML
- File: python/cuml/cuml/cluster/kmeans.pyx
- Lines: 482-504
Signature
def __init__(
    self,
    *,
    n_clusters=8,
    max_iter=300,
    tol=1e-4,
    verbose=False,
    random_state=None,
    init='scalable-k-means++',
    n_init='auto',
    oversampling_factor=2.0,
    max_samples_per_batch=32768,
    output_type=None,
):
Import
from cuml import KMeans
# or
from cuml.cluster import KMeans
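The bare `*` in the signature listed above makes every parameter keyword-only, so positional calls are rejected by Python itself. The sketch below shows this with `KMeansConfig`, a hypothetical stand-in class that copies the listed signature; it is not the cuML class and performs no clustering.

```python
class KMeansConfig:
    """Stand-in mirroring the keyword-only signature above (not cuML)."""

    def __init__(self, *, n_clusters=8, max_iter=300, tol=1e-4, verbose=False,
                 random_state=None, init="scalable-k-means++", n_init="auto",
                 oversampling_factor=2.0, max_samples_per_batch=32768,
                 output_type=None):
        # Keep the received values so they can be inspected later.
        self.params = {
            "n_clusters": n_clusters, "max_iter": max_iter, "tol": tol,
            "verbose": verbose, "random_state": random_state, "init": init,
            "n_init": n_init, "oversampling_factor": oversampling_factor,
            "max_samples_per_batch": max_samples_per_batch,
            "output_type": output_type,
        }


# Keyword arguments are accepted; omitted ones keep their defaults.
config = KMeansConfig(n_clusters=5, max_iter=500)

# A positional argument raises TypeError because of the bare `*`.
try:
    KMeansConfig(5)
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

The same keyword-only rule applies to the DBSCAN and HDBSCAN signatures on this page, since both also start with `*`.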
DBSCAN.__init__
Source Location
- Repository: cuML
- File: python/cuml/cuml/cluster/dbscan.pyx
- Lines: 281-299
Signature
def __init__(
    self,
    *,
    eps=0.5,
    min_samples=5,
    metric='euclidean',
    algorithm='brute',
    verbose=False,
    max_mbytes_per_batch=None,
    output_type=None,
    calc_core_sample_indices=True,
):
Import
from cuml import DBSCAN
# or
from cuml.cluster import DBSCAN
HDBSCAN.__init__
Source Location
- Repository: cuML
- File: python/cuml/cuml/cluster/hdbscan/hdbscan.pyx
- Lines: 802-836
Signature
def __init__(
    self,
    *,
    min_cluster_size=5,
    min_samples=None,
    cluster_selection_epsilon=0.0,
    max_cluster_size=0,
    metric='euclidean',
    alpha=1.0,
    p=None,
    cluster_selection_method='eom',
    allow_single_cluster=False,
    gen_min_span_tree=False,
    verbose=False,
    output_type=None,
    prediction_data=False,
    build_algo='brute_force',
    build_kwds=None,
    device_ids=None,
):
Import
from cuml import HDBSCAN
# or
from cuml.cluster import HDBSCAN
I/O Contract
KMeans Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| n_clusters | int | No (default 8) | Number of clusters to form. |
| max_iter | int | No (default 300) | Maximum Lloyd iterations. |
| tol | float | No (default 1e-4) | Convergence threshold on center shift. |
| init | str | No (default 'scalable-k-means++') | Initialization: 'scalable-k-means++', 'k-means\|\|', 'k-means++', or 'random'. |
| n_init | int or str | No (default 'auto') | Number of initializations to run. |
| oversampling_factor | float | No (default 2.0) | Factor for scalable k-means++ oversampling. |
| max_samples_per_batch | int | No (default 32768) | Samples per distance computation batch. |
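To make the roles of `max_iter` and `tol` concrete, here is a minimal plain-Python sketch of Lloyd iterations on 1-D data. It is an illustration only and not cuML's GPU implementation; in particular, stopping on the summed squared center shift is an assumption about how the threshold is applied, and `lloyd_1d` is a hypothetical name.

```python
def lloyd_1d(points, centers, max_iter=300, tol=1e-4):
    """Plain-Python Lloyd iterations on 1-D data (illustration, not cuML)."""
    centers = list(centers)
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Assignment step: attach each point to its nearest center.
        groups = [[] for _ in centers]
        for x in points:
            nearest = min(range(len(centers)), key=lambda j: abs(x - centers[j]))
            groups[nearest].append(x)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        new_centers = [
            sum(g) / len(g) if g else c for g, c in zip(groups, centers)
        ]
        # Assumed stopping rule: summed squared center shift below tol.
        shift = sum((a - b) ** 2 for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, iteration


final_centers, n_iter = lloyd_1d(
    [0.0, 0.2, 0.4, 9.6, 9.8, 10.0], centers=[0.0, 1.0]
)
```

A smaller `tol` or a poor initialization generally means more iterations before the loop exits, which is why `max_iter` acts as a hard cap.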
DBSCAN Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| eps | float | No (default 0.5) | Maximum distance between two samples for them to be considered neighbors. |
| min_samples | int | No (default 5) | Minimum number of samples in a point's neighborhood (the point itself included) for it to count as a core sample. |
| metric | str | No (default 'euclidean') | Distance metric: 'euclidean', 'cosine', or 'precomputed'. |
| max_mbytes_per_batch | float or None | No (default None) | Memory budget in MB per batch for distance computation. None uses all available GPU memory. |
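The interaction between `eps` and `min_samples` can be illustrated with a brute-force core-point check in plain Python. This is a sketch of the general DBSCAN definition under a Euclidean metric, not cuML code; it assumes the neighbor count includes the point itself (the scikit-learn convention), and `core_point_mask` is a hypothetical helper name.

```python
import math


def core_point_mask(points, eps=0.5, min_samples=5):
    """Flag each point whose eps-neighborhood holds at least min_samples points.

    Brute-force O(n^2) illustration; the count includes the point itself.
    """
    mask = []
    for p in points:
        neighbors = sum(1 for q in points if math.dist(p, q) <= eps)
        mask.append(neighbors >= min_samples)
    return mask


# Four tightly packed points plus one far-away outlier.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
mask = core_point_mask(points, eps=0.3, min_samples=4)
```

Raising `min_samples` or shrinking `eps` turns more points into non-core points, which DBSCAN then labels as border points or noise.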
HDBSCAN Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| min_cluster_size | int | No (default 5) | Minimum number of samples in a cluster. |
| cluster_selection_method | str | No (default 'eom') | Cluster extraction: 'eom' (Excess of Mass) or 'leaf'. |
| build_algo | str | No (default 'brute_force') | KNN graph construction: 'brute_force' or 'nn_descent'. |
| prediction_data | bool | No (default False) | If True, caches data needed for approximate_predict. |
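Because two of these parameters accept only a fixed set of strings, a small pre-flight check can catch typos before an estimator is built. The sketch below is a hypothetical helper (`check_hdbscan_kwargs`), not cuML's own validation; the allowed values are taken from the table above.

```python
# Allowed string options, as listed in the HDBSCAN inputs table.
ALLOWED_OPTIONS = {
    "cluster_selection_method": {"eom", "leaf"},
    "build_algo": {"brute_force", "nn_descent"},
}


def check_hdbscan_kwargs(**kwargs):
    """Raise ValueError for an unknown string option; return kwargs otherwise.

    Hypothetical pre-flight check; it is not part of cuML.
    """
    for name, allowed in ALLOWED_OPTIONS.items():
        if name in kwargs and kwargs[name] not in allowed:
            raise ValueError(
                f"{name}={kwargs[name]!r} is not one of {sorted(allowed)}"
            )
    return kwargs


valid = check_hdbscan_kwargs(min_cluster_size=15, cluster_selection_method="leaf")

try:
    check_hdbscan_kwargs(build_algo="kd_tree")
    typo_rejected = False
except ValueError:
    typo_rejected = True
```

The validated dictionary can then be unpacked into the constructor, for example `HDBSCAN(**valid)`.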
Outputs
| Name | Type | Description |
|---|---|---|
| KMeans instance | KMeans | Configured KMeans estimator ready for fitting. |
| DBSCAN instance | DBSCAN | Configured DBSCAN estimator ready for fitting. |
| HDBSCAN instance | HDBSCAN | Configured HDBSCAN estimator ready for fitting. |
Usage Examples
from cuml.cluster import KMeans, DBSCAN, HDBSCAN
# KMeans for known cluster count
kmeans = KMeans(n_clusters=5, max_iter=500, init='scalable-k-means++')
# DBSCAN for density-based discovery
dbscan = DBSCAN(eps=0.3, min_samples=10, metric='euclidean')
# HDBSCAN for variable-density clusters
hdbscan = HDBSCAN(min_cluster_size=15, cluster_selection_method='eom', prediction_data=True)
Related Pages
Implements Principle
Requires Environment