Implementation:Rapidsai Cuml Input To Cuml Array
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Data_Engineering |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete tool for converting heterogeneous input data formats to GPU-resident CumlArray objects suitable for cuML clustering and other algorithms.
Description
The `input_to_cuml_array` function is the universal data conversion gateway in cuML. It accepts data in any supported format (cuDF DataFrame/Series, pandas DataFrame/Series, NumPy ndarray, CuPy ndarray, Numba CUDA device array, scipy/cupyx sparse matrices) and converts it to a GPU-resident `CumlArray` with validated dtype, memory layout, and contiguity.
This function is called internally by all cuML estimators before fitting or prediction to ensure data is in the correct format on the GPU device.
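To make "memory layout" and "contiguity" concrete, here is a minimal host-side sketch using only NumPy. It does not call cuML; it only illustrates the properties that the `order` and `force_contiguous` arguments refer to, on the assumption that they carry the usual NumPy meaning.

```python
import numpy as np

# A small row-major (C-order) host array, which is NumPy's default layout.
X = np.arange(6, dtype=np.float32).reshape(2, 3)

# Column-major (Fortran-order) copy: same values, different strides.
X_f = np.asfortranarray(X)

print(X.flags["C_CONTIGUOUS"], X.flags["F_CONTIGUOUS"])
print(X_f.flags["C_CONTIGUOUS"], X_f.flags["F_CONTIGUOUS"])
print(X.strides, X_f.strides)

# A strided view is neither C- nor F-contiguous until it is copied.
view = X[:, ::2]
packed = np.ascontiguousarray(view)
print(view.flags["C_CONTIGUOUS"], packed.flags["C_CONTIGUOUS"])
```

Changing layout or packing a strided view requires a copy, which is why a conversion can copy data even when `deepcopy=False`.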
Usage
Called internally by cuML estimators during `fit()`, `predict()`, and `transform()`. It can also be called directly to convert data once before passing the result to several estimators.
Code Reference
input_to_cuml_array
Source Location
- Repository: cuML
- File: python/cuml/cuml/internals/input_utils.py
- Lines: 265-374
Signature
def input_to_cuml_array(
    X,
    order="F",
    deepcopy=False,
    check_dtype=False,
    convert_to_dtype=False,
    check_mem_type=False,
    convert_to_mem_type="device",
    safe_dtype_conversion=True,
    check_cols=False,
    check_rows=False,
    fail_on_order=False,
    force_contiguous=True,
):
Import
from cuml.internals.input_utils import input_to_cuml_array
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| X | array-like | Yes | Input data: cuDF DataFrame/Series, pandas DataFrame/Series, NumPy ndarray, CuPy ndarray, Numba CUDA device array, scipy/cupyx sparse matrix. |
| order | str | No (default 'F') | Memory layout: 'F' (column-major, Fortran-style) or 'C' (row-major, C-style) or 'K' (keep existing). |
| deepcopy | bool | No (default False) | If True, always copy data. If False, copy only when necessary for conversion. |
| convert_to_dtype | dtype or False | No (default False) | Target dtype for conversion (e.g., np.float32). False means no conversion. |
| convert_to_mem_type | str | No (default 'device') | Target memory type: 'device' (GPU) or 'host' (CPU). |
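| force_contiguous | bool | No (default True) | Ensure output array is contiguous in memory. |
| check_dtype | dtype or False | No (default False) | If set, raise an error when the input dtype does not match. False disables the check. |
| check_mem_type | memory type or False | No (default False) | If set, verify the memory type of the input. False disables the check. |
| safe_dtype_conversion | bool | No (default True) | When a dtype conversion is requested, check that the cast would not lose information. |
| check_cols | int or False | No (default False) | If set to an int, verify that the input has that many columns. |
| check_rows | int or False | No (default False) | If set to an int, verify that the input has that many rows. |
| fail_on_order | bool | No (default False) | If True, raise an error when the input layout differs from `order` instead of converting it. |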
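The `safe_dtype_conversion` flag in the signature implies that a requested dtype conversion is checked before it is applied. The sketch below shows one plausible kind of check, a range test on host data with NumPy. The helper `would_overflow` is hypothetical and is not a description of cuML's actual implementation.

```python
import numpy as np


def would_overflow(values, target_dtype):
    """Hypothetical helper: True if any finite value lies outside target_dtype's range."""
    target = np.dtype(target_dtype)
    info = np.finfo(target) if target.kind == "f" else np.iinfo(target)
    finite = values[np.isfinite(values)]
    return bool(((finite < info.min) | (finite > info.max)).any())


small = np.array([0.5, 1e10], dtype=np.float64)   # fits in float32's range
large = np.array([0.5, 1e300], dtype=np.float64)  # exceeds float32's range

print(would_overflow(small, np.float32))
print(would_overflow(large, np.float32))
```

A check like this scans the whole input, so it adds a cost that may matter for very fast inference paths.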
Outputs
| Name | Type | Description |
|---|---|---|
| result | namedtuple | `cuml_array(array: CumlArray, n_rows: int, n_cols: int, dtype: np.dtype)`. With the default `convert_to_mem_type='device'` the array resides in GPU memory; with `'host'` it resides in CPU memory. |
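Because the result is a namedtuple, it can be read by field name or unpacked positionally. The following host-only sketch mimics the documented fields with `collections.namedtuple`. The function `fake_convert` is a hypothetical stand-in that returns a NumPy array rather than a `CumlArray`, so the snippet runs without a GPU.

```python
from collections import namedtuple

import numpy as np

# Stand-in with the same field names as the documented return value.
cuml_array = namedtuple("cuml_array", ["array", "n_rows", "n_cols", "dtype"])


def fake_convert(X):
    """Hypothetical host-only stand-in for input_to_cuml_array."""
    X = np.asarray(X)
    return cuml_array(array=X, n_rows=X.shape[0], n_cols=X.shape[1], dtype=X.dtype)


result = fake_convert(np.zeros((4, 3), dtype=np.float32))

# Access by field name ...
print(result.n_rows, result.n_cols, result.dtype)

# ... or unpack all four fields at once.
arr, n_rows, n_cols, dtype = result
```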
Usage Examples
import numpy as np
from cuml.internals.input_utils import input_to_cuml_array
# Convert NumPy array to GPU CumlArray
X_np = np.random.rand(1000, 10).astype(np.float32)
result = input_to_cuml_array(X_np, order='C', convert_to_mem_type='device')
gpu_array = result.array # CumlArray on GPU
n_rows = result.n_rows # 1000
n_cols = result.n_cols # 10
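A second variant, sketched under the assumption that cuML and a CUDA-capable GPU are available: it requests a float32, column-major result and validates the number of columns. The wrapper name `to_device_float32` is hypothetical, and the description of `check_cols` (an int that the column count must match) is an assumption to verify against the cuML docstring. The import is deferred so the snippet can still be loaded on a machine without cuML.

```python
import numpy as np


def to_device_float32(X, n_features):
    """Convert X to a column-major float32 CumlArray and check its column count."""
    # Imported lazily so this module still loads where cuML is not installed.
    from cuml.internals.input_utils import input_to_cuml_array

    return input_to_cuml_array(
        X,
        order="F",
        convert_to_dtype=np.float32,
        check_cols=n_features,
    )


X_host = np.random.rand(1000, 10)  # float64 by default

# On a machine with cuML and a GPU:
# X_dev, n_rows, n_cols, dtype = to_device_float32(X_host, n_features=10)
```

Converting once and reusing the resulting array avoids repeating the host-to-device transfer for each estimator that consumes the same data.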