
Workflow:Rapidsai Cuml Sklearn Zero Code Acceleration

From Leeroopedia


Knowledge Sources
Domains Machine_Learning, GPU_Computing, Acceleration, MLOps
Last Updated 2026-02-08 12:00 GMT

Overview

End-to-end process for GPU-accelerating existing scikit-learn, UMAP, and HDBSCAN code, with no source changes, using cuML's zero-code-change accelerator module.

Description

This workflow covers the cuml.accel system that transparently intercepts Python imports of scikit-learn, umap-learn, and hdbscan, replacing them with GPU-accelerated cuML equivalents. The accelerator uses Python import hooking to create proxy objects that wrap sklearn estimators. These proxies automatically dispatch method calls to GPU when supported and gracefully fall back to CPU when GPU acceleration is not available for a particular parameter combination or method. The system supports four activation methods: command-line interface, Jupyter magic commands, environment variables, and programmatic installation.
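The import-interception mechanism described above can be illustrated with a stdlib-only sketch. This is not cuML's actual implementation; the module name `fakesklearn` and the `backend` attribute are hypothetical stand-ins used only to show how a `sys.meta_path` hook can substitute a proxy module at import time:

```python
import sys
import types
from importlib.abc import MetaPathFinder, Loader
from importlib.machinery import ModuleSpec

class AcceleratorFinder(MetaPathFinder, Loader):
    """Illustrative import hook: intercepts one module import and
    substitutes a proxy module (the 'fakesklearn' target is hypothetical)."""

    TARGET = "fakesklearn"

    def find_spec(self, fullname, path=None, target=None):
        if fullname == self.TARGET:
            return ModuleSpec(fullname, self)
        return None  # defer to the normal import machinery

    def create_module(self, spec):
        proxy = types.ModuleType(spec.name)
        proxy.backend = "gpu-proxy"  # stand-in for wrapped cuML classes
        return proxy

    def exec_module(self, module):
        pass  # module content was already built in create_module

sys.meta_path.insert(0, AcceleratorFinder())

import fakesklearn  # resolved by the hook above, not the filesystem
print(fakesklearn.backend)
```

The real accelerator hooks the `sklearn`, `umap`, and `hdbscan` names in the same spirit, returning proxy classes rather than a whole stub module.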

Usage

Execute this workflow when you have existing Python machine learning code that uses scikit-learn, umap-learn, or hdbscan and want to GPU-accelerate it without modifying the source code. This is ideal for teams migrating existing ML pipelines to GPU, running sklearn-based libraries like BERTopic on GPU, or benchmarking GPU vs CPU performance. The zero-code-change approach means the same codebase works on both GPU and CPU-only environments.

Execution Steps

Step 1: Environment Setup

Ensure RAPIDS cuML is installed in the Python environment alongside scikit-learn. The cuml.accel module requires cuML to be present and will intercept imports of sklearn, umap, and hdbscan packages. No changes to existing package installations are needed; cuML supplements rather than replaces the original libraries.

Key considerations:

  • cuML must be installed alongside scikit-learn (not as a replacement)
  • CUDA-capable GPU and NVIDIA drivers are required for GPU execution
  • CPU fallback ensures code works even when GPU acceleration is not possible
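A minimal environment sanity check can confirm whether cuML is importable before relying on acceleration. The guard below is illustrative; it succeeds either way, mirroring the CPU-fallback behavior:

```python
def check_accel_environment():
    """Report whether cuML is importable in this environment.
    Hypothetical helper; a RAPIDS install is required for the GPU path."""
    try:
        import cuml  # noqa: F401
        return "cuml available: zero-code-change acceleration possible"
    except ImportError:
        return "cuml not installed: code will run on CPU via plain sklearn"

print(check_accel_environment())
```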

Step 2: Accelerator Activation

Enable the accelerator using one of four supported methods, chosen based on the execution context.

CLI method: Run existing scripts with `python -m cuml.accel script.py` for the simplest approach. Supports flags like `-v` for verbose logging and `--profile` for performance profiling.

Jupyter magic: Run `%load_ext cuml.accel` in a notebook cell before any sklearn, umap, or hdbscan imports are executed. All subsequent cells run with GPU acceleration enabled.

Environment variable: Set `CUML_ACCEL_ENABLED=1` before running the Python process. Useful for container deployments and CI pipelines.

Programmatic: Call `cuml.accel.install()` early in the script, before importing sklearn, umap, or hdbscan. Provides fine-grained control with parameters like `disable_uvm` and `log_level`.
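A sketch of the programmatic route, using the `cuml.accel.install()` call named above. The try/except guard is an assumption of this example, not part of the API; it keeps the same script runnable on CPU-only hosts, which is the point of the zero-code-change approach:

```python
def enable_gpu_acceleration():
    """Try to activate cuml.accel before any sklearn/umap/hdbscan import.
    Returns True if the accelerator was installed, False on CPU-only hosts."""
    try:
        import cuml.accel
        cuml.accel.install()  # must run before the target libraries are imported
        return True
    except ImportError:
        return False

ACCELERATED = enable_gpu_acceleration()
print("GPU acceleration active:", ACCELERATED)

# Only after this point should sklearn, umap, or hdbscan be imported.
```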

Step 3: Code Execution

Run the existing machine learning code unchanged. When sklearn classes are imported, the accelerator intercepts the import and returns proxy classes that transparently wrap cuML GPU implementations. Method calls like `fit()`, `predict()`, and `transform()` are dispatched to GPU when supported.

What happens:

  • Import interception replaces sklearn classes with proxy objects
  • Each proxy maintains both a CPU (sklearn) and GPU (cuML) estimator instance
  • Hyperparameters are validated via the CPU estimator for compatibility
  • If all parameters are GPU-supported, execution happens on GPU
  • If any parameter is unsupported, automatic CPU fallback occurs with logging
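The dispatch-and-fallback behavior above can be modeled with a toy proxy. Everything here is hypothetical (the supported-parameter set, the backends, the log message format); it only mirrors the decision logic, not cuML's internals:

```python
class ProxyEstimator:
    """Toy model of the dispatch logic: hold a CPU and a GPU backend,
    run on GPU only when every hyperparameter is supported there."""

    GPU_SUPPORTED_PARAMS = {"n_clusters", "max_iter"}  # hypothetical set

    def __init__(self, cpu_estimator, gpu_estimator, **params):
        self.cpu = cpu_estimator
        self.gpu = gpu_estimator
        self.params = params

    def fit(self, X):
        unsupported = set(self.params) - self.GPU_SUPPORTED_PARAMS
        if self.gpu is not None and not unsupported:
            self.device_used = "gpu"
            return self.gpu(X, **self.params)
        # Automatic CPU fallback, logged much as the accelerator would do
        print(f"falling back to CPU (unsupported params: {sorted(unsupported)})")
        self.device_used = "cpu"
        return self.cpu(X, **self.params)

# Stand-in backends: any callables taking (X, **params)
proxy = ProxyEstimator(
    cpu_estimator=lambda X, **p: ("cpu-result", len(X)),
    gpu_estimator=lambda X, **p: ("gpu-result", len(X)),
    n_clusters=3,
)
print(proxy.fit([1, 2, 3]))  # all params supported, so the GPU path runs
```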

Step 4: Performance Monitoring

Use the accelerator's built-in profiling and logging to understand where GPU acceleration is applied and measure speedups. Enable verbose mode to see which estimators run on GPU vs CPU. Use the profiler to measure execution times for individual method calls.

Key considerations:

  • Verbose mode (`-v`, `-vv`) logs GPU/CPU dispatch decisions
  • `--profile` flag provides execution time breakdowns per estimator method
  • `--line-profile` enables line-by-line profiling
  • `cuml.accel.is_proxy(obj)` can check if an object is GPU-accelerated
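The `cuml.accel.is_proxy()` check mentioned above can be wrapped so it degrades gracefully on CPU-only hosts. The helper name and the guard are assumptions of this sketch:

```python
def describe_estimator(obj):
    """Report whether an estimator object is a cuml.accel proxy.
    Hypothetical helper; falls back gracefully when cuML is absent."""
    try:
        import cuml.accel
        if cuml.accel.is_proxy(obj):
            return "GPU-accelerated proxy"
        return "plain CPU estimator"
    except ImportError:
        return "cuml not installed; object is a plain CPU estimator"

print(describe_estimator(object()))
```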

Step 5: Result Validation

Verify that GPU-accelerated results match expected outputs. cuML implementations aim for numerical compatibility with scikit-learn but may have minor floating-point differences due to GPU arithmetic. Compare predictions, scores, and model attributes between accelerated and non-accelerated runs for validation.

Key considerations:

  • Minor numerical differences are expected between GPU and CPU results
  • Some sklearn attributes may not be available in cuML equivalents (e.g., `rank_`, `singular_`)
  • Serialized models (pickle) work in CPU-only environments when the accelerator is not active
  • The system logs warnings when falling back to CPU for specific operations
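Since exact equality between GPU and CPU results is not guaranteed, validation should compare within a floating-point tolerance. A minimal stdlib sketch, with hypothetical prediction lists standing in for outputs of an accelerated run and a plain sklearn run:

```python
import math

def predictions_match(gpu_preds, cpu_preds, rel_tol=1e-5, abs_tol=1e-8):
    """Compare GPU- and CPU-produced predictions elementwise within a
    floating-point tolerance; tolerances here are illustrative defaults."""
    if len(gpu_preds) != len(cpu_preds):
        return False
    return all(
        math.isclose(g, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for g, c in zip(gpu_preds, cpu_preds)
    )

# Hypothetical outputs from an accelerated and a non-accelerated run
print(predictions_match([0.999999, 2.0], [1.0, 2.0]))  # within tolerance
print(predictions_match([0.9, 2.0], [1.0, 2.0]))       # clearly divergent
```

For classifiers, comparing predicted labels directly and scores within tolerance is usually sufficient; for transformers, compare the transformed arrays the same way.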

Execution Diagram

GitHub URL

Workflow Repository