Workflow:Rapidsai Cuml Random Forest Training And Inference
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Ensemble_Methods, Model_Inference, GPU_Computing |
| Last Updated | 2026-02-08 12:00 GMT |
Overview
End-to-end process for training GPU-accelerated Random Forest models with cuML and deploying them for high-throughput inference using the Forest Inference Library (FIL).
Description
This workflow covers the complete lifecycle of tree-based ensemble models in the RAPIDS ecosystem. It starts with training a Random Forest classifier or regressor on GPU using cuML's implementation, then transitions to the Forest Inference Library (FIL) for optimized production inference. FIL is a specialized inference engine that can load models from cuML, scikit-learn, XGBoost, and LightGBM via the Treelite interchange format. It supports both GPU and CPU inference, automatic performance optimization, and advanced features like per-tree predictions and leaf node extraction.
Usage
Use this workflow when you need to train a tree-based ensemble model and deploy it for fast inference. It is appropriate for classification or regression tasks on tabular data where model interpretability and prediction speed are priorities. The FIL inference engine is particularly valuable when serving predictions at scale: it can deliver speedups of 80x or more over scikit-learn inference (actual gains depend on model size, batch size, and hardware) and can switch between GPU and CPU execution.
Execution Steps
Step 1: Data Preparation
Prepare training and test datasets. Load data into GPU memory using cuDF DataFrames or CuPy arrays. Split into features (X) and target (y). Apply any necessary preprocessing such as encoding categorical variables, handling missing values, and splitting into train/test sets using cuML's `train_test_split`.
Key considerations:
- Random Forest handles both classification and regression tasks
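- cuML RF requires numeric input features (float32 or float64); encode categorical columns beforehand and use integer-typed labels for classification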
- Use `cuml.model_selection.train_test_split` for GPU-native data splitting
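The preparation step could look like the following minimal sketch. It assumes a RAPIDS environment with cuML and CuPy installed, and it substitutes synthetic data from `cuml.datasets.make_classification` for a real dataset. The function name `prepare_data` and its default arguments are illustrative, not part of the workflow.

```python
def prepare_data(n_samples=10_000, n_features=20, test_size=0.2, seed=0):
    """Build a synthetic binary-classification dataset on the GPU and split it."""
    # Imported lazily so this module can be imported on a machine without RAPIDS.
    import cupy as cp
    from cuml.datasets import make_classification
    from cuml.model_selection import train_test_split

    # Synthetic stand-in for real data (assumption: a real pipeline would load
    # a cuDF DataFrame or CuPy array here instead).
    X, y = make_classification(
        n_samples=n_samples,
        n_features=n_features,
        n_informative=n_features // 2,
        n_classes=2,
        random_state=seed,
    )

    # Cast explicitly: float32 features and int32 class labels.
    X = X.astype(cp.float32)
    y = y.astype(cp.int32)

    # GPU-native split into train and test partitions.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    return X_train, X_test, y_train, y_test
```

Calling `prepare_data()` on a GPU machine returns four device arrays that can be passed directly to the training step.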
Step 2: Random Forest Training
Initialize a `RandomForestClassifier` or `RandomForestRegressor` with appropriate hyperparameters. Key parameters include the number of trees, maximum depth, number of quantile bins for split finding, and the split criterion. Call `fit()` with the training data.
Key considerations:
- `n_estimators` controls the number of trees (default: 100)
- `max_depth` limits tree depth (default: 16, unlike scikit-learn's unlimited default)
- `n_bins` controls the quantile-based split algorithm precision (default: 128)
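- `split_criterion` accepts 'gini' or 'entropy' for classification and 'mse' for regression (recent cuML releases also list 'poisson', 'gamma', and 'inverse_gaussian' for regression; 'mae' is not supported in current releases)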
- `bootstrap` enables bagging (default: True)
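As a rough illustration of the parameters listed above, the sketch below spells out the stated defaults explicitly and fits a classifier. The names `RF_PARAMS` and `train_classifier` are hypothetical helpers introduced for this example; the training data is assumed to already be on the GPU.

```python
# Hyperparameters from the list above, written out explicitly.
# These mirror the defaults stated in this workflow; tune them for real data.
RF_PARAMS = {
    "n_estimators": 100,        # number of trees
    "max_depth": 16,            # cuML limits depth by default
    "n_bins": 128,              # bins used by the quantile-based split finder
    "split_criterion": "gini",  # 'gini' or 'entropy' for classification
    "bootstrap": True,          # sample rows with replacement for each tree
}


def train_classifier(X_train, y_train, params=None):
    """Fit a cuML RandomForestClassifier with the given hyperparameters."""
    # Imported lazily so this module can be imported on a machine without RAPIDS.
    from cuml.ensemble import RandomForestClassifier

    model = RandomForestClassifier(**(params or RF_PARAMS))
    model.fit(X_train, y_train)
    return model
```

For regression, the same pattern would apply with `RandomForestRegressor` and a regression criterion such as 'mse'.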
Step 3: Model Evaluation
Evaluate the trained model on the test set using `predict()` for class labels or `predict_proba()` for probabilities. Compute accuracy, R², or other metrics using cuML's metrics module; a metric that cuML does not provide (F1 score, for example) can be computed with scikit-learn after copying predictions to host memory. Validate that the model meets performance requirements before proceeding to inference deployment.
Key considerations:
- cuML RF supports `predict()`, `predict_proba()`, and `score()` methods
- Use `cuml.metrics.accuracy_score` or `cuml.metrics.r2_score` for evaluation
- Models can be compared against scikit-learn equivalents for validation
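A minimal evaluation sketch for a classifier follows. It assumes `model` is a fitted cuML `RandomForestClassifier` and that the test arrays live on the GPU; the helper name `evaluate_classifier` and the shape of its return value are illustrative choices.

```python
def evaluate_classifier(model, X_test, y_test):
    """Return accuracy and the shape of the probability output for a fitted model."""
    # Imported lazily so this module can be imported on a machine without RAPIDS.
    from cuml.metrics import accuracy_score

    y_pred = model.predict(X_test)       # predicted class labels
    proba = model.predict_proba(X_test)  # per-class probabilities

    return {
        # Convert to a plain Python float for logging or threshold checks.
        "accuracy": float(accuracy_score(y_test, y_pred)),
        "proba_shape": tuple(proba.shape),
    }
```

The returned accuracy can be compared against a minimum threshold, or against a scikit-learn model trained on the same split, before moving on to deployment.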
Step 4: Model Export
Save the trained model for later use. cuML Random Forest models can be serialized using Python pickle or converted to Treelite format for cross-framework compatibility. Treelite is the interchange representation FIL uses for models from cuML, scikit-learn, XGBoost, and LightGBM, so a converted model can be loaded into FIL in the same way as models from those frameworks.
Key considerations:
- Pickle serialization preserves the complete cuML model state
- Treelite conversion enables framework-agnostic model interchange
- FIL can also load XGBoost models saved in UBJSON (UBJ) or JSON format
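The pickle route can be sketched with the standard library alone. The helpers `save_model` and `load_model` below are hypothetical names; they work for any picklable object, including a fitted cuML model. Treelite conversion is not shown because the conversion method name differs between cuML releases.

```python
import pickle


def save_model(model, path):
    """Serialize a fitted model to disk with pickle."""
    with open(path, "wb") as f:
        pickle.dump(model, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_model(path):
    """Load a model previously written by save_model.

    Only unpickle files from a trusted source: pickle can execute
    arbitrary code while loading.
    """
    with open(path, "rb") as f:
        return pickle.load(f)
```

Loading a pickled cuML model generally requires a compatible cuML version in the loading environment, which is one reason to prefer a Treelite or XGBoost file format for long-lived artifacts.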
Step 5: FIL Model Loading
Load the trained model into the Forest Inference Library for optimized inference. FIL can load models from files (XGBoost, LightGBM formats), from scikit-learn objects, or from Treelite model objects. Specify whether the model is a classifier or regressor and choose the precision level.
Supported loading methods:
- `ForestInference.load()` for file-based loading (auto-detects format from extension)
- `ForestInference.load_from_sklearn()` for scikit-learn Random Forest models
- `ForestInference.load_from_treelite_model()` for Treelite model objects
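The loading methods above might be wrapped as follows. This sketch assumes the newer FIL API in `cuml.fil`, where the classifier/regressor choice is passed as an `is_classifier` keyword; older releases used different keyword names, so check the signature for the installed version. The wrapper names are hypothetical.

```python
def load_fil_from_file(path, is_classifier=True):
    """Load a saved model file (for example XGBoost UBJ/JSON or LightGBM text) into FIL."""
    # Imported lazily so this module can be imported on a machine without RAPIDS.
    from cuml.fil import ForestInference

    # The model format is inferred from the file extension when not given explicitly.
    # Assumption: `is_classifier` is the keyword used by the installed FIL release.
    return ForestInference.load(path, is_classifier=is_classifier)


def load_fil_from_sklearn(skl_model, is_classifier=True):
    """Convert a fitted scikit-learn forest into a FIL model."""
    from cuml.fil import ForestInference

    return ForestInference.load_from_sklearn(skl_model, is_classifier=is_classifier)


def load_fil_from_treelite(tl_model, is_classifier=True):
    """Wrap an in-memory Treelite model object in a FIL model."""
    from cuml.fil import ForestInference

    return ForestInference.load_from_treelite_model(tl_model, is_classifier=is_classifier)
```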
Step 6: Inference Optimization
Run FIL's automatic optimizer to find the best memory layout and chunk size for the target batch size. The optimizer tests different tree layouts (depth-first, breadth-first, layered) and chunk sizes, measuring throughput on synthetic data to find the optimal configuration.
Key considerations:
- Call `optimize(batch_size=N)` with the expected inference batch size
- The optimizer adjusts `layout` and `default_chunk_size` parameters
- Depth-first layout is the default; breadth-first may be better for shallow trees
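The optimization call can be sketched as below. The helper `optimize_for_batch` is a hypothetical wrapper; it assumes the FIL model exposes `optimize(batch_size=...)` and readable `layout` and `default_chunk_size` attributes, as described above, and uses `getattr` so it degrades gracefully if an attribute is missing.

```python
def optimize_for_batch(fil_model, batch_size=1024):
    """Tune a FIL model for the expected batch size and report the chosen settings."""
    # The optimizer benchmarks candidate layouts and chunk sizes,
    # then updates the model in place with the fastest combination.
    fil_model.optimize(batch_size=batch_size)

    return {
        "layout": getattr(fil_model, "layout", None),
        "default_chunk_size": getattr(fil_model, "default_chunk_size", None),
    }
```

Because the result depends on batch size, it makes sense to run the optimizer with the batch size expected in production rather than the one used during evaluation.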
Step 7: Production Inference
Run predictions using the optimized FIL model. FIL supports standard prediction, probability prediction, per-tree predictions, and leaf node extraction. GPU/CPU inference can be switched at runtime using a context manager.
What happens:
- `predict()` returns class labels or regression values
- `predict_proba()` returns class probabilities for classifiers
- `predict_per_tree()` returns individual tree outputs for custom ensembling
- `apply()` returns leaf node IDs for similarity analysis
- Use the `set_fil_device_type("cpu")` context manager for CPU inference
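The prediction methods listed above can be exercised together as in this sketch. It assumes `fil_model` is a FIL classifier and that `set_fil_device_type` is importable from `cuml.fil`; the helper name `run_inference` and the returned dictionary are illustrative. Whether a model loaded under one device type can be reused under another may depend on the cuML release, so loading the model inside the same context is the conservative choice.

```python
def run_inference(fil_model, X, device="gpu"):
    """Run the main FIL prediction methods on X under the chosen device type."""
    # Imported lazily so this module can be imported on a machine without RAPIDS.
    from cuml.fil import set_fil_device_type

    # The context manager selects GPU or CPU execution for calls made inside it.
    with set_fil_device_type(device):
        return {
            "labels": fil_model.predict(X),             # class labels (or regression values)
            "proba": fil_model.predict_proba(X),        # class probabilities
            "per_tree": fil_model.predict_per_tree(X),  # one output per tree
            "leaves": fil_model.apply(X),               # leaf node ID reached in each tree
        }
```

For CPU inference, pass `device="cpu"` and host arrays such as NumPy arrays; for a regressor, drop the `predict_proba` call.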