Workflow:Rapidsai Cuml Random Forest Training And Inference

Knowledge Sources	cuML, cuML API Docs, FIL Documentation
Domains	Machine_Learning, Ensemble_Methods, Model_Inference, GPU_Computing
Last Updated	2026-02-08 12:00 GMT

Overview

End-to-end process for training GPU-accelerated Random Forest models with cuML and deploying them for high-throughput inference using the Forest Inference Library (FIL).

Description

This workflow covers the complete lifecycle of tree-based ensemble models in the RAPIDS ecosystem. It starts with training a Random Forest classifier or regressor on GPU using cuML's implementation, then transitions to the Forest Inference Library (FIL) for optimized production inference. FIL is a specialized inference engine that can load models from cuML, scikit-learn, XGBoost, and LightGBM via the Treelite interchange format. It supports both GPU and CPU inference, automatic performance optimization, and advanced features like per-tree predictions and leaf node extraction.

Usage

Execute this workflow when you need to train a tree-based ensemble model and deploy it for fast inference. It suits classification or regression tasks on tabular data where model interpretability and prediction speed are priorities. The FIL inference engine is particularly valuable when serving predictions at scale: it can deliver speedups of 80x or more over scikit-learn inference, depending on the model and batch size, and it supports switching between GPU and CPU execution.

Execution Steps

Step 1: Data Preparation

Prepare training and test datasets. Load data into GPU memory using cuDF DataFrames or CuPy arrays. Split into features (X) and target (y). Apply any necessary preprocessing such as encoding categorical variables, handling missing values, and splitting into train/test sets using cuML's `train_test_split`.

Key considerations:

Random Forest handles both classification and regression tasks
cuML RF requires numeric input features
Use `cuml.model_selection.train_test_split` for GPU-native data splitting
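The preparation step can be sketched as follows. This is a minimal illustration, not a fixed recipe: `as_numeric_float32` and `gpu_train_test_split` are hypothetical helper names, the float32 cast is an assumption made for the example, and the cuML and CuPy imports are deferred so the numeric check can be used on a machine without a GPU.

```python
import numpy as np


def as_numeric_float32(features):
    """Cast host array-like features to float32, rejecting non-numeric data."""
    arr = np.asarray(features)
    is_numeric = np.issubdtype(arr.dtype, np.number) or arr.dtype == np.bool_
    if not is_numeric:
        raise TypeError(f"non-numeric feature dtype: {arr.dtype}")
    return arr.astype(np.float32)


def gpu_train_test_split(features, target, test_size=0.2, seed=0):
    """Move host arrays to the GPU and split them with cuML."""
    # Deferred imports: these require a CUDA-capable environment.
    import cupy as cp
    from cuml.model_selection import train_test_split

    X = cp.asarray(as_numeric_float32(features))
    y = cp.asarray(np.asarray(target))
    # Returns X_train, X_test, y_train, y_test.
    return train_test_split(X, y, test_size=test_size, random_state=seed)
```

Categorical columns would need to be encoded before `as_numeric_float32` is called, since string data is rejected rather than converted.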

Step 2: Random Forest Training

Initialize a `RandomForestClassifier` or `RandomForestRegressor` with appropriate hyperparameters. Key parameters include the number of trees, maximum depth, number of quantile bins for split finding, and the split criterion. Call `fit()` with the training data.

Key considerations:

`n_estimators` controls the number of trees (default: 100)
`max_depth` limits tree depth (default: 16, unlike scikit-learn's unlimited default)
`n_bins` controls the quantile-based split algorithm precision (default: 128)
`split_criterion` accepts 'gini' or 'entropy' for classification, and 'mse', 'poisson', 'gamma', or 'inverse_gaussian' for regression
`bootstrap` enables bagging (default: True)
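A hedged sketch of classifier training with these hyperparameters written out explicitly. `DEFAULT_RF_PARAMS`, `make_rf_params`, and `train_classifier` are illustrative names, and `random_state=0` is an assumption added for repeatability; the cuML import is deferred so the parameter handling can be checked without a GPU.

```python
# Values quoted in the list above, spelled out rather than left implicit.
DEFAULT_RF_PARAMS = {
    "n_estimators": 100,
    "max_depth": 16,
    "n_bins": 128,
    "split_criterion": "gini",
    "bootstrap": True,
    "random_state": 0,  # assumption: fixed seed for repeatable runs
}


def make_rf_params(**overrides):
    """Merge overrides into the defaults and validate the classifier criterion."""
    params = dict(DEFAULT_RF_PARAMS)
    params.update(overrides)
    if params["split_criterion"] not in ("gini", "entropy"):
        raise ValueError("classification expects 'gini' or 'entropy'")
    return params


def train_classifier(X_train, y_train, **overrides):
    """Fit a cuML RandomForestClassifier on GPU-resident training data."""
    # Deferred import: requires a CUDA-capable environment.
    from cuml.ensemble import RandomForestClassifier

    model = RandomForestClassifier(**make_rf_params(**overrides))
    model.fit(X_train, y_train)
    return model
```

For example, `train_classifier(X_train, y_train, max_depth=8, n_estimators=200)` would train a shallower, larger forest while keeping the other values unchanged.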

Step 3: Model Evaluation

Evaluate the trained model on the test set using `predict()` for class labels or regression values, or `predict_proba()` for class probabilities. Compute accuracy, F1 score, or other metrics using cuML's metrics module. Validate that the model meets performance requirements before proceeding to inference deployment.

Key considerations:

cuML RF supports the `predict()` and `score()` methods, plus `predict_proba()` for classifiers
Use `cuml.metrics.accuracy_score` or `cuml.metrics.r2_score` for evaluation
Models can be compared against scikit-learn equivalents for validation
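The evaluation gate described in this step might look like the sketch below. `evaluate` is a hypothetical helper; by default it uses `cuml.metrics.accuracy_score`, but the metric is injectable so that a regression metric such as `cuml.metrics.r2_score`, or a CPU stand-in, can be passed instead.

```python
def evaluate(model, X_test, y_test, minimum, metric=None):
    """Score a fitted model and report whether it clears a minimum threshold."""
    if metric is None:
        # Deferred import: requires a CUDA-capable environment.
        from cuml.metrics import accuracy_score as metric
    predictions = model.predict(X_test)
    # Metric is called as metric(ground_truth, predictions).
    score = float(metric(y_test, predictions))
    return {"score": score, "passed": score >= minimum}
```

A deployment script could stop when `evaluate(...)["passed"]` is false instead of continuing to the export step.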

Step 4: Model Export

Save the trained model for later use. cuML Random Forest models can be serialized using Python pickle or converted to Treelite format for cross-framework compatibility. Treelite is the common representation that FIL loads, so a converted cuML model can be served in the same way as models imported from scikit-learn, XGBoost, or LightGBM.

Key considerations:

Pickle serialization preserves the complete cuML model state
Treelite conversion enables framework-agnostic model interchange
FIL can load XGBoost models saved in UBJSON (`.ubj`) or JSON (`.json`) format directly
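The pickle route can be sketched with two small helpers. `save_model` and `load_model` are illustrative names and work for any picklable object; unpickling a cuML model additionally requires cuML to be installed in the loading environment. The Treelite conversion is not shown because the name of the conversion method has differed between cuML releases.

```python
import pickle
from pathlib import Path


def save_model(model, path):
    """Pickle a fitted model to disk and return the path that was written."""
    path = Path(path)
    with path.open("wb") as handle:
        pickle.dump(model, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return path


def load_model(path):
    """Unpickle a model; only use this on files from a trusted source."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)
```

Pickle files are tied to the library versions that wrote them, so recording the cuML version alongside the file is a reasonable precaution.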

Step 5: FIL Model Loading

Load the trained model into the Forest Inference Library for optimized inference. FIL can load models from files (XGBoost, LightGBM formats), from scikit-learn objects, or from Treelite model objects. Specify whether the model is a classifier or regressor and choose the precision level.

Supported loading methods:

`ForestInference.load()` for file-based loading (auto-detects format from extension)
`ForestInference.load_from_sklearn()` for scikit-learn Random Forest models
`ForestInference.load_from_treelite_model()` for Treelite model objects
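One way to choose among the three loading methods is sketched below. `loader_name` and `load_into_fil` are hypothetical helpers, and the `is_classifier` and `precision` keyword arguments are assumed to match the FIL release in use; older releases exposed a different signature, so check the installed version before relying on this.

```python
from pathlib import Path


def loader_name(source):
    """Pick the ForestInference loading method that matches the source type."""
    if isinstance(source, (str, Path)):
        return "load"  # file on disk: XGBoost, LightGBM, ...
    module = type(source).__module__
    if module.startswith("sklearn"):
        return "load_from_sklearn"
    if module.startswith("treelite"):
        return "load_from_treelite_model"
    raise TypeError(f"unsupported model source: {type(source).__name__}")


def load_into_fil(source, is_classifier, precision="single"):
    """Load a model into FIL using the method chosen by loader_name."""
    # Deferred import: requires cuML to be installed.
    from cuml.fil import ForestInference

    loader = getattr(ForestInference, loader_name(source))
    # Assumption: all three loaders accept these keyword arguments.
    return loader(source, is_classifier=is_classifier, precision=precision)
```

For instance, `load_into_fil("model.ubj", is_classifier=True)` would dispatch to `ForestInference.load()` for an XGBoost file at that illustrative path.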

Step 6: Inference Optimization

Run FIL's automatic optimizer to find the best memory layout and chunk size for the target batch size. The optimizer tests different tree layouts (depth-first, breadth-first, layered) and chunk sizes, measuring throughput on synthetic data to find the optimal configuration.

Key considerations:

Call `optimize(batch_size=N)` with the expected inference batch size
The optimizer adjusts `layout` and `default_chunk_size` parameters
Depth-first layout is the default; breadth-first may be better for shallow trees
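The optimization call and the two settings it adjusts can be wrapped as in this sketch. `tune_for_batch` is a hypothetical helper that takes an already loaded FIL model; it only uses the `optimize(batch_size=N)` call and the `layout` and `default_chunk_size` attributes named above.

```python
def tune_for_batch(fil_model, batch_size):
    """Run FIL's optimizer for a batch size and report the chosen settings."""
    fil_model.optimize(batch_size=batch_size)
    return {
        "batch_size": batch_size,
        "layout": fil_model.layout,
        "default_chunk_size": fil_model.default_chunk_size,
    }
```

Logging the returned dictionary makes it easy to see whether the optimizer moved away from the depth-first default for a given workload.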

Step 7: Production Inference

Run predictions using the optimized FIL model. FIL supports standard prediction, probability prediction, per-tree predictions, and leaf node extraction. GPU/CPU inference can be switched at runtime using a context manager.

What happens:

`predict()` returns class labels or regression values
`predict_proba()` returns class probabilities for classifiers
`predict_per_tree()` returns individual tree outputs for custom ensembling
`apply()` returns leaf node IDs for similarity analysis
Use the `set_fil_device_type("cpu")` context manager for CPU inference
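The calls listed above can be collected in one helper, sketched here under stated assumptions. `run_inference` is a hypothetical function, and importing `set_fil_device_type` from `cuml.fil` is assumed to match the installed release. The context manager is injectable so the control flow can be exercised without cuML.

```python
def run_inference(fil_model, X, device="gpu", is_classifier=True,
                  device_context=None):
    """Collect the FIL outputs for one batch on the requested device."""
    if device_context is None:
        # Deferred import; assumed location of the context manager.
        from cuml.fil import set_fil_device_type as device_context
    with device_context(device):
        outputs = {
            "predict": fil_model.predict(X),
            "predict_per_tree": fil_model.predict_per_tree(X),
            "apply": fil_model.apply(X),  # leaf node IDs
        }
        if is_classifier:
            outputs["predict_proba"] = fil_model.predict_proba(X)
    return outputs
```

Calling `run_inference(fil_model, X, device="cpu")` would run the same batch on the CPU, which is useful as a fallback when no GPU is available on the serving host.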

GitHub URL

Workflow Repository