Implementation:Rapidsai Cuml DecisionTree Params
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Decision_Trees |
| Last Updated | 2026-02-08 12:00 GMT |
Overview
Defines the parameter structures, metadata types, and utility functions for GPU-accelerated decision tree models in cuML, including tree configuration, metadata nodes, and serialization helpers.
Description
The decisiontree.hpp header provides the core type definitions for decision trees in the ML::DT namespace:
DecisionTreeParams: A struct holding all hyperparameters for decision tree construction:
- max_depth: Maximum tree depth (-1 for unlimited).
- max_leaves: Maximum leaf nodes (-1 for unlimited, soft constraint).
- max_features: Ratio of features to consider per split.
- max_n_bins: Maximum histogram bins for splits.
- min_samples_leaf: Minimum samples required in a leaf node.
- min_samples_split: Minimum samples required to split an internal node.
- split_criterion: Split quality metric (GINI, Entropy for classification; MSE for regression).
- min_impurity_decrease: Minimum impurity reduction required for a split.
- max_batch_size: Maximum nodes processed in a batch (for the batched-level algorithm).
set_tree_params: A convenience function that populates every DecisionTreeParams member. Each argument after params has a default value, so callers only need to override the settings they care about.
TreeMetaDataNode: A templated struct storing complete tree metadata including tree ID, depth counter, leaf counter, training time, leaf values vector, sparse tree representation, and the number of outputs.
Type Aliases: Convenience typedefs for common tree types:
- TreeClassifierF/TreeClassifierD: Float/double classification trees.
- TreeRegressorF/TreeRegressorD: Float/double regression trees.
Serialization Functions:
- get_tree_summary_text: Returns a high-level summary string.
- get_tree_text: Returns a detailed text representation.
- get_tree_json: Returns the tree structure as a JSON string.
Usage
Use these types when configuring and working with cuML decision tree classifiers and regressors, including those used as base estimators in Random Forest. The DecisionTreeParams struct controls the tree-building process, while TreeMetaDataNode stores the trained tree for inference and inspection.
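The field descriptions above imply some simple range constraints, which can be checked before the values are handed to set_tree_params. The following is a minimal sketch under stated assumptions: ParamsSketch and check_params are hypothetical names, the struct is a local stand-in for the numeric fields of DecisionTreeParams so that the example compiles without cuML, and the bounds are inferred from the descriptions on this page rather than taken from cuML's own validation.

```cpp
#include <string>
#include <vector>

// Hypothetical local mirror of the numeric fields of ML::DT::DecisionTreeParams.
// split_criterion is left out so the sketch has no cuML dependency.
struct ParamsSketch {
  int max_depth;
  int max_leaves;
  float max_features;
  int max_n_bins;
  int min_samples_leaf;
  int min_samples_split;
  float min_impurity_decrease;
  int max_batch_size;
};

// Returns one message per violated constraint; an empty vector means the
// values are consistent with the field descriptions. The bounds below are
// assumptions inferred from those descriptions, not cuML's own checks.
std::vector<std::string> check_params(const ParamsSketch& p)
{
  std::vector<std::string> problems;
  if (p.max_depth != -1 && p.max_depth < 1)
    problems.push_back("max_depth must be -1 (unlimited) or >= 1");
  if (p.max_leaves != -1 && p.max_leaves < 1)
    problems.push_back("max_leaves must be -1 (unlimited) or >= 1");
  if (!(p.max_features > 0.0f && p.max_features <= 1.0f))
    problems.push_back("max_features must lie in (0, 1]");
  if (p.max_n_bins < 1)
    problems.push_back("max_n_bins must be >= 1");
  if (p.min_samples_leaf < 1)
    problems.push_back("min_samples_leaf must be >= 1");
  if (p.min_samples_split < 2)
    problems.push_back("min_samples_split must be >= 2");
  if (p.min_impurity_decrease < 0.0f)
    problems.push_back("min_impurity_decrease must be >= 0");
  if (p.max_batch_size < 1)
    problems.push_back("max_batch_size must be >= 1");
  return problems;
}
```

A caller could run check_params on user-supplied values and forward them to set_tree_params only when the returned vector is empty, which keeps configuration errors out of the tree-building step.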
Code Reference
Source Location
- Repository: Rapidsai_Cuml
- File: cpp/include/cuml/tree/decisiontree.hpp
Signature
namespace ML {
namespace DT {

struct DecisionTreeParams {
  int max_depth;
  int max_leaves;
  float max_features;
  int max_n_bins;
  int min_samples_leaf;
  int min_samples_split;
  CRITERION split_criterion;
  float min_impurity_decrease = 0.0f;
  int max_batch_size;
};

void set_tree_params(DecisionTreeParams& params,
                     int cfg_max_depth               = -1,
                     int cfg_max_leaves              = -1,
                     float cfg_max_features          = 1.0f,
                     int cfg_max_n_bins              = 128,
                     int cfg_min_samples_leaf        = 1,
                     int cfg_min_samples_split       = 2,
                     float cfg_min_impurity_decrease = 0.0f,
                     CRITERION cfg_split_criterion   = CRITERION_END,
                     int cfg_max_batch_size          = 4096);

template <class T, class L>
struct TreeMetaDataNode {
  int treeid;
  int depth_counter;
  int leaf_counter;
  double train_time;
  std::vector<T> vector_leaf;
  std::vector<SparseTreeNode<T, L>> sparsetree;
  int num_outputs;
};

template <class T, class L>
std::string get_tree_summary_text(const TreeMetaDataNode<T, L>* tree);

template <class T, class L>
std::string get_tree_text(const TreeMetaDataNode<T, L>* tree);

template <class T, class L>
std::string get_tree_json(const TreeMetaDataNode<T, L>* tree);

typedef TreeMetaDataNode<float, int> TreeClassifierF;
typedef TreeMetaDataNode<double, int> TreeClassifierD;
typedef TreeMetaDataNode<float, float> TreeRegressorF;
typedef TreeMetaDataNode<double, double> TreeRegressorD;

}  // namespace DT
}  // namespace ML
Import
#include <cuml/tree/decisiontree.hpp>
I/O Contract
Inputs
set_tree_params
| Name | Type | Required | Description |
|---|---|---|---|
| params | DecisionTreeParams& | Yes | Struct to be populated with tree parameters |
| cfg_max_depth | int | No | Maximum tree depth (default: -1, unlimited) |
| cfg_max_leaves | int | No | Maximum leaf nodes (default: -1, unlimited) |
| cfg_max_features | float | No | Fraction of features to consider per split (default: 1.0) |
| cfg_max_n_bins | int | No | Maximum histogram bins (default: 128) |
| cfg_min_samples_leaf | int | No | Minimum samples in leaf (default: 1) |
| cfg_min_samples_split | int | No | Minimum samples to split (default: 2) |
| cfg_min_impurity_decrease | float | No | Minimum impurity decrease for split (default: 0.0) |
| cfg_split_criterion | CRITERION | No | Split criterion (default: CRITERION_END, auto-select) |
| cfg_max_batch_size | int | No | Maximum batch size for batched algorithm (default: 4096) |
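The table describes CRITERION_END as an auto-select value. One plausible reading, sketched below, is that the sentinel is replaced by a task-appropriate default when the tree is built, while any explicit choice is left untouched. This is an illustration only: CriterionSketch and resolve_criterion are hypothetical names, the enum is a local stand-in that lists just the criteria mentioned on this page, and the choice of GINI for classification and MSE for regression as fallbacks is an assumption rather than confirmed library behavior.

```cpp
// Local stand-in for cuML's CRITERION enum. Only the enumerators named on
// this page are listed, so the numeric values do not match the library.
enum CriterionSketch { GINI, ENTROPY, MSE, CRITERION_END };

// Hypothetical helper: treat CRITERION_END as "not specified" and fall back
// to an assumed default for the task; explicit choices pass through unchanged.
inline CriterionSketch resolve_criterion(CriterionSketch requested, bool is_classification)
{
  if (requested != CRITERION_END) { return requested; }
  return is_classification ? GINI : MSE;
}
```

Using the end-of-enum value as the sentinel keeps the parameter a plain enum and avoids a separate flag for "use the default".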
Outputs
| Name | Type | Description |
|---|---|---|
| params (set_tree_params) | DecisionTreeParams& | Populated parameter struct |
| get_tree_summary_text | std::string | High-level summary of the trained tree |
| get_tree_text | std::string | Detailed text representation of the tree |
| get_tree_json | std::string | JSON representation of the tree structure |
Usage Examples
#include <cuml/tree/decisiontree.hpp>
#include <string>

// Configure decision tree parameters
ML::DT::DecisionTreeParams params;
ML::DT::set_tree_params(params,
                        10,                 // max_depth
                        -1,                 // max_leaves (unlimited)
                        1.0f,               // max_features
                        128,                // max_n_bins
                        1,                  // min_samples_leaf
                        2,                  // min_samples_split
                        0.0f,               // min_impurity_decrease
                        ML::CRITERION_END,  // auto-select criterion (enum declared in namespace ML)
                        4096);              // max_batch_size

// After training, inspect the tree
ML::DT::TreeClassifierF* tree;  // placeholder: must point to a trained tree before use
std::string summary = ML::DT::get_tree_summary_text(tree);
std::string json    = ML::DT::get_tree_json(tree);