Implementation:Rapidsai Cuml Genetic Common
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Genetic_Programming |
| Last Updated | 2026-02-08 12:00 GMT |
Overview
Defines the common enumerations and hyperparameter structure for the cuML symbolic regression/classification/transformation genetic programming module.
Description
The common.h header in the cuml::genetic namespace provides the foundational types and configuration used across the genetic programming API:
Enumerations:
- metric_t: Fitness metric types: MAE, MSE, RMSE (regression), Pearson and Spearman correlation (regression/transformation), and log-loss (classification).
- init_method_t: Population initialization methods: grow (random depth), full (grow to random depth), and half_and_half (50/50 mix).
- transformer_t: Transformation functions for class probability conversion (currently sigmoid).
- mutation_t: Mutation types: none, crossover, subtree, hoist, point, and reproduce.
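As an illustration of what the sigmoid transformer does conceptually, the following minimal sketch maps a raw program output to a value in (0, 1) with the standard logistic function. The helper name sigmoid_probability is hypothetical and is not declared in common.h; how cuML applies the transformer internally is not shown on this page.

```cpp
#include <cmath>

// Hypothetical helper (not part of common.h): squashes a raw program
// output into the open interval (0, 1) using the logistic sigmoid,
// so it can be read as a class probability.
inline float sigmoid_probability(float raw_output)
{
  return 1.0f / (1.0f + std::exp(-raw_output));
}
```

A raw output of 0 maps to 0.5, large positive outputs approach 1, and large negative outputs approach 0.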
param Struct:
A hyperparameter struct that groups the settings controlling genetic programming evolution:
- Population control: population_size, generations, tournament_size.
- Feature engineering: hall_of_fame, n_components (transformation-only).
- Tree structure: init_depth range, init_method, terminalRatio.
- Node types: function_set (available operations), arity_set (functions by arity), const_range.
- Mutation probabilities: p_crossover, p_subtree_mutation, p_hoist_mutation, p_point_mutation, p_point_replace.
- Training control: metric, stopping_criteria, parsimony_coefficient, max_samples, random_state.
- Utility methods: p_reproduce(), max_programs(), criterion().
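The relationship between the mutation probabilities and p_reproduce() can be sketched without the cuML headers. The struct mutation_probs and the function reproduce_probability below are hypothetical stand-ins; the defaults are copied from the param signature on this page, and clamping the result to [0, 1] is an assumption rather than documented behavior.

```cpp
#include <algorithm>

// Hypothetical stand-in for the mutation-probability fields of
// cuml::genetic::param; defaults copied from the signature on this page.
struct mutation_probs {
  float p_crossover        = 0.9f;
  float p_subtree_mutation = 0.01f;
  float p_hoist_mutation   = 0.01f;
  float p_point_mutation   = 0.01f;
};

// Reproduction probability as whatever remains after the other mutation
// types. Clamping to [0, 1] is an assumption made for this sketch.
inline float reproduce_probability(const mutation_probs& p)
{
  const float sum = p.p_crossover + p.p_subtree_mutation +
                    p.p_hoist_mutation + p.p_point_mutation;
  return std::min(1.0f, std::max(0.0f, 1.0f - sum));
}
```

With the default values the remainder is roughly 0.07, meaning about 7% of offspring would be unmodified copies of a tournament winner.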
Usage
Use these types when configuring the cuML symbolic regression (symFit), classification (symClfPredict), or transformation (symTransform) genetic programming algorithms. The param struct is the primary configuration interface for controlling evolution behavior, complexity penalties, and convergence criteria.
Code Reference
Source Location
- Repository: Rapidsai_Cuml
- File: cpp/include/cuml/genetic/common.h
Signature
```cpp
namespace cuml {
namespace genetic {

enum class metric_t : uint32_t { mae, mse, rmse, pearson, spearman, logloss };
enum class init_method_t : uint32_t { grow, full, half_and_half };
enum class transformer_t : uint32_t { sigmoid };
enum class mutation_t : uint32_t { none, crossover, subtree, hoist, point, reproduce };

struct param {
  int population_size = 1000;
  int hall_of_fame = 100;
  int n_components = 10;
  int generations = 20;
  int tournament_size = 20;
  float stopping_criteria = 0.0f;
  float const_range[2] = {-1.0f, 1.0f};
  int init_depth[2] = {2, 6};
  init_method_t init_method = init_method_t::half_and_half;
  std::vector<node::type> function_set;
  std::map<int, std::vector<node::type>> arity_set;
  transformer_t transformer = transformer_t::sigmoid;
  metric_t metric = metric_t::mae;
  float parsimony_coefficient = 0.001f;
  float p_crossover = 0.9f;
  float p_subtree_mutation = 0.01f;
  float p_hoist_mutation = 0.01f;
  float p_point_mutation = 0.01f;
  float p_point_replace = 0.05f;
  float max_samples = 1.0f;
  float terminalRatio = 0.0f;
  std::vector<std::string> feature_names;
  int num_features;
  uint64_t random_state = 0UL;
  int num_epochs = 0;
  bool low_memory = false;

  float p_reproduce() const;
  int max_programs() const;
  int criterion() const;
};

}  // namespace genetic
}  // namespace cuml
```
Import
```cpp
#include <cuml/genetic/common.h>
```
I/O Contract
Inputs
param struct (key fields)
| Name | Type | Required | Description |
|---|---|---|---|
| population_size | int | No | Number of programs per generation (default: 1000) |
| generations | int | No | Number of generations to evolve (default: 20) |
| tournament_size | int | No | Tournament selection size (default: 20) |
| metric | metric_t | No | Fitness metric (default: mae) |
| init_method | init_method_t | No | Population initialization method (default: half_and_half) |
| init_depth | int[2] | No | Min/max initial tree depth (default: [2, 6]) |
| p_crossover | float | No | Crossover mutation probability (default: 0.9) |
| parsimony_coefficient | float | No | Penalty for large programs (default: 0.001) |
| num_features | int | Yes | Number of features in the dataset |
| random_state | uint64_t | No | Random seed (default: 0) |
Outputs
| Name | Type | Description |
|---|---|---|
| p_reproduce() | float | Computed reproduction probability (1 - sum of other mutation probabilities) |
| max_programs() | int | Maximum possible number of programs across all generations |
| criterion() | int | Scoring criterion direction based on the selected metric |
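The idea behind a metric-dependent scoring direction can be illustrated with a small standalone sketch. Error and loss metrics are better when smaller, while correlation metrics are better when larger. The enum metric_kind mirrors metric_t from the signature, and higher_is_better is a hypothetical helper; the actual integer encoding returned by criterion() is not stated on this page, so the sketch returns a bool instead.

```cpp
#include <cstdint>

// Local mirror of metric_t from the signature on this page, so the sketch
// compiles without the cuML headers.
enum class metric_kind : std::uint32_t { mae, mse, rmse, pearson, spearman, logloss };

// Hypothetical helper: true when a larger score means a better program.
inline bool higher_is_better(metric_kind m)
{
  switch (m) {
    case metric_kind::pearson:
    case metric_kind::spearman: return true;  // correlations are maximized
    case metric_kind::mae:
    case metric_kind::mse:
    case metric_kind::rmse:
    case metric_kind::logloss: return false;  // errors and losses are minimized
  }
  return false;  // unreachable for valid enumerators
}
```

A selection routine could use such a flag to decide whether the tournament winner is the program with the smallest or the largest fitness value.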
Usage Examples
```cpp
#include <cuml/genetic/common.h>

// Configure genetic programming parameters
cuml::genetic::param params;
params.population_size = 500;
params.generations = 50;
params.tournament_size = 10;
params.metric = cuml::genetic::metric_t::mse;
params.init_method = cuml::genetic::init_method_t::half_and_half;
params.init_depth[0] = 2;
params.init_depth[1] = 8;
params.p_crossover = 0.85f;
params.p_subtree_mutation = 0.05f;
params.p_hoist_mutation = 0.05f;
params.p_point_mutation = 0.05f;
params.parsimony_coefficient = 0.001f;
params.num_features = 10;
params.random_state = 42;

// Check reproduction probability
float p_repro = params.p_reproduce();
// p_repro = 1.0 - 0.85 - 0.05 - 0.05 - 0.05 = 0.0
```
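Because the mutation probabilities in the example above already sum to 1.0, it can be useful to sanity-check a configuration before training. The sketch below is one possible approach: gp_config is a hypothetical subset of the param fields, and the rules in check_config are plausible assumptions for illustration, not checks that common.h is known to perform.

```cpp
#include <string>
#include <vector>

// Hypothetical subset of cuml::genetic::param fields; defaults copied from
// the signature on this page.
struct gp_config {
  int population_size      = 1000;
  int tournament_size      = 20;
  int init_depth[2]        = {2, 6};
  float p_crossover        = 0.9f;
  float p_subtree_mutation = 0.01f;
  float p_hoist_mutation   = 0.01f;
  float p_point_mutation   = 0.01f;
};

// Returns a list of human-readable problems; an empty list means the
// configuration passed every (assumed) sanity rule.
inline std::vector<std::string> check_config(const gp_config& c)
{
  std::vector<std::string> problems;
  if (c.population_size <= 0) {
    problems.push_back("population_size must be positive");
  }
  if (c.tournament_size <= 0 || c.tournament_size > c.population_size) {
    problems.push_back("tournament_size must be in [1, population_size]");
  }
  if (c.init_depth[0] > c.init_depth[1]) {
    problems.push_back("init_depth minimum exceeds maximum");
  }
  const float sum = c.p_crossover + c.p_subtree_mutation +
                    c.p_hoist_mutation + c.p_point_mutation;
  if (sum > 1.0f + 1e-6f) {  // small tolerance for float rounding
    problems.push_back("mutation probabilities sum to more than 1");
  }
  return problems;
}
```

Collecting every problem into a list, rather than stopping at the first failure, lets a caller report all misconfigured fields at once.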