Implementation:Rapidsai Cuml Genetic Common

From Leeroopedia


Knowledge Sources
Domains Machine_Learning, Genetic_Programming
Last Updated 2026-02-08 12:00 GMT

Overview

Defines the common enumerations and hyperparameter structure for the cuML symbolic regression/classification/transformation genetic programming module.

Description

The common.h header declares, in the cuml::genetic namespace, the foundational types and configuration used across the genetic programming API:

Enumerations:

  • metric_t: Fitness metric types including MAE, MSE, RMSE (regression), Pearson and Spearman correlation (regression/transformation), and log-loss (classification).
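  • init_method_t: Population initialization methods: grow (nodes chosen at random, allowing shorter or asymmetric trees), full (trees grown out to a randomly chosen depth), and half_and_half (half the population built with grow, half with full).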
  • transformer_t: Transformation functions for class probability conversion (currently sigmoid).
  • mutation_t: Mutation types: none, crossover, subtree, hoist, point, and reproduce.
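To illustrate how the metric enumeration relates to the task types listed above, the following sketch mirrors metric_t locally so that it compiles without cuML. The metric_sketch namespace, task_kind, and task_for are hypothetical names introduced for this example only; they are not part of the cuML API.

```cpp
#include <cstdint>

namespace metric_sketch {  // hypothetical stand-in, not the cuML namespace

// Local mirror of the metric enumeration shown in the Signature section.
enum class metric_t : std::uint32_t { mae, mse, rmse, pearson, spearman, logloss };

// Hypothetical grouping, derived only from the descriptions in this page.
enum class task_kind { regression, regression_or_transformation, classification };

// Map each metric to the task family it is documented for.
inline task_kind task_for(metric_t m)
{
  switch (m) {
    case metric_t::mae:
    case metric_t::mse:
    case metric_t::rmse: return task_kind::regression;
    case metric_t::pearson:
    case metric_t::spearman: return task_kind::regression_or_transformation;
    case metric_t::logloss: return task_kind::classification;
  }
  return task_kind::regression;  // not reached for valid enumerators
}

}  // namespace metric_sketch
```

A helper of this kind could be used to reject, for example, a logloss metric in a regression run before any training starts.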

param struct: The hyperparameter struct that configures genetic programming evolution. Its fields fall into the following groups:

  • Population control: population_size, generations, tournament_size.
  • Feature engineering: hall_of_fame, n_components (transformation-only).
  • Tree structure: init_depth range, init_method, terminalRatio.
  • Node types: function_set (available operations), arity_set (functions by arity), const_range.
  • Mutation probabilities: p_crossover, p_subtree_mutation, p_hoist_mutation, p_point_mutation, p_point_replace.
  • Training control: metric, stopping_criteria, parsimony_coefficient, max_samples, random_state.
  • Utility methods: p_reproduce(), max_programs(), criterion().
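Because several of these fields constrain one another, it can help to sanity-check a configuration before training. The sketch below is a minimal illustration using a local stand-in struct with a subset of the fields and their documented defaults. The validate_sketch namespace, param_subset, validate, and the specific constraints checked are assumptions made for this example; they are not taken from cuML.

```cpp
#include <string>

namespace validate_sketch {  // hypothetical stand-in, not the cuML namespace

// Subset of the param fields listed above, with the documented defaults.
struct param_subset {
  int population_size      = 1000;
  int tournament_size      = 20;
  int init_depth[2]        = {2, 6};
  float p_crossover        = 0.9f;
  float p_subtree_mutation = 0.01f;
  float p_hoist_mutation   = 0.01f;
  float p_point_mutation   = 0.01f;
};

// Hypothetical check: returns an empty string when the configuration looks
// consistent, otherwise a short description of the first problem found.
inline std::string validate(const param_subset& p)
{
  if (p.population_size <= 0) return "population_size must be positive";
  if (p.tournament_size <= 0 || p.tournament_size > p.population_size)
    return "tournament_size must be in [1, population_size]";
  if (p.init_depth[0] < 0 || p.init_depth[0] > p.init_depth[1])
    return "init_depth must satisfy 0 <= min <= max";

  const float probs[] = {
    p.p_crossover, p.p_subtree_mutation, p.p_hoist_mutation, p.p_point_mutation};
  float sum = 0.0f;
  for (float v : probs) {
    if (v < 0.0f || v > 1.0f) return "each mutation probability must be in [0, 1]";
    sum += v;
  }
  // Small tolerance because the probabilities are single-precision floats.
  if (sum > 1.0f + 1e-6f) return "mutation probabilities must sum to at most 1";
  return "";
}

}  // namespace validate_sketch
```

With the defaults the four probabilities sum to 0.93, so the check passes; raising p_crossover to 0.99 pushes the sum above 1 and the check reports it.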

Usage

Use these types when configuring the cuML symbolic regression (symFit), classification (symClfPredict), or transformation (symTransform) genetic programming algorithms. The param struct is the primary configuration interface for controlling evolution behavior, complexity penalties, and convergence criteria.

Code Reference

Source Location

  • Repository: Rapidsai_Cuml
  • File: cpp/include/cuml/genetic/common.h

Signature

namespace cuml {
namespace genetic {

enum class metric_t : uint32_t { mae, mse, rmse, pearson, spearman, logloss };

enum class init_method_t : uint32_t { grow, full, half_and_half };

enum class transformer_t : uint32_t { sigmoid };

enum class mutation_t : uint32_t { none, crossover, subtree, hoist, point, reproduce };

struct param {
  int population_size = 1000;
  int hall_of_fame = 100;
  int n_components = 10;
  int generations = 20;
  int tournament_size = 20;
  float stopping_criteria = 0.0f;
  float const_range[2] = {-1.0f, 1.0f};
  int init_depth[2] = {2, 6};
  init_method_t init_method = init_method_t::half_and_half;
  std::vector<node::type> function_set;
  std::map<int, std::vector<node::type>> arity_set;
  transformer_t transformer = transformer_t::sigmoid;
  metric_t metric = metric_t::mae;
  float parsimony_coefficient = 0.001f;
  float p_crossover = 0.9f;
  float p_subtree_mutation = 0.01f;
  float p_hoist_mutation = 0.01f;
  float p_point_mutation = 0.01f;
  float p_point_replace = 0.05f;
  float max_samples = 1.0f;
  float terminalRatio = 0.0f;
  std::vector<std::string> feature_names;
  int num_features;
  uint64_t random_state = 0UL;
  int num_epochs = 0;
  bool low_memory = false;

  float p_reproduce() const;
  int max_programs() const;
  int criterion() const;
};

} // namespace genetic
} // namespace cuml

Import

#include <cuml/genetic/common.h>

I/O Contract

Inputs

param struct (key fields)

Name                   Type           Required  Description
population_size        int            No        Number of programs per generation (default: 1000)
generations            int            No        Number of generations to evolve (default: 20)
tournament_size        int            No        Tournament selection size (default: 20)
metric                 metric_t       No        Fitness metric (default: mae)
init_method            init_method_t  No        Population initialization method (default: half_and_half)
init_depth             int[2]         No        Min/max initial tree depth (default: [2, 6])
p_crossover            float          No        Crossover mutation probability (default: 0.9)
parsimony_coefficient  float          No        Penalty for large programs (default: 0.001)
num_features           int            Yes       Number of features in the dataset
random_state           uint64_t       No        Random seed (default: 0)

Outputs

Name            Type   Description
p_reproduce()   float  Computed reproduction probability (1 - sum of other mutation probabilities)
max_programs()  int    Maximum possible number of programs across all generations
criterion()     int    Scoring criterion direction based on the selected metric
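The three utility methods can be pictured with the following sketch, which uses a local stand-in struct rather than the cuML header. Only the p_reproduce() formula comes from the table above. The clamping to [0, 1], the population_size + generations bound, and the 0/1 encoding of criterion() are assumptions made for illustration, and the utility_sketch namespace and param_sketch struct are hypothetical names; consult the cuML sources for the actual definitions.

```cpp
#include <algorithm>
#include <cstdint>

namespace utility_sketch {  // hypothetical stand-in, not the cuML namespace

enum class metric_t : std::uint32_t { mae, mse, rmse, pearson, spearman, logloss };

struct param_sketch {
  int population_size      = 1000;
  int generations          = 20;
  metric_t metric          = metric_t::mae;
  float p_crossover        = 0.9f;
  float p_subtree_mutation = 0.01f;
  float p_hoist_mutation   = 0.01f;
  float p_point_mutation   = 0.01f;

  // Probability left over after the other genetic operators.
  // Clamping to [0, 1] is an assumption of this sketch.
  float p_reproduce() const
  {
    const float sum = p_crossover + p_subtree_mutation + p_hoist_mutation + p_point_mutation;
    return std::max(0.0f, std::min(1.0f - sum, 1.0f));
  }

  // Assumed bound: the initial population plus at most one extra
  // program retained per generation.
  int max_programs() const { return population_size + generations; }

  // Assumed encoding: 0 when a smaller score is better (errors, losses),
  // 1 when a larger score is better (correlations).
  int criterion() const
  {
    switch (metric) {
      case metric_t::pearson:
      case metric_t::spearman: return 1;
      default: return 0;
    }
  }
};

}  // namespace utility_sketch
```

Under these assumptions the default configuration leaves a reproduction probability of about 0.07, and switching the metric to pearson flips the criterion from 0 to 1.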

Usage Examples

#include <cuml/genetic/common.h>

int main()
{
  // Configure genetic programming parameters
  cuml::genetic::param params;
  params.population_size       = 500;
  params.generations           = 50;
  params.tournament_size       = 10;
  params.metric                = cuml::genetic::metric_t::mse;
  params.init_method           = cuml::genetic::init_method_t::half_and_half;
  params.init_depth[0]         = 2;  // minimum initial tree depth
  params.init_depth[1]         = 8;  // maximum initial tree depth
  params.p_crossover           = 0.85f;
  params.p_subtree_mutation    = 0.05f;
  params.p_hoist_mutation      = 0.05f;
  params.p_point_mutation      = 0.05f;
  params.parsimony_coefficient = 0.001f;
  params.num_features          = 10;  // required: this field has no default
  params.random_state          = 42;

  // Check the reproduction probability, i.e. what remains after the other operators:
  // 1.0 - (0.85 + 0.05 + 0.05 + 0.05) = 0.0, up to float rounding
  float p_repro = params.p_reproduce();
  (void)p_repro;  // value is only inspected here, not used further
  return 0;
}
