Implementation:Rapidsai Cuml Genetic API
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Genetic_Programming |
| Last Updated | 2026-02-08 12:00 GMT |
Overview
Provides the GPU-accelerated genetic programming API in cuML for symbolic regression, classification, and transformation, with functions for fitting, predicting, and transforming data using evolved program trees.
Description
The genetic.h header declares the high-level genetic programming API in the cuml::genetic namespace. It provides functions for the complete lifecycle of symbolic machine learning:
Utility:
stringify: Converts a program (AST) to a human-readable string representation for visualization and debugging.
Training:
symFit: Fits a symbolic model (regressor, classifier, or transformer) to a given dataset. Evolves a population of programs over multiple generations using tournament selection, crossover, and mutation. Outputs the final generation of programs (sorted by fitness) and optionally the full generational history. Note: device memory allocated for program nodes must be freed by the caller after prediction.
Prediction:
symRegPredict: Makes continuous predictions using a trained symbolic regressor.
symClfPredictProbs: Computes class probabilities for a symbolic classifier, optionally applying a transformer (e.g., sigmoid).
symClfPredict: Returns binary class predictions from a symbolic classifier's decision boundary.
Transformation:
symTransform: Transforms input features using a set of evolved programs, generating new engineered features.
All functions operate on device memory and accept a RAFT handle for GPU resource management.
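The description of symFit mentions tournament selection. As a rough illustration of that step only, the following host-side sketch picks one parent from a population. The function name `toy_tournament_select` is hypothetical, the code runs on the CPU, and it assumes that a lower fitness value is better (as with an error metric); cuML's GPU implementation and its tie-breaking rules may differ.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper, not part of cuML.
// Pick one parent index by tournament selection: draw `tournament_size`
// candidates uniformly at random (with replacement) and keep the fittest.
// Assumption for this sketch: lower fitness is better (e.g. an error metric).
std::size_t toy_tournament_select(const std::vector<float>& fitness,
                                  std::size_t tournament_size,
                                  std::mt19937& rng) {
  std::uniform_int_distribution<std::size_t> pick(0, fitness.size() - 1);
  std::size_t best = pick(rng);
  for (std::size_t i = 1; i < tournament_size; ++i) {
    const std::size_t candidate = pick(rng);
    if (fitness[candidate] < fitness[best]) best = candidate;
  }
  return best;
}
```

A larger tournament size increases selection pressure, because the best individual in the population is more likely to be among the sampled candidates.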
Usage
Use the genetic programming API for automatic feature engineering (transformation), interpretable regression (symbolic regression), or interpretable classification (symbolic classification). The evolved programs are human-readable mathematical expressions, making them suitable when model interpretability is important. The GPU acceleration enables practical evolution of large populations.
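To make "human-readable mathematical expressions" concrete, here is a minimal host-side sketch of rendering an expression tree as a string. `ToyNode`, `toy_leaf`, `toy_binary`, and `toy_stringify` are hypothetical names invented for this illustration; cuML's `program` type stores its nodes in its own layout, and the exact text produced by `stringify` may look different.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical expression-tree node; NOT cuML's node type.
struct ToyNode {
  std::string label;                               // operator name, variable, or constant
  std::vector<std::unique_ptr<ToyNode>> children;  // empty for terminals
};

// Build a terminal (variable or constant) node.
std::unique_ptr<ToyNode> toy_leaf(const std::string& label) {
  auto n = std::make_unique<ToyNode>();
  n->label = label;
  return n;
}

// Build a binary operator node from two subtrees.
std::unique_ptr<ToyNode> toy_binary(const std::string& op,
                                    std::unique_ptr<ToyNode> lhs,
                                    std::unique_ptr<ToyNode> rhs) {
  auto n = std::make_unique<ToyNode>();
  n->label = op;
  n->children.push_back(std::move(lhs));
  n->children.push_back(std::move(rhs));
  return n;
}

// Render the tree in prefix (function-call) notation.
std::string toy_stringify(const ToyNode& node) {
  if (node.children.empty()) return node.label;
  std::string out = node.label + "(";
  for (std::size_t i = 0; i < node.children.size(); ++i) {
    if (i > 0) out += ", ";
    out += toy_stringify(*node.children[i]);
  }
  return out + ")";
}
```

For example, a tree built as `toy_binary("add", toy_leaf("X0"), toy_binary("mul", toy_leaf("X1"), toy_leaf("2.5")))` renders as `add(X0, mul(X1, 2.5))`.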
Code Reference
Source Location
- Repository: Rapidsai_Cuml
- File: cpp/include/cuml/genetic/genetic.h
Signature
namespace cuml {
namespace genetic {

std::string stringify(const program& prog);

void symFit(const raft::handle_t& handle,
            const float* input,
            const float* labels,
            const float* sample_weights,
            const int n_rows,
            const int n_cols,
            param& params,
            program_t& final_progs,
            std::vector<std::vector<program>>& history);

void symRegPredict(const raft::handle_t& handle,
                   const float* input,
                   const int n_rows,
                   const program_t& best_prog,
                   float* output);

void symClfPredictProbs(const raft::handle_t& handle,
                        const float* input,
                        const int n_rows,
                        const param& params,
                        const program_t& best_prog,
                        float* output);

void symClfPredict(const raft::handle_t& handle,
                   const float* input,
                   const int n_rows,
                   const param& params,
                   const program_t& best_prog,
                   float* output);

void symTransform(const raft::handle_t& handle,
                  const float* input,
                  const param& params,
                  const program_t& final_progs,
                  const int n_rows,
                  const int n_cols,
                  float* output);

}  // namespace genetic
}  // namespace cuml
Import
#include <cuml/genetic/genetic.h>
I/O Contract
Inputs
symFit
| Name | Type | Required | Description |
|---|---|---|---|
| handle | const raft::handle_t& | Yes | cuML handle for GPU resources |
| input | const float* | Yes | Device pointer to feature matrix [n_rows x n_cols] |
| labels | const float* | Yes | Device pointer to labels [n_rows] |
| sample_weights | const float* | No | Device pointer to sample weights [n_rows], or nullptr |
| n_rows | int | Yes | Number of training samples |
| n_cols | int | Yes | Number of features |
| params | param& | Yes | Hyperparameters for evolution (population size, generations, etc.) |
symRegPredict
| Name | Type | Required | Description |
|---|---|---|---|
| handle | const raft::handle_t& | Yes | cuML handle |
| input | const float* | Yes | Device pointer to feature matrix [n_rows x n_cols] |
| n_rows | int | Yes | Number of samples |
| best_prog | const program_t& | Yes | Device pointer to the best trained program |
symTransform
| Name | Type | Required | Description |
|---|---|---|---|
| handle | const raft::handle_t& | Yes | cuML handle |
| input | const float* | Yes | Device pointer to feature matrix [n_rows x n_cols] |
| params | const param& | Yes | Training hyperparameters |
| final_progs | const program_t& | Yes | Device pointer to the evolved programs |
| n_rows | int | Yes | Number of samples |
| n_cols | int | Yes | Number of input features |
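Conceptually, a transform evaluates every program on every sample and writes one output column per program. The following host-side sketch shows that shape of computation. `ToyProgram` and `toy_transform` are hypothetical stand-ins, and the column-major layout for both input and output is an assumption of this sketch; check the header documentation for the layout cuML actually expects.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for an evolved program: maps one sample's features to a value.
using ToyProgram = std::function<float(const std::vector<float>& row)>;

// Evaluate each program on each row (hypothetical helper, not part of cuML).
// Assumption: input and output are column-major, so element (row r, column c)
// lives at index c * n_rows + r.
std::vector<float> toy_transform(const std::vector<float>& input,
                                 std::size_t n_rows,
                                 std::size_t n_cols,
                                 const std::vector<ToyProgram>& programs) {
  std::vector<float> output(n_rows * programs.size());
  std::vector<float> row(n_cols);
  for (std::size_t r = 0; r < n_rows; ++r) {
    // Gather the features of sample r into a contiguous buffer.
    for (std::size_t c = 0; c < n_cols; ++c) row[c] = input[c * n_rows + r];
    // One engineered feature per program.
    for (std::size_t p = 0; p < programs.size(); ++p) {
      output[p * n_rows + r] = programs[p](row);
    }
  }
  return output;
}
```

With two programs the result has `n_rows * 2` elements: the first `n_rows` values are the first engineered feature and the next `n_rows` values are the second.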
Outputs
| Name | Type | Description |
|---|---|---|
| final_progs (fit) | program_t& | Device pointer to the final generation of programs, sorted by decreasing fitness |
| history (fit) | std::vector<std::vector<program>>& | Host vector of all programs across all generations |
| output (predict) | float* | Device array of predictions [n_rows] |
| output (predict_probs) | float* | Device array of class probabilities [n_rows], col-major |
| output (transform) | float* | Device array of transformed features |
| stringify | std::string | Human-readable string representation of a program AST |
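The relationship between the probability output and the class prediction output can be sketched on the host as follows. The names `toy_sigmoid`, `toy_predict_probs`, and `toy_predict_labels` are hypothetical, the sigmoid is used only because the description names it as an example transformer, and the 0.5 threshold is an assumption of this sketch. The output layout and the class each probability refers to are determined by cuML and are not reproduced here.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Logistic sigmoid, used here as the example "transformer".
float toy_sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// Map raw program outputs to values in (0, 1) (hypothetical helper).
std::vector<float> toy_predict_probs(const std::vector<float>& raw) {
  std::vector<float> probs(raw.size());
  for (std::size_t i = 0; i < raw.size(); ++i) probs[i] = toy_sigmoid(raw[i]);
  return probs;
}

// Turn probabilities into 0/1 labels stored as float.
// Assumption: a probability strictly above `threshold` maps to class 1.
std::vector<float> toy_predict_labels(const std::vector<float>& probs,
                                      float threshold = 0.5f) {
  std::vector<float> labels(probs.size());
  for (std::size_t i = 0; i < probs.size(); ++i) {
    labels[i] = probs[i] > threshold ? 1.0f : 0.0f;
  }
  return labels;
}
```

Keeping the two steps separate mirrors the API split between symClfPredictProbs and symClfPredict: callers who need calibrated scores stop after the first step, while callers who need hard labels apply the threshold.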
Usage Examples
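#include <cuml/genetic/common.h>
#include <cuml/genetic/genetic.h>
#include <raft/core/handle.hpp>  // raft::handle_t
#include <string>
#include <vector>

raft::handle_t handle;
int n_rows = 5000;
int n_cols = 10;

// Device buffers. Allocation and data upload are omitted here;
// both pointers must refer to valid device memory before symFit is called.
float* d_X = nullptr;  // device [n_rows x n_cols]
float* d_y = nullptr;  // device [n_rows]

// Configure parameters
cuml::genetic::param params;
params.population_size = 500;
params.generations     = 30;
params.metric          = cuml::genetic::metric_t::mse;
params.num_features    = n_cols;
params.random_state    = 42;

// Fit symbolic regressor (nullptr means no sample weights)
cuml::genetic::program_t final_progs;
std::vector<std::vector<cuml::genetic::program>> history;
cuml::genetic::symFit(handle, d_X, d_y, nullptr,
                      n_rows, n_cols, params,
                      final_progs, history);

// Predict with the best program. final_progs is sorted by decreasing fitness,
// so passing it directly uses the first program of the final generation.
float* d_output = nullptr;  // device [n_rows]; must be allocated before the call
cuml::genetic::symRegPredict(handle, d_X, n_rows, final_progs, d_output);

// Visualize the best program
// (after copying from device to host)
// std::string expr = cuml::genetic::stringify(host_prog);

// After prediction, the caller is responsible for freeing the device memory
// allocated for the program nodes.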