Implementation:Rapidsai Cuml Symreg Example
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Genetic_Programming |
| Last Updated | 2026-02-08 12:00 GMT |
Overview
A standalone C++ example demonstrating GPU-accelerated symbolic regression using the cuML genetic programming API.
Description
symreg_example.cpp provides a complete working example of the cuML symbolic regression pipeline. It reads training and test data from text files, configures genetic programming hyperparameters, trains a symbolic regression model on the GPU, and evaluates the resulting mathematical expression on a test dataset.
The program follows this pipeline:
- Argument Parsing -- Command-line arguments configure population size, random state, number of generations, stopping criteria, mutation probabilities (crossover, subtree, hoist, point), parsimony coefficient, evaluation metric (MAE, MSE, or RMSE), dataset dimensions, and file paths for training/test data, labels, and optional sample weights.
- Data Loading -- Reads column-major formatted text files into host vectors using parse_col_major. Missing weight files default to uniform weights.
- GPU Memory Allocation -- Allocates device memory using RMM (rmm::device_uvector) for features, labels, weights, predictions, and program storage. Data is copied from host to device using async memcpy.
- Training -- Calls cuml::genetic::symFit to evolve a population of programs over multiple generations. The training history is stored as a vector of program populations.
- Best Program Selection -- Finds the program with the lowest raw fitness in the final generation and converts it to a human-readable equation using cuml::genetic::stringify.
- Inference -- Calls cuml::genetic::symRegPredict to generate predictions on test data using the best program, and evaluates the fitness score using cuml::genetic::compute_metric.
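The best-program selection step in the pipeline above is a scan for the minimum raw fitness. The sketch below shows that idea on the host with the standard library only. `ProgramSummary` and `best_program_index` are hypothetical names introduced for illustration; the real example works on the program objects produced by cuml::genetic::symFit, whose layout is not reproduced here.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical stand-in for a trained program; this is NOT the cuML program type.
struct ProgramSummary {
  std::string equation;  // human-readable form of the program
  float raw_fitness;     // error-style score: lower is better
};

// Returns the index of the program with the lowest raw fitness.
// Precondition: `population` is non-empty.
std::size_t best_program_index(const std::vector<ProgramSummary>& population)
{
  auto best = std::min_element(
    population.begin(), population.end(),
    [](const ProgramSummary& a, const ProgramSummary& b) {
      return a.raw_fitness < b.raw_fitness;
    });
  return static_cast<std::size_t>(std::distance(population.begin(), best));
}
```

Calling `best_program_index` on the final generation gives the entry whose equation would then be printed. Ties resolve to the first minimum, which is how `std::min_element` behaves.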
The example times each phase (allocation, training, inference) with CUDA events and prints the elapsed times.
Usage
Use this example as a reference for integrating cuML symbolic regression into C++ applications, or to test and benchmark the genetic programming implementation with custom datasets.
Code Reference
Source Location
- Repository: Rapidsai_Cuml
- File: cpp/examples/symreg/symreg_example.cpp
Signature
int main(int argc, char* argv[]);
template <typename T>
T get_argval(char** begin, char** end, const std::string& arg, const T default_val);
template <typename math_t = float>
int parse_col_major(const std::string fname, std::vector<math_t>& vec,
const int n_rows, const int n_cols);
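The following is a minimal sketch of how helpers with these two signatures could be implemented using only the standard library. The bodies are assumptions for illustration and may differ from the actual file. In particular, the element-count check and the error messages are guesses.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Searches [begin, end) for the flag `arg` (e.g. "-n_cols") and parses the
// token that follows it as T. Returns `default_val` if the flag is absent or
// is the last token.
template <typename T>
T get_argval(char** begin, char** end, const std::string& arg, const T default_val)
{
  T argval   = default_val;
  char** itr = std::find(begin, end, arg);
  if (itr != end && ++itr != end) {
    std::istringstream inbuf(*itr);
    inbuf >> argval;
  }
  return argval;
}

// Reads whitespace-separated values from `fname` into `vec` in file order.
// The file is expected to already be column-major, so nothing is transposed.
// Returns 0 on success and 1 on failure.
template <typename math_t = float>
int parse_col_major(const std::string fname, std::vector<math_t>& vec,
                    const int n_rows, const int n_cols)
{
  std::ifstream is(fname);
  if (!is.is_open()) {
    std::cerr << "ERROR: Could not open file " << fname << std::endl;
    return 1;
  }
  std::istream_iterator<math_t> start(is), end;
  vec.assign(start, end);
  // Assumed sanity check: the file must hold exactly n_rows * n_cols values.
  const std::size_t expected =
    static_cast<std::size_t>(n_rows) * static_cast<std::size_t>(n_cols);
  if (vec.size() != expected) {
    std::cerr << "ERROR: Expected " << expected << " values in " << fname
              << ", found " << vec.size() << std::endl;
    return 1;
  }
  return 0;
}
```

With these definitions, a call such as `get_argval<int>(argv, argv + argc, "-n_cols", 0)` returns the parsed value or the default, and `parse_col_major(path, host_vec, n_rows, n_cols)` fills a host vector that can then be copied to the device.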
Import
#include <cuml/genetic/common.h>
#include <cuml/genetic/genetic.h>
#include <cuml/genetic/program.h>
#include <cuml/common/logger.hpp>
#include <raft/util/cudart_utils.hpp>
#include <rmm/device_uvector.hpp>
#include <rmm/device_scalar.hpp>
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| -n_cols | int | Yes | Number of feature columns |
| -n_train_rows | int | Yes | Number of training samples |
| -n_test_rows | int | Yes | Number of test samples |
| -train_data | string | No | Path to training feature file (default: train_data.txt) |
| -train_labels | string | No | Path to training labels file (default: train_labels.txt) |
| -test_data | string | No | Path to test feature file (default: test_data.txt) |
| -test_labels | string | No | Path to test labels file (default: test_labels.txt) |
| -population_size | int | No | Size of the genetic program population |
| -generations | int | No | Number of evolution generations |
| -metric | string | No | Evaluation metric: mae, mse, or rmse (default: mae) |
| -p_crossover | float | No | Crossover probability |
| -p_subtree | float | No | Subtree mutation probability |
| -p_hoist | float | No | Hoist mutation probability |
| -p_point | float | No | Point mutation probability |
Outputs
| Name | Type | Description |
|---|---|---|
| Best equation | string | Human-readable mathematical expression discovered by the genetic program |
| Training time | float | GPU training time in milliseconds |
| Inference score | float | Fitness score on the test dataset |
| Predicted values | float[] | Predicted target values for the test set |
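The inference score is the selected metric (mae, mse, or rmse) evaluated on the test set. The example computes it on the GPU through cuml::genetic::compute_metric. The host-side sketch below only illustrates what the three metrics mean. `Metric`, `parse_metric`, and `weighted_error` are hypothetical names, and normalizing by the sum of the sample weights is an assumption, not confirmed cuML behavior.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

enum class Metric { mae, mse, rmse };

// Maps a -metric option value to the enum; throws on an unknown name.
Metric parse_metric(const std::string& name)
{
  if (name == "mae") return Metric::mae;
  if (name == "mse") return Metric::mse;
  if (name == "rmse") return Metric::rmse;
  throw std::invalid_argument("unknown metric: " + name);
}

// Weighted error between labels `y` and predictions `y_pred`.
// Assumption: each sample's error is scaled by w[i] and the total is divided
// by sum(w). Preconditions: equal, non-zero lengths and sum(w) > 0.
float weighted_error(Metric metric,
                     const std::vector<float>& y,
                     const std::vector<float>& y_pred,
                     const std::vector<float>& w)
{
  double acc  = 0.0;
  double wsum = 0.0;
  for (std::size_t i = 0; i < y.size(); ++i) {
    const double diff = static_cast<double>(y_pred[i]) - static_cast<double>(y[i]);
    // MAE accumulates |diff|; MSE and RMSE accumulate diff^2.
    acc += w[i] * (metric == Metric::mae ? std::fabs(diff) : diff * diff);
    wsum += w[i];
  }
  const double mean = acc / wsum;
  // RMSE is the square root of the (weighted) mean squared error.
  return static_cast<float>(metric == Metric::rmse ? std::sqrt(mean) : mean);
}
```

With uniform weights this reduces to the usual unweighted definitions, which matches the default of uniform weights when no weight file is supplied.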
Usage Examples
# Run the symbolic regression example
./symreg_example \
-n_cols 5 \
-n_train_rows 1000 \
-n_test_rows 200 \
-train_data my_train.txt \
-train_labels my_labels.txt \
-test_data my_test.txt \
-test_labels my_test_labels.txt \
-population_size 5000 \
-generations 20 \
-metric mse
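The feature files passed to the example are read as column-major text, so data held in row-major order needs to be reordered before it is written out. The sketch below shows one way to prepare such a file. `to_col_major` and `write_values` are hypothetical helpers, and the one-value-per-line layout is an assumption; any whitespace-separated layout in column-major order should serve the same purpose.

```cpp
#include <cstddef>
#include <fstream>
#include <limits>
#include <string>
#include <vector>

// Reorders a row-major n_rows x n_cols matrix into column-major order,
// so that element (r, c) ends up at index c * n_rows + r.
std::vector<float> to_col_major(const std::vector<float>& row_major,
                                std::size_t n_rows,
                                std::size_t n_cols)
{
  std::vector<float> col_major(n_rows * n_cols);
  for (std::size_t r = 0; r < n_rows; ++r) {
    for (std::size_t c = 0; c < n_cols; ++c) {
      col_major[c * n_rows + r] = row_major[r * n_cols + c];
    }
  }
  return col_major;
}

// Writes the values one per line with enough digits to round-trip a float.
// Returns false if the file cannot be opened or a write fails.
bool write_values(const std::string& path, const std::vector<float>& values)
{
  std::ofstream out(path);
  if (!out) return false;
  out.precision(std::numeric_limits<float>::max_digits10);
  for (float v : values) {
    out << v << '\n';
  }
  return static_cast<bool>(out);
}
```

For a 2 x 3 matrix stored row by row as 1 2 3 4 5 6, `to_col_major` yields 1 4 2 5 3 6, which is the order in which the values would be written to the training or test file.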