Implementation:Rapidsai Cuml Symreg Example

From Leeroopedia


Knowledge Sources
Domains: Machine_Learning, Genetic_Programming
Last Updated: 2026-02-08 12:00 GMT

Overview

A standalone C++ example demonstrating GPU-accelerated symbolic regression using the cuML genetic programming API.

Description

symreg_example.cpp provides a complete working example of the cuML symbolic regression pipeline. It reads training and test data from text files, configures genetic programming hyperparameters, trains a symbolic regression model on the GPU, and evaluates the resulting mathematical expression on a test dataset.

The program follows this pipeline:

  1. Argument Parsing -- Command-line arguments configure population size, random state, number of generations, stopping criteria, genetic operation probabilities (crossover, subtree mutation, hoist mutation, point mutation), parsimony coefficient, evaluation metric (MAE, MSE, or RMSE), dataset dimensions, and file paths for training/test data, labels, and optional sample weights.
  2. Data Loading -- Reads text files holding column-major data into host vectors using parse_col_major. If a weight file is missing, uniform weights are used.
  3. GPU Memory Allocation -- Allocates device memory using RMM (rmm::device_uvector) for features, labels, weights, predictions, and program storage. Data is copied from host to device with asynchronous memory copies.
  4. Training -- Calls cuml::genetic::symFit to evolve a population of programs over multiple generations. The training history is stored as a vector of program populations.
  5. Best Program Selection -- Finds the program with the lowest raw fitness in the final generation and converts it to a human-readable equation using cuml::genetic::stringify.
  6. Inference -- Calls cuml::genetic::symRegPredict to generate predictions on test data using the best program, and evaluates the fitness score using cuml::genetic::compute_metric.
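
Step 5 above can be sketched on the host as follows. This is a minimal illustration, not the code from symreg_example.cpp: ProgramStub is a hypothetical stand-in for the library's program type, and the field name raw_fitness_ is assumed from the description of "raw fitness". The only idea carried over is that, for MAE, MSE, and RMSE, the best program in the final generation is the one with the smallest fitness value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical stand-in for the library's program type; only the field
// needed for selection is modeled here (the field name is an assumption).
struct ProgramStub {
  float raw_fitness_;
};

// Returns the index of the lowest-fitness program in the final generation.
// history[g] holds the population of generation g; history.back() is the last.
std::size_t best_program_index(const std::vector<std::vector<ProgramStub>>& history)
{
  assert(!history.empty() && !history.back().empty());
  const std::vector<ProgramStub>& last = history.back();
  auto best = std::min_element(last.begin(), last.end(),
                               [](const ProgramStub& a, const ProgramStub& b) {
                                 return a.raw_fitness_ < b.raw_fitness_;
                               });
  return static_cast<std::size_t>(std::distance(last.begin(), best));
}
```

In the real pipeline, the selected program would then be passed to the stringify and prediction calls named in steps 5 and 6.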

The example times each phase (allocation, training, inference) with CUDA events and prints the resulting performance measurements.

Usage

Use this example as a reference for integrating cuML symbolic regression into C++ applications, or to test and benchmark the genetic programming implementation with custom datasets.

Code Reference

Source Location

  • Repository: Rapidsai_Cuml
  • File: cpp/examples/symreg/symreg_example.cpp

Signature

int main(int argc, char* argv[]);

template <typename T>
T get_argval(char** begin, char** end, const std::string& arg, const T default_val);

template <typename math_t = float>
int parse_col_major(const std::string fname, std::vector<math_t>& vec,
                    const int n_rows, const int n_cols);
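
The two helper templates above could be implemented along the following lines. This is a minimal sketch under stated assumptions, not the code from symreg_example.cpp: the input file is assumed to hold whitespace-separated numbers, parse_col_major_stream is a hypothetical helper introduced here so the parsing logic can be exercised without a file, and the element-count check is an addition of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Looks for `arg` in [begin, end) and parses the token that follows it.
// Falls back to default_val when the flag is absent or has no value after it.
template <typename T>
T get_argval(char** begin, char** end, const std::string& arg, const T default_val)
{
  T argval   = default_val;
  char** itr = std::find(begin, end, arg);
  if (itr != end && ++itr != end) {
    std::istringstream inbuf(*itr);
    inbuf >> argval;
  }
  return argval;
}

// Hypothetical helper (not among the documented signatures): reads
// whitespace-separated values from any stream, in the order they appear.
// Returns 0 when exactly n_rows * n_cols values were read, 1 otherwise.
template <typename math_t = float>
int parse_col_major_stream(std::istream& is, std::vector<math_t>& vec,
                           const int n_rows, const int n_cols)
{
  assert(n_rows > 0 && n_cols > 0);
  const std::size_t expected = static_cast<std::size_t>(n_rows) * n_cols;
  vec.clear();
  vec.reserve(expected);
  std::copy(std::istream_iterator<math_t>(is), std::istream_iterator<math_t>(),
            std::back_inserter(vec));
  return vec.size() == expected ? 0 : 1;
}

// File-based wrapper matching the documented signature.
template <typename math_t = float>
int parse_col_major(const std::string fname, std::vector<math_t>& vec,
                    const int n_rows, const int n_cols)
{
  std::ifstream is(fname);
  if (!is.is_open()) {
    std::cerr << "ERROR: Could not open file " << fname << std::endl;
    return 1;
  }
  return parse_col_major_stream(is, vec, n_rows, n_cols);
}
```

A call such as get_argval<int>(argv, argv + argc, "-n_cols", 0) would then return the value following -n_cols, or 0 when the flag is not given.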

Import

#include <cuml/genetic/common.h>
#include <cuml/genetic/genetic.h>
#include <cuml/genetic/program.h>
#include <cuml/common/logger.hpp>
#include <raft/util/cudart_utils.hpp>
#include <rmm/device_uvector.hpp>
#include <rmm/device_scalar.hpp>

I/O Contract

Inputs

Name              Type    Required  Description
-n_cols           int     Yes       Number of feature columns
-n_train_rows     int     Yes       Number of training samples
-n_test_rows      int     Yes       Number of test samples
-train_data       string  No        Path to training feature file (default: train_data.txt)
-train_labels     string  No        Path to training labels file (default: train_labels.txt)
-test_data        string  No        Path to test feature file (default: test_data.txt)
-test_labels      string  No        Path to test labels file (default: test_labels.txt)
-population_size  int     No        Size of the genetic program population
-generations      int     No        Number of evolution generations
-metric           string  No        Evaluation metric: mae, mse, or rmse (default: mae)
-p_crossover      float   No        Crossover probability
-p_subtree        float   No        Subtree mutation probability
-p_hoist          float   No        Hoist mutation probability
-p_point          float   No        Point mutation probability

Outputs

Name              Type     Description
Best equation     string   Human-readable mathematical expression discovered by the genetic program
Training time     float    GPU training time in milliseconds
Inference score   float    Fitness score on the test dataset
Predicted values  float[]  Predicted target values for the test set
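
To sanity-check the reported inference score, the three metrics can be recomputed on the host from the predicted values. The sketch below uses the standard weighted definitions of MAE, MSE, and RMSE (weighted error sum divided by the sum of weights). It is an assumption that this normalization matches what the library computes, and weighted_error is a hypothetical name introduced for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Weighted regression error computed on the host.
// `metric` must be "mae", "mse", or "rmse"; anything else throws.
float weighted_error(const std::vector<float>& y, const std::vector<float>& y_pred,
                     const std::vector<float>& w, const std::string& metric)
{
  if (metric != "mae" && metric != "mse" && metric != "rmse") {
    throw std::invalid_argument("unknown metric: " + metric);
  }
  assert(y.size() == y_pred.size() && y.size() == w.size() && !y.empty());

  double num = 0.0;  // weighted sum of absolute or squared errors
  double den = 0.0;  // sum of weights
  for (std::size_t i = 0; i < y.size(); ++i) {
    const double err = static_cast<double>(y[i]) - static_cast<double>(y_pred[i]);
    num += w[i] * (metric == "mae" ? std::fabs(err) : err * err);
    den += w[i];
  }
  const double mean = num / den;
  return static_cast<float>(metric == "rmse" ? std::sqrt(mean) : mean);
}
```

With uniform weights this reduces to the ordinary unweighted metrics, which is the case when no weight file is supplied.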

Usage Examples

# Run the symbolic regression example
./symreg_example \
  -n_cols 5 \
  -n_train_rows 1000 \
  -n_test_rows 200 \
  -train_data my_train.txt \
  -train_labels my_labels.txt \
  -test_data my_test.txt \
  -test_labels my_test_labels.txt \
  -population_size 5000 \
  -generations 20 \
  -metric mse
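
Since the example expects column-major input, data held in row-major order has to be reordered before it is written to files such as my_train.txt. The following sketch shows one way to do that. The function names to_col_major and write_values are hypothetical, and the one-value-per-line output is an assumption about the file format (any whitespace-separated layout is assumed to be acceptable).

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <limits>
#include <ostream>
#include <vector>

// Reorders a row-major matrix so that each column is contiguous.
// Element (r, c) moves from index r * n_cols + c to index c * n_rows + r.
std::vector<float> to_col_major(const std::vector<float>& row_major, int n_rows, int n_cols)
{
  assert(n_rows > 0 && n_cols > 0);
  assert(row_major.size() == static_cast<std::size_t>(n_rows) * n_cols);
  std::vector<float> col_major(row_major.size());
  for (int r = 0; r < n_rows; ++r) {
    for (int c = 0; c < n_cols; ++c) {
      const std::size_t src = static_cast<std::size_t>(r) * n_cols + c;
      const std::size_t dst = static_cast<std::size_t>(c) * n_rows + r;
      col_major[dst] = row_major[src];
    }
  }
  return col_major;
}

// Writes one value per line with enough digits to round-trip a float.
void write_values(std::ostream& os, const std::vector<float>& values)
{
  os << std::setprecision(std::numeric_limits<float>::max_digits10);
  for (float v : values) { os << v << '\n'; }
}
```

For a 2 x 3 matrix stored as {1, 2, 3, 4, 5, 6}, the reordered buffer holds the first column (1, 4), then the second (2, 5), then the third (3, 6).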
