Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Principle:Rapidsai Cuml Genetic Programming

From Leeroopedia
Revision as of 17:58, 16 February 2026 by Admin (talk | contribs) (Auto-imported from principles/Rapidsai_Cuml_Genetic_Programming.md)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)


Knowledge Sources
Domains Machine_Learning, Evolutionary_Computation, Symbolic_Regression
Last Updated 2026-02-08 12:00 GMT

Overview

Genetic programming evolves populations of mathematical expressions represented as abstract syntax trees, using selection, crossover, and mutation operators inspired by biological evolution to discover symbolic models that fit observed data.

Description

Genetic programming (GP) is an evolutionary computation technique that searches the space of computer programs (or mathematical expressions) to find one that optimally solves a given task. In the context of machine learning, GP is most commonly applied to symbolic regression: discovering a closed-form mathematical formula that describes the relationship between input features and a target variable.

Representation: Each candidate solution (called a program) is represented as a tree of nodes. Terminal nodes are either input variables (feature references) or ephemeral constants. Non-terminal (function) nodes represent mathematical operations, which can be unary (abs, sin, cos, exp, log, sqrt, square, cube, etc.) or binary (add, subtract, multiply, divide, power, max, min, etc.). The tree is stored as a flat array and evaluated using a stack-based interpreter on GPU, where each thread processes one data row.

Evolution: The algorithm maintains a population of programs that evolves over generations through the following operators:

  • Tournament Selection: A subset of programs is randomly sampled, and the one with the best fitness (adjusted by a parsimony coefficient that penalizes program length) is selected as a parent. The parsimony coefficient controls the trade-off between accuracy and model complexity (bloat control).
  • Crossover: Two parent programs exchange random subtrees to produce offspring that combine structural elements from both parents.
  • Subtree Mutation: A random subtree is replaced with a newly generated random subtree.
  • Hoist Mutation: A random subtree is replaced by one of its own subtrees, reducing program depth.
  • Point Mutation: Individual nodes are randomly replaced with compatible alternatives (same arity).
  • Reproduction: A program is copied unchanged to the next generation.

Initialization: The initial population is created using one of three methods: grow (random depth trees), full (maximum depth trees), or half-and-half (equal mix of both).

Fitness Evaluation: Programs are evaluated against training data using metrics such as mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), Pearson correlation, Spearman rank correlation, or log-loss for classification tasks.

Usage

Genetic programming is the right choice when:

  • An interpretable, closed-form mathematical expression is desired rather than a black-box model.
  • The functional form of the relationship is unknown and needs to be discovered from data (symbolic regression).
  • Feature engineering is needed: GP can discover non-linear transformations of input features.
  • The search space of possible mathematical relationships is large and non-convex, making gradient-based methods impractical.
  • Classification tasks require simple decision boundaries expressible as mathematical formulas (using a sigmoid transformer).

Theoretical Basis

Program Evaluation (Stack Machine):

For each data row i in parallel:
    Initialize empty evaluation stack
    For each node n in program (reverse order):
        If n is terminal:
            push value(n, row_i) onto stack
        Else:
            pop arity(n) values from stack
            push function(n)(values) onto stack
    prediction[i] = stack.pop()

Tournament Selection with Parsimony:

score(p)=f(p)clen(p)(2k1)

where f(p) is the raw fitness of program p, c is the parsimony coefficient, len(p) is the program length, and k{0,1} determines whether lower or higher fitness is better.

Crossover Operation:

Given parent programs P1, P2:
    1. Select random subtree S1 from P1
    2. Select random subtree S2 from P2
    3. Replace S1 in P1 with S2
    4. Return modified P1 as offspring

Reproduction Probability:

preproduce=1.0pcrossoverpsubtreephoistppoint

Related Pages

Implemented By

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment