Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Principle:Duckdb Duckdb Random Number Generation

From Leeroopedia


Knowledge Sources
Domains Random_Number_Generation, Algorithm
Last Updated 2026-02-07 12:00 GMT

Overview

A family of pseudorandom number generators that combine a linear congruential generator state transition with a permutation-based output function to produce statistically excellent random numbers with minimal state.

Description

PCG (Permuted Congruential Generator) is a family of pseudorandom number generators designed by Melissa O'Neill in 2014. PCG generators address the shortcomings of traditional linear congruential generators (LCGs) by applying a permutation function to the output of an LCG, breaking the statistical regularities that make raw LCG output predictable.

The core idea is to separate the state transition function (which determines the sequence of internal states) from the output function (which converts internal state to random output). The state transition uses a standard LCG with a power-of-2 modulus, which provides a maximal-period sequence and simple implementation. The output function applies bit rotations, shifts, and XOR operations that depend on the high bits of the state, creating a complex, statistically robust relationship between internal state and output.

PCG generators offer several advantages over alternatives: they pass stringent statistical test suites (TestU01, PractRand), use minimal state (64 or 128 bits), are extremely fast (often a single multiply and a few bit operations), support multiple independent streams via different increment values, and have provably long periods (2^64 or 2^128). The design also supports efficient jumping (advancing the state by an arbitrary number of steps in O(log n) time).

Usage

PCG is used in DuckDB as the pseudorandom number generator for SQL functions like `random()`, `setseed()`, and for internal operations that require randomization such as reservoir sampling, hash-based operations, and probabilistic algorithms. The small state and fast generation make it suitable for per-thread random number generation in DuckDB's parallel execution engine.

Theoretical Basis

LCG State Transition: The internal state advances with a standard LCG:

// 64-bit state LCG
state = state * 6364136223846793005 + increment
// increment is odd, determining the stream
// Period = 2^64 for any odd increment

PCG-XSH-RR Output Function: Permutation applied to state:

// PCG-XSH-RR (32-bit output from 64-bit state)
function pcg32(state):
    // XSH: XOR high bits into mid bits
    xorshifted = ((state >> 18) XOR state) >> 27  // 32 bits
    // RR: Random rotation by top 5 bits of state
    rot = state >> 59                              // 5 bits
    return (xorshifted >> rot) | (xorshifted << (32 - rot))

Stream Selection: Independent streams via different increments:

// Each odd increment produces a different permutation of 2^64 values
// Two generators with same multiplier but different increments
// produce statistically independent sequences
stream_1: increment = 1442695040888963407  // odd
stream_2: increment = 1442695040888963409  // different odd value
// Both have period 2^64, but produce different sequences

Advance (Jump) Operation: Skip ahead in O(log n):

// Advance state by delta steps without iterating
// Uses the property: state_n = M^n * state_0 + (M^n - 1)/(M - 1) * inc
function advance(state, delta, multiplier, increment):
    acc_mult = 1
    acc_plus = 0
    cur_mult = multiplier
    cur_plus = increment
    while delta > 0:
        if delta & 1:
            acc_mult *= cur_mult
            acc_plus = acc_plus * cur_mult + cur_plus
        cur_plus = (cur_mult + 1) * cur_plus
        cur_mult *= cur_mult
        delta >>= 1
    return acc_mult * state + acc_plus

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment