Principle:Duckdb Duckdb Random Number Generation
| Knowledge Sources | |
|---|---|
| Domains | Random_Number_Generation, Algorithm |
| Last Updated | 2026-02-07 12:00 GMT |
Overview
A family of pseudorandom number generators that combine a linear congruential generator state transition with a permutation-based output function to produce statistically excellent random numbers with minimal state.
Description
PCG (Permuted Congruential Generator) is a family of pseudorandom number generators designed by Melissa O'Neill in 2014. PCG generators address the shortcomings of traditional linear congruential generators (LCGs) by applying a permutation function to the output of an LCG, breaking the statistical regularities that make raw LCG output predictable.
The core idea is to separate the state transition function (which determines the sequence of internal states) from the output function (which converts internal state to random output). The state transition uses a standard LCG with a power-of-2 modulus, which provides a maximal-period sequence and simple implementation. The output function applies bit rotations, shifts, and XOR operations that depend on the high bits of the state, creating a complex, statistically robust relationship between internal state and output.
PCG generators offer several advantages over alternatives: they pass stringent statistical test suites (TestU01, PractRand), use minimal state (64 or 128 bits), are extremely fast (often a single multiply and a few bit operations), support multiple independent streams via different increment values, and have provably long periods (2^64 or 2^128). The design also supports efficient jumping (advancing the state by an arbitrary number of steps in O(log n) time).
Usage
PCG is used in DuckDB as the pseudorandom number generator for SQL functions like `random()`, `setseed()`, and for internal operations that require randomization such as reservoir sampling, hash-based operations, and probabilistic algorithms. The small state and fast generation make it suitable for per-thread random number generation in DuckDB's parallel execution engine.
Theoretical Basis
LCG State Transition: The internal state advances with a standard LCG:
// 64-bit state LCG
state = state * 6364136223846793005 + increment
// increment is odd, determining the stream
// Period = 2^64 for any odd increment
PCG-XSH-RR Output Function: Permutation applied to state:
// PCG-XSH-RR (32-bit output from 64-bit state)
function pcg32(state):
// XSH: XOR high bits into mid bits
xorshifted = ((state >> 18) XOR state) >> 27 // 32 bits
// RR: Random rotation by top 5 bits of state
rot = state >> 59 // 5 bits
return (xorshifted >> rot) | (xorshifted << (32 - rot))
Stream Selection: Independent streams via different increments:
// Each odd increment produces a different permutation of 2^64 values
// Two generators with same multiplier but different increments
// produce statistically independent sequences
stream_1: increment = 1442695040888963407 // odd
stream_2: increment = 1442695040888963409 // different odd value
// Both have period 2^64, but produce different sequences
Advance (Jump) Operation: Skip ahead in O(log n):
// Advance state by delta steps without iterating
// Uses the property: state_n = M^n * state_0 + (M^n - 1)/(M - 1) * inc
function advance(state, delta, multiplier, increment):
acc_mult = 1
acc_plus = 0
cur_mult = multiplier
cur_plus = increment
while delta > 0:
if delta & 1:
acc_mult *= cur_mult
acc_plus = acc_plus * cur_mult + cur_plus
cur_plus = (cur_mult + 1) * cur_plus
cur_mult *= cur_mult
delta >>= 1
return acc_mult * state + acc_plus