Principle:ClickHouse ClickHouse PCG Random Number Generation
| Knowledge Sources | |
|---|---|
| Domains | Random_Number_Generation |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Generate high-quality random numbers using permuted congruential generators with excellent statistical properties, speed, and flexibility.
Description
This principle applies permutation functions to linear congruential generator (LCG) output to create random number generators that are fast, statistically strong, and highly configurable. The PCG family combines the simplicity and speed of LCGs with permutation-based output functions that scramble the internal state to produce excellent randomness. The approach supports multiple stream types (single stream, MCG, settable streams, unique per-instance streams), various output functions optimized for different bit sizes, and extended generators with astronomically long periods. The generators maintain full state and support efficient jumping forward or backward in the sequence.
Usage
Use PCG generators when you need reproducible random sequences, multiple independent random streams (for parallel simulations), very long periods (for Monte Carlo methods), or fast generation with good statistical properties. PCG is particularly valuable when the application needs to save/restore generator state or advance to arbitrary positions in the sequence.
Theoretical Basis
Linear congruential generators use the recurrence: `state = (a * state + c) mod m` where a is the multiplier, c is the increment, and m is the modulus. LCGs have several theoretical properties:
- Full period: With proper choice of a and c, period equals m
- Predictability: Linear structure makes raw LCG output unsuitable for cryptography
- Statistical flaws: Lower bits have shorter periods, sequential values show lattice structure in high dimensions
PCG addresses LCG weaknesses through permutation functions applied to the output:
- XSH RR (xorshift high, random rotate): XOR high bits down, then rotate by bits from the state
- XSL RR (xorshift low, random rotate): XOR high bits into low bits, then rotate
- RXS M XS (random xorshift, multiply, xorshift): Multiple mixing steps for maximum strength
The permutation functions are chosen to: 1. Be efficiently computable (single-cycle on modern CPUs) 2. Maximize statistical quality (pass rigorous test suites like TestU01) 3. Be invertible (allowing "inside-out" generation for extended generators)
Stream control provides independent sequences:
- MCG (multiplicative congruential): Set c=0, reducing state by 2 bits but simplifying the recurrence
- Single stream: Fixed c value, all instances share same sequence
- Settable stream: User specifies c (encoded as `(stream << 1) | 1` to ensure odd)
- Unique stream: c derived from object address, each instance has unique sequence
Extended generators maintain a table of output values and advance the entire table periodically, dramatically increasing the period. For example, `pcg32_k1024` has 1024 32-bit values plus base state, giving period `2^32800` (larger than Mersenne Twister's `2^19937`).
Efficient sequence advancement uses the identity: after n steps, `state = (a^n * state + ((a^n - 1)/(a - 1)) * c) mod m`. Computing `a^n mod m` uses binary exponentiation in O(log n) time, allowing jumps of trillions of steps nearly instantaneously.
The generators are parameterized by:
- State type (8, 16, 32, 64, 128 bits)
- Output type (same or smaller than state)
- Output function (XSH RR, XSL RR, RXS M XS, etc.)
- Stream type (oneseq, setseq, unique, mcg)
This creates a family of generators optimized for different use cases while sharing the same mathematical foundation.