Principle:ClickHouse ClickHouse Fast Integer Formatting
| Knowledge Sources | |
|---|---|
| Domains | Formatting, Performance |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Replacing expensive division operations with fixed-point arithmetic and lookup tables to achieve multi-fold speedups in integer-to-string conversion.
Description
Fast integer formatting exploits the mathematical insight that dividing by powers of 10 can be approximated using fixed-point multiplication by pre-computed constants, followed by bit shifts. The jeaiii algorithm (by James Edward Anhalt III) implements this by processing two digits at a time, using multiplication by constants like `(10 × 2^24 / 1000 + 1)` to extract digit pairs without division. The algorithm also uses a pre-computed table that maps 0-99 to their two-character ASCII representations ("00", "01", ... "99"), enabling direct memory writes instead of character-by-character construction.
The key insight is that division is one of the slowest CPU operations (20-40 cycles on modern CPUs), while multiplication and shifting are fast (1-3 cycles). By carefully choosing fixed-point multipliers, we can replace division with these faster operations while maintaining exact results for decimal conversion.
Usage
Use fast integer formatting when converting many integers to strings, especially in data export, logging, JSON serialization, or display formatting. Choose this over `std::to_string` or `sprintf` for performance-critical paths. Use standard library functions when code simplicity matters more than performance or when formatting is not a bottleneck.
Theoretical Basis
Performance Comparison:
Standard library (`std::to_string`):
- Uses division and modulo per digit: ~20-40 cycles each
- For 5-digit number: ~100-200 cycles
- Includes memory allocation overhead
Jeaiii algorithm:
- Uses multiplication and shift: ~2-3 cycles per operation
- Processes two digits at once
- For 5-digit number: ~20-30 cycles
- No allocation (writes to provided buffer)
Speedup: 3-10x depending on integer size and CPU architecture
Mathematical Foundation:
To extract digits without division, we use:
`digits = (n × multiplier) >> shift`
Example for dividing by 1000:
- Exact division: `n / 1000`
- Fixed-point: `(n × 10 × 2^24 / 1000) >> 24`
- Since `2^24 / 100 ≈ 167772.16`, we use `1677722` as multiplier
- Result is within rounding error for all valid inputs
The algorithm cascades these operations to extract digit pairs at different magnitudes (units, hundreds, ten-thousands, etc.).
Lookup Table Strategy:
Pre-computed "digits" table: ``` 00, 01, 02, ..., 09, 10, 11, 12, ..., 19, ... 90, 91, 92, ..., 99 ```
Cost: 200 bytes (100 pairs × 2 chars) Benefit: Direct two-byte memory write vs two branches + two additions
Algorithm Complexity:
- Traditional: O(log₁₀(n)) divisions
- Jeaiii: O(log₁₀₀(n)) multiply-shift operations = O(log₁₀(n) / 2)
- Plus O(1) lookups for each digit pair
Optimization for Small Numbers:
The implementation adds a special case for 1-2 digit numbers:
- Branch prediction is reliable (many database values are small)
- Direct handling avoids table lookup overhead
- Critical for performance on typical database workloads
Wide Integer Handling:
For 128-bit and 256-bit integers:
- Break into 64-bit chunks
- Process high-order chunks first
- Use chunk processing by powers of 10^8 (largest power that fits in 64 bits)
- Accumulate digit pairs into result buffer
Cache Efficiency:
The digit lookup table (200 bytes) fits in L1 cache. The sequential memory writes for output have good cache behavior. The algorithm has excellent instruction-level parallelism as operations are independent.