Principle:ClickHouse ClickHouse Fast Integer Formatting

Knowledge Sources	ClickHouse
Domains	Formatting, Performance
Last Updated	2026-02-08 00:00 GMT

Overview

Replacing expensive division operations with fixed-point arithmetic and lookup tables to achieve multi-fold speedups in integer-to-string conversion.

Description

Fast integer formatting exploits the mathematical insight that dividing by powers of 10 can be approximated using fixed-point multiplication by pre-computed constants, followed by bit shifts. The jeaiii algorithm (by James Edward Anhalt III) implements this by processing two digits at a time, using multiplication by constants like `(10 × 2^24 / 1000 + 1)` to extract digit pairs without division. The algorithm also uses a pre-computed table that maps 0-99 to their two-character ASCII representations ("00", "01", ... "99"), enabling direct memory writes instead of character-by-character construction.

The key insight is that division is one of the slowest CPU operations (20-40 cycles on modern CPUs), while multiplication and shifting are fast (1-3 cycles). By carefully choosing fixed-point multipliers, we can replace division with these faster operations while maintaining exact results for decimal conversion.

Usage

Use fast integer formatting when converting many integers to strings, especially in data export, logging, JSON serialization, or display formatting. Choose this over `std::to_string` or `sprintf` for performance-critical paths. Use standard library functions when code simplicity matters more than performance or when formatting is not a bottleneck.

Theoretical Basis

Performance Comparison:

Standard library (`std::to_string`):

Uses division and modulo per digit: ~20-40 cycles each
For 5-digit number: ~100-200 cycles
Includes memory allocation overhead

Jeaiii algorithm:

Uses multiplication and shift: ~2-3 cycles per operation
Processes two digits at once
For 5-digit number: ~20-30 cycles
No allocation (writes to provided buffer)

Speedup: 3-10x depending on integer size and CPU architecture

Mathematical Foundation:

To extract digits without division, we use:

`digits = (n × multiplier) >> shift`

Example for dividing by 1000:

Exact division: `n / 1000`
Fixed-point: `(n × 10 × 2^24 / 1000) >> 24`
Since `2^24 / 100 ≈ 167772.16`, we use `1677722` as multiplier
Result is within rounding error for all valid inputs

The algorithm cascades these operations to extract digit pairs at different magnitudes (units, hundreds, ten-thousands, etc.).

Lookup Table Strategy:

Pre-computed "digits" table: ``` 00, 01, 02, ..., 09, 10, 11, 12, ..., 19, ... 90, 91, 92, ..., 99 ```

Cost: 200 bytes (100 pairs × 2 chars) Benefit: Direct two-byte memory write vs two branches + two additions

Algorithm Complexity:

Traditional: O(log₁₀(n)) divisions
Jeaiii: O(log₁₀₀(n)) multiply-shift operations = O(log₁₀(n) / 2)
Plus O(1) lookups for each digit pair

Optimization for Small Numbers:

The implementation adds a special case for 1-2 digit numbers:

Branch prediction is reliable (many database values are small)
Direct handling avoids table lookup overhead
Critical for performance on typical database workloads

Wide Integer Handling:

For 128-bit and 256-bit integers:

Break into 64-bit chunks
Process high-order chunks first
Use chunk processing by powers of 10^8 (largest power that fits in 64 bits)
Accumulate digit pairs into result buffer

Cache Efficiency:

The digit lookup table (200 bytes) fits in L1 cache. The sequential memory writes for output have good cache behavior. The algorithm has excellent instruction-level parallelism as operations are independent.

Related Pages

Implementation:ClickHouse_ClickHouse_Itoa

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment