Principle:ClickHouse ClickHouse Hex Binary Encoding
| Knowledge Sources | |
|---|---|
| Domains | Encoding, String_Processing |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Using pre-computed lookup tables to convert between binary and hexadecimal representations, trading memory for speed.
Description
Hex binary encoding via lookup tables is based on the principle of pre-computation. Instead of calculating hex digits through arithmetic operations (division by 16, modulo 16), we store all possible conversions in tables and perform simple memory lookups. For encoding, we maintain a table that maps each byte value (0-255) to its two-character hex representation. For decoding, we maintain a reverse table that maps hex characters to their numeric values (0-15). This approach eliminates branches and arithmetic operations from the hot path, relying instead on fast L1 cache access.
The key insight is that the domain is small and bounded: there are only 256 possible byte values and only 16 hex digits. Pre-computing all combinations fits easily in cache (512 bytes for encoding, 256 bytes for decoding) and provides deterministic, branch-free execution.
Usage
Use table-based hex encoding when performance matters and you're converting many values. This is the right choice for logging systems, checksum display, binary dumps, hex string parsing, and any scenario where hex conversion is in the hot path. Choose arithmetic-based encoding only when code size is critical or when the conversion happens infrequently.
Theoretical Basis
Performance Analysis:
Arithmetic encoding (per byte):
- Division by 16: ~10-15 cycles
- Modulo 16: ~10-15 cycles
- Convert to ASCII: 2-3 cycles
- Total: ~25-35 cycles per byte
Table lookup encoding (per byte):
- Multiply by 2: 1 cycle
- Memory load: 1-3 cycles (L1 cache)
- Memory copy: 1-2 cycles
- Total: ~3-6 cycles per byte
Speedup: 5-10x faster
Cache Considerations:
The lookup tables are small and accessed frequently:
- Encoding table: 512 bytes (256 bytes × 2 chars)
- Decoding table: 256 bytes
- Total: 768 bytes (fits entirely in L1 cache)
Modern CPUs have 32-64 KB L1 data cache, so these tables are always cache-resident for hot code paths.
Branch Elimination:
Arithmetic approach requires branches: ``` if (digit < 10)
result = '0' + digit;
else
result = 'A' + (digit - 10);
```
Table lookup is branch-free, improving pipeline efficiency and reducing branch misprediction penalties.
Memory vs Speed Tradeoff:
- Cost: 768 bytes of constant data
- Benefit: 5-10x speedup in critical path
- Conclusion: Excellent tradeoff for high-throughput applications
Constexpr Benefits:
Modern C++ allows table generation at compile time, so the tables have zero runtime initialization cost. The compiler can even optimize table lookups into direct constant loads in some cases.
Practical Considerations:
This approach excels when:
- Converting many values (amortizes table memory cost)
- Tables fit in cache (always true for hex encoding)
- Code is CPU-bound (not I/O-bound)
It's less beneficial when:
- Converting single values infrequently
- Memory is extremely constrained (embedded systems)
- The conversion is not in a hot path