Principle:ClickHouse ClickHouse Binary Linking
| Knowledge Sources | |
|---|---|
| Domains | Build_System, C++, Systems_Programming |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Monolithic binary linking is the final build stage that combines all component libraries, interposes a custom memory allocator, and produces a single executable with a busybox-style dispatch table for selecting among multiple application modes.
Description
The final linking stage of ClickHouse produces a single monolithic binary by combining:
- Program component libraries: `clickhouse-server-lib`, `clickhouse-client-lib`, `clickhouse-local-lib`, and approximately 20 other per-component static libraries.
- The database engine: the `dbms` static library, containing query processing, storage engines, and all core database functionality.
- Common libraries: `clickhouse_common_io`, `common`, and other base-layer libraries.
- Third-party libraries: approximately 90 vendored static libraries from `contrib/`.
A critical aspect of the linking process is memory allocator interposition. ClickHouse uses jemalloc as its memory allocator, but rather than simply linking against it, the build system uses the linker's --wrap mechanism to intercept all standard C allocation functions. This enables ClickHouse's memory tracking system to account for every allocation and deallocation.
The `clickhouse_add_executable` macro (defined in the root `CMakeLists.txt`) wraps the standard CMake `add_executable` to inject:
- `clickhouse_malloc` object files: These provide `__wrap_malloc`, `__wrap_free`, and the related functions that intercept allocations.
- Linker `--wrap` options: On Linux (non-sanitizer builds), the linker wraps `malloc`, `free`, `calloc`, `realloc`, `aligned_alloc`, `posix_memalign`, `valloc`, `memalign`, `reallocarray`, and `pvalloc`.
- `memcpy` object files: On amd64 with glibc compatibility and ThinLTO enabled, a custom `memcpy` is injected to neutralize ThinLTO's libcall generation.
- `clickhouse_new_delete` library: Provides custom C++ `operator new` and `operator delete` that use the wrapped allocator.
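A minimal sketch of the `clickhouse_new_delete` idea, assuming it simply forwards C++ allocations to `malloc`/`free`; the real library may also cover aligned and `nothrow` variants. Because `std::malloc` is an undefined reference in this object file, linking with `--wrap=malloc` would route it to `__wrap_malloc`, so C++ allocations reach the same tracking layer as C ones. The counter `g_new_calls` is hypothetical and exists only to make the forwarding observable.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counter so the forwarding is observable; not part of ClickHouse.
std::atomic<std::size_t> g_new_calls{0};

// Replacement global operator new. It forwards to malloc; in a binary linked
// with --wrap=malloc, that call would resolve to __wrap_malloc and be tracked.
void* operator new(std::size_t size) {
    g_new_calls.fetch_add(1, std::memory_order_relaxed);
    if (size == 0) size = 1;  // operator new must return a unique non-null pointer
    if (void* ptr = std::malloc(size)) return ptr;
    throw std::bad_alloc();
}

// Matching replacements route deallocation through free (wrapped as __wrap_free).
// The default operator new[] / delete[] forward to these, so arrays are covered too.
void operator delete(void* ptr) noexcept { std::free(ptr); }
void operator delete(void* ptr, std::size_t) noexcept { std::free(ptr); }
```

Forwarding to `malloc` rather than calling an allocator directly is what lets one set of linker wrappers account for both C and C++ allocations.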
The `--wrap` mechanism works at the linker level: undefined references to `malloc` are resolved to `__wrap_malloc`, and references to `__real_malloc` are resolved to the original `malloc`. This is more reliable than `LD_PRELOAD` or weak-symbol overriding because the redirection is fixed when the binary is linked rather than depending on symbol resolution at runtime.
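To make the resolution rule concrete, here is a toy C++ model of it (an illustration, not linker code). It encodes the rule as documented for GNU ld's `--wrap=symbol`: an undefined reference to `symbol` resolves to `__wrap_symbol`, and an undefined reference to `__real_symbol` resolves to `symbol`. Only undefined references are rewritten. In a monolithic static link, each library's call to `malloc` is an undefined reference in that library's own object file, which is why the rule also reaches third-party code. The function name `resolve_undefined_reference` is hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy model of the linker's --wrap rule. `wrapped` holds every <symbol>
// passed as --wrap=<symbol>; the return value is the symbol actually bound.
std::string resolve_undefined_reference(const std::string& symbol,
                                        const std::set<std::string>& wrapped) {
    static const std::string kRealPrefix = "__real_";
    // __real_<sym> binds to the original <sym>, but only if <sym> is wrapped.
    if (symbol.rfind(kRealPrefix, 0) == 0) {
        const std::string original = symbol.substr(kRealPrefix.size());
        if (wrapped.count(original) != 0) return original;
    }
    // A wrapped <sym> binds to __wrap_<sym>.
    if (wrapped.count(symbol) != 0) return "__wrap_" + symbol;
    // Anything else resolves normally.
    return symbol;
}
```

For example, with `malloc` and `free` wrapped, a reference to `malloc` binds to `__wrap_malloc`, `__real_malloc` binds to `malloc`, and an unrelated `memcpy` is left untouched.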
On macOS and FreeBSD, the --wrap linker option is not available, so the memory tracking uses a different mechanism. Sanitizer builds also skip wrapping because sanitizers have their own allocation interception.
Usage
Use monolithic binary linking with memory allocator interposition when:
- The application needs precise memory accounting (tracking every allocation for memory limits, profiling, and leak detection).
- A custom allocator (like jemalloc) provides better performance than the system allocator, but the application also needs a tracking layer on top.
- The busybox architecture requires combining many component libraries into a single executable.
- The project needs to ensure that no allocation escapes tracking, including those from third-party libraries.
Theoretical Basis
The linker --wrap mechanism operates as follows:
```
BEFORE --wrap:
  code calls: malloc(size)  -->  glibc malloc

AFTER --wrap=malloc:
  code calls: malloc(size)  -->  __wrap_malloc(size)
                                 {
                                     track_allocation(size);
                                     return __real_malloc(size);  // original malloc
                                 }
```
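The `track_allocation` step can be made concrete with a runnable sketch. Here `tracked_malloc`/`tracked_free` are hypothetical stand-ins for `__wrap_malloc`/`__wrap_free`, and `std::malloc`/`std::free` play the role of `__real_malloc`/`__real_free`, so the code compiles without `--wrap`. The per-block size header is an assumption of this sketch: it tells `free` how many bytes to subtract from the counter, whereas a wrapper over jemalloc could instead ask the allocator for the block size.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical tracker state: bytes currently allocated through the wrappers.
std::atomic<long long> g_tracked_bytes{0};

// Header in front of each block recording its requested size. Using
// alignof(std::max_align_t) keeps the returned pointer as aligned as malloc's.
constexpr std::size_t kHeader = alignof(std::max_align_t);
static_assert(kHeader >= sizeof(std::size_t), "header must hold a size_t");

// Stand-in for __wrap_malloc: std::malloc plays the role of __real_malloc.
void* tracked_malloc(std::size_t size) {
    if (size > SIZE_MAX - kHeader) return nullptr;  // avoid overflow in size + kHeader
    auto* raw = static_cast<unsigned char*>(std::malloc(size + kHeader));
    if (raw == nullptr) return nullptr;
    std::memcpy(raw, &size, sizeof size);  // remember the requested size
    g_tracked_bytes.fetch_add(static_cast<long long>(size),
                              std::memory_order_relaxed);  // track_allocation(size)
    return raw + kHeader;
}

// Stand-in for __wrap_free: read the header, untrack, then release the block.
void tracked_free(void* ptr) {
    if (ptr == nullptr) return;
    unsigned char* raw = static_cast<unsigned char*>(ptr) - kHeader;
    std::size_t size = 0;
    std::memcpy(&size, raw, sizeof size);
    g_tracked_bytes.fetch_sub(static_cast<long long>(size), std::memory_order_relaxed);
    std::free(raw);
}
```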
The complete wrapping covers all standard C allocation functions:
```
Wrapped functions:
  malloc, free, calloc, realloc, aligned_alloc,
  posix_memalign, valloc, memalign, reallocarray, pvalloc (non-musl only)
```
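Wrapping has to cover every entry point in that list. If `realloc` were left unwrapped, for example, a block that grew through it would change size without the tracker noticing. The sketch below puts a hypothetical size header in front of each block and shows a `realloc` wrapper that charges only the difference between the old and new sizes. `tracked_realloc` and `g_live_bytes` are illustrative names, and treating a zero size as `free` is a choice made for this sketch.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical live-byte counter, as a memory tracker would keep.
std::atomic<long long> g_live_bytes{0};

// Size header in front of each block, aligned like malloc's result.
constexpr std::size_t kHeader = alignof(std::max_align_t);
static_assert(kHeader >= sizeof(std::size_t), "header must hold a size_t");

// Stand-in for __wrap_realloc; std::realloc plays the role of __real_realloc.
// realloc(nullptr, n) acts like malloc(n); new_size == 0 frees and returns nullptr.
void* tracked_realloc(void* ptr, std::size_t new_size) {
    std::size_t old_size = 0;
    unsigned char* raw = nullptr;
    if (ptr != nullptr) {
        raw = static_cast<unsigned char*>(ptr) - kHeader;
        std::memcpy(&old_size, raw, sizeof old_size);
    }
    if (new_size == 0) {
        std::free(raw);  // free(nullptr) is a no-op
        g_live_bytes.fetch_sub(static_cast<long long>(old_size), std::memory_order_relaxed);
        return nullptr;
    }
    if (new_size > SIZE_MAX - kHeader) return nullptr;
    auto* grown = static_cast<unsigned char*>(std::realloc(raw, new_size + kHeader));
    if (grown == nullptr) return nullptr;  // old block and its accounting stay intact
    std::memcpy(grown, &new_size, sizeof new_size);
    // Charge only the delta between the new and old sizes.
    g_live_bytes.fetch_add(static_cast<long long>(new_size) - static_cast<long long>(old_size),
                           std::memory_order_relaxed);
    return grown + kHeader;
}
```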
The dispatch table in `main.cpp` maps program names to entry functions:
```
clickhouse_applications[] = {
    {"local",     mainEntryClickHouseLocal},
    {"client",    mainEntryClickHouseClient},
    {"server",    mainEntryClickHouseServer},
    {"keeper",    mainEntryClickHouseKeeper},     // conditional
    {"benchmark", mainEntryClickHouseBenchmark},
    // ... 20+ more entries ...
    {"help",      mainEntryHelp},
};
```
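A compilable sketch of a table in this style, assuming entries are (name, function pointer) pairs with the usual `main` signature. The three stub entry functions and `findApplication` are hypothetical; a real build would list the actual `mainEntry*` functions instead.

```cpp
#include <cassert>
#include <string_view>
#include <utility>

// Every application entry point shares main's signature.
using MainFunc = int (*)(int argc, char** argv);

// Hypothetical stubs standing in for mainEntryClickHouseLocal and friends.
int mainEntryLocal(int, char**) { return 10; }
int mainEntryClient(int, char**) { return 20; }
int mainEntryServer(int, char**) { return 30; }

// Name -> entry-point table in the spirit of clickhouse_applications[].
constexpr std::pair<std::string_view, MainFunc> applications[] = {
    {"local", mainEntryLocal},
    {"client", mainEntryClient},
    {"server", mainEntryServer},
};

// Returns the entry point registered under `name`, or nullptr if there is none.
MainFunc findApplication(std::string_view name) {
    for (const auto& [app_name, entry] : applications)
        if (app_name == name) return entry;
    return nullptr;
}
```

A linear scan is adequate here: the table holds a few dozen entries and is consulted once at startup.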
Runtime dispatch logic in `main`:
- Check `argv[0]` for symlink names (e.g., `clickhouse-server`).
- Check `argv[1]` for subcommand names (e.g., `clickhouse server`).
- Check short aliases (e.g., `chl` for `local`, `chc` for `client`).
- Check for `--host`/`--port` arguments to auto-detect client mode.
- Default to `local` mode if no match or if run with flags like `-q`.
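The selection order above can be sketched as a pure function over the argument vector. `selectApplication` and its alias table are hypothetical. Two details are assumptions of this sketch: the short aliases are treated as program (symlink) names, and both `--host value` and `--host=value` forms count as connection flags. The real `main` may differ in such details.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Resolve argv to an application name; `apps` lists the names in the dispatch table.
std::string selectApplication(const std::vector<std::string>& argv,
                              const std::vector<std::string>& apps) {
    auto known = [&apps](std::string_view name) {
        for (const auto& app : apps)
            if (std::string_view(app) == name) return true;
        return false;
    };
    // Basename of argv[0], e.g. "/usr/bin/clickhouse-server" -> "clickhouse-server".
    std::string_view prog = argv.empty() ? std::string_view() : std::string_view(argv[0]);
    if (auto slash = prog.rfind('/'); slash != std::string_view::npos)
        prog.remove_prefix(slash + 1);

    // 1. Symlink name of the form "clickhouse-<app>".
    constexpr std::string_view prefix = "clickhouse-";
    if (prog.substr(0, prefix.size()) == prefix && known(prog.substr(prefix.size())))
        return std::string(prog.substr(prefix.size()));

    // 2. Subcommand: "clickhouse <app>".
    if (argv.size() > 1 && known(argv[1]))
        return argv[1];

    // 3. Short aliases (assumed here to be program names).
    const std::pair<std::string_view, std::string_view> aliases[] = {
        {"chl", "local"}, {"chc", "client"}};
    for (const auto& [alias, app] : aliases)
        if (prog == alias) return std::string(app);

    // 4. Connection flags imply client mode.
    for (std::size_t i = 1; i < argv.size(); ++i) {
        const std::string& arg = argv[i];
        if (arg == "--host" || arg == "--port" ||
            arg.rfind("--host=", 0) == 0 || arg.rfind("--port=", 0) == 0)
            return "client";
    }

    // 5. Fallback: local mode, which also covers `clickhouse -q "SELECT 1"`.
    return "local";
}
```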