Principle:ClickHouse ClickHouse Binary Linking

Knowledge Sources
Domains: Build_System, C++, Systems_Programming
Last Updated: 2026-02-08 00:00 GMT

Overview

Monolithic binary linking is the final build stage that combines all component libraries, interposes a custom memory allocator, and produces a single executable with a busybox-style dispatch table for selecting among multiple application modes.

Description

The final linking stage of ClickHouse produces a single monolithic binary by combining:

  • Program component libraries: clickhouse-server-lib, clickhouse-client-lib, clickhouse-local-lib, and approximately 20 other per-component static libraries.
  • The database engine: The dbms static library containing query processing, storage engines, and all core database functionality.
  • Common libraries: clickhouse_common_io, common, and other base-layer libraries.
  • Third-party libraries: Approximately 90 vendored static libraries from contrib/.

A critical aspect of the linking process is memory allocator interposition. ClickHouse uses jemalloc as its memory allocator, but rather than simply linking against it, the build system uses the linker's --wrap mechanism to intercept all standard C allocation functions. This enables ClickHouse's memory tracking system to account for every allocation and deallocation.

The clickhouse_add_executable macro (defined in the root CMakeLists.txt) wraps the standard CMake add_executable to inject:

  1. clickhouse_malloc object files: These provide the __wrap_malloc, __wrap_free, and related functions that intercept allocations.
  2. Linker --wrap options: On Linux (non-sanitizer builds), the linker wraps malloc, free, calloc, realloc, aligned_alloc, posix_memalign, valloc, memalign, reallocarray, and, on non-musl builds only, pvalloc.
  3. memcpy object files: On amd64 with glibc compatibility and ThinLTO enabled, a custom memcpy is injected to neutralize ThinLTO's libcall generation.
  4. clickhouse_new_delete library: Provides custom C++ operator new and operator delete that use the wrapped allocator.
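
As a rough illustration of item 4, the sketch below replaces the global operator new and operator delete so that C++ allocations flow through malloc and free, which in a --wrap build would in turn resolve to the wrapped versions. The counters g_new_calls and g_delete_calls are hypothetical stand-ins for a tracking layer; this is not the clickhouse_new_delete source.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counters standing in for a real memory tracker.
static std::atomic<std::size_t> g_new_calls{0};
static std::atomic<std::size_t> g_delete_calls{0};

// Replacement for the global throwing operator new.
// In a --wrap build, the std::malloc call below would reach __wrap_malloc.
void * operator new(std::size_t size)
{
    if (size == 0)
        size = 1;  // operator new must return a unique non-null pointer
    void * ptr = std::malloc(size);
    if (!ptr)
        throw std::bad_alloc();
    g_new_calls.fetch_add(1, std::memory_order_relaxed);
    return ptr;
}

// Replacement for the global unsized operator delete.
void operator delete(void * ptr) noexcept
{
    if (!ptr)
        return;  // deleting a null pointer is a no-op
    g_delete_calls.fetch_add(1, std::memory_order_relaxed);
    std::free(ptr);
}

// Sized variant (C++14); forwards to the unsized replacement above.
void operator delete(void * ptr, std::size_t) noexcept
{
    ::operator delete(ptr);
}
```

The array forms (operator new[] and operator delete[]) call these replacements by default, so they do not need separate definitions in this sketch.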

The --wrap mechanism works at the linker level: when --wrap=malloc is given, undefined references to malloc in the linked object files are resolved to __wrap_malloc, and references to __real_malloc are resolved to the original malloc. This is more reliable than LD_PRELOAD or weak symbol overriding because the redirection is fixed at link time and does not depend on the runtime environment.
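
The shape of such a wrapper pair can be sketched as follows. To keep the example buildable without special linker flags, it uses the hypothetical names wrap_malloc, wrap_free, real_malloc, and real_free; under -Wl,--wrap=malloc,--wrap=free the linker would bind the corresponding __wrap_ and __real_ symbols instead. The counters are illustrative assumptions, not ClickHouse's memory tracker.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>

// Stand-ins for __real_malloc / __real_free, which the linker would
// provide under --wrap. Here they simply forward to the C library.
static void * real_malloc(std::size_t size) { return std::malloc(size); }
static void real_free(void * ptr) { std::free(ptr); }

// Hypothetical accounting state.
static std::atomic<std::size_t> g_live_allocations{0};
static std::atomic<std::size_t> g_requested_bytes{0};

// Plays the role of __wrap_malloc: account first, then delegate.
void * wrap_malloc(std::size_t size)
{
    void * ptr = real_malloc(size);
    if (ptr)
    {
        g_live_allocations.fetch_add(1, std::memory_order_relaxed);
        g_requested_bytes.fetch_add(size, std::memory_order_relaxed);
    }
    return ptr;
}

// Plays the role of __wrap_free: free(nullptr) stays a no-op.
void wrap_free(void * ptr)
{
    if (!ptr)
        return;
    g_live_allocations.fetch_sub(1, std::memory_order_relaxed);
    real_free(ptr);
}
```

Because the redirection is applied to every object file passed to the linker, statically linked third-party code that calls malloc would reach the same wrapper without any source changes.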

On macOS and FreeBSD, the --wrap linker option is not available, so the memory tracking uses a different mechanism. Sanitizer builds also skip wrapping because sanitizers have their own allocation interception.

Usage

Use monolithic binary linking with memory allocator interposition when:

  • The application needs precise memory accounting (tracking every allocation for memory limits, profiling, and leak detection).
  • A custom allocator (like jemalloc) provides better performance than the system allocator, but the application also needs a tracking layer on top.
  • A busybox-style architecture requires combining many component libraries into a single executable.
  • The project needs to ensure that no allocation escapes tracking, including those from third-party libraries.

Theoretical Basis

The linker --wrap mechanism operates as follows:

BEFORE --wrap:
    code calls:  malloc(size)  -->  glibc malloc

AFTER --wrap=malloc:
    code calls:  malloc(size)  -->  __wrap_malloc(size)
                                    {
                                        track_allocation(size);
                                        return __real_malloc(size);  // original malloc
                                    }

The complete wrapping covers all standard C allocation functions:

Wrapped functions:
    malloc, free, calloc, realloc, aligned_alloc,
    posix_memalign, valloc, memalign, reallocarray, pvalloc (non-musl only)

The dispatch table in main.cpp maps program names to entry functions:

clickhouse_applications[] = {
    {"local",    mainEntryClickHouseLocal},
    {"client",   mainEntryClickHouseClient},
    {"server",   mainEntryClickHouseServer},
    {"keeper",   mainEntryClickHouseKeeper},       // conditional
    {"benchmark", mainEntryClickHouseBenchmark},
    ... 20+ more entries ...
    {"help",     mainEntryHelp},
}

Runtime dispatch logic in main:

  1. Check argv[0] for symlink names (e.g., clickhouse-server).
  2. Check argv[1] for subcommand names (e.g., clickhouse server).
  3. Check short aliases (e.g., chl for local, chc for client).
  4. Check for --host/--port arguments to auto-detect client mode.
  5. Default to local mode if no match or if run with flags like -q.
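
The five steps above can be sketched as a small selection function. This is a simplified illustration, not the code in main.cpp: the Application struct, the selectMode function, the reduced table, and the handling of the --host=value form are assumptions, and the sketch returns a mode name rather than an entry function.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical table entry: mode name plus optional short alias.
struct Application
{
    const char * name;   // mode name, e.g. "server"
    const char * alias;  // short alias, or "" when none is listed
};

static const Application applications[] = {
    {"local", "chl"},
    {"client", "chc"},
    {"server", ""},
    {"benchmark", ""},
};

// Strip any directory prefix from argv[0].
static std::string baseName(const std::string & path)
{
    const std::string::size_type slash = path.find_last_of('/');
    return slash == std::string::npos ? path : path.substr(slash + 1);
}

std::string selectMode(const std::vector<std::string> & args)
{
    const std::string program = args.empty() ? std::string() : baseName(args[0]);
    const std::string first = args.size() > 1 ? args[1] : std::string();

    // Step 1: symlink name in argv[0], e.g. clickhouse-server.
    for (const Application & app : applications)
        if (program == std::string("clickhouse-") + app.name)
            return app.name;

    // Step 2: subcommand in argv[1], e.g. clickhouse server.
    for (const Application & app : applications)
        if (first == app.name)
            return app.name;

    // Step 3: short aliases such as chl or chc (checked in both positions).
    for (const Application & app : applications)
    {
        const std::string alias = app.alias;
        if (!alias.empty() && (program == alias || first == alias))
            return app.name;
    }

    // Step 4: --host or --port anywhere in the arguments implies client mode.
    for (std::size_t i = 1; i < args.size(); ++i)
    {
        const std::string & arg = args[i];
        if (arg == "--host" || arg == "--port"
            || arg.compare(0, 7, "--host=") == 0 || arg.compare(0, 7, "--port=") == 0)
            return "client";
    }

    // Step 5: everything else falls back to local mode.
    return "local";
}
```

Under these assumptions, selectMode({"/usr/bin/clickhouse-server"}) selects server, while selectMode({"clickhouse", "-q", "SELECT 1"}) falls through every check and selects local.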
