Principle:ClickHouse ClickHouse Program Component Compilation

From Leeroopedia


Knowledge Sources
Domains: Build_System, C++, Software_Architecture
Last Updated: 2026-02-08 00:00 GMT

Overview

ClickHouse uses a modular, BusyBox-style program architecture: each application mode (server, client, local, keeper, and others) is compiled as an independent static library, and all of these libraries are then linked into a single monolithic binary whose dispatch table selects the appropriate entry point at runtime.

Description

ClickHouse ships as a single binary that contains over 20 distinct application modes. Rather than distributing separate executables for the server, client, local query engine, keeper, benchmarking tool, and various utilities, all modes are compiled into one binary. At runtime, the binary examines the program name (via argv[0]) and command-line arguments to determine which mode to execute.

This approach, inspired by BusyBox, has several advantages:

  • Distribution simplicity: There is a single file to deploy, copy, or update. Symlinks or hard links (e.g., clickhouse-server -> clickhouse) provide the familiar command names.
  • Shared code deduplication: The server, client, and local modes share large amounts of code (query parsing, storage engines, protocol handling). A monolithic binary avoids duplicating this code across separate executables.
  • Consistent versioning: All tools within the binary are always at the same version, eliminating version mismatch issues.

The compilation follows a two-step process:

  1. Library compilation: Each program component defines its own source files and dependencies in a subdirectory under programs/. The clickhouse_program_add macro compiles these sources into a static library named clickhouse-{name}-lib.
  2. Binary linking: All per-component libraries are linked together with the core database engine (dbms) and common I/O libraries into a single executable.

Each program component follows a naming convention:

  • Source files: programs/{name}/
  • Library target: clickhouse-{name}-lib
  • Variables: CLICKHOUSE_{NAME_UC}_SOURCES, CLICKHOUSE_{NAME_UC}_LINK, CLICKHOUSE_{NAME_UC}_INCLUDE

Some components are conditionally compiled based on CMake options. For example, the keeper component requires the NuRaft library and is gated by ENABLE_CLICKHOUSE_KEEPER.
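
One way such a build-time gate can surface in C++ code is through a preprocessor definition that decides whether a mode is registered at all. The following is a minimal sketch under stated assumptions: the macro SKETCH_ENABLE_KEEPER and the function availableModes are hypothetical names made up for this illustration, not the identifiers the ClickHouse build actually uses.

```cpp
#include <string_view>
#include <vector>

// Hypothetical macro: assumed to be defined by the build system (0 or 1),
// mirroring a CMake option such as ENABLE_CLICKHOUSE_KEEPER.
#ifndef SKETCH_ENABLE_KEEPER
#define SKETCH_ENABLE_KEEPER 0
#endif

// Returns the modes compiled into this (illustrative) binary.
std::vector<std::string_view> availableModes()
{
    std::vector<std::string_view> modes = {"server", "client", "local"};
#if SKETCH_ENABLE_KEEPER
    // Only registered when the optional component was enabled at build time.
    modes.push_back("keeper");
#endif
    return modes;
}
```

With the option disabled, the keeper entry is never compiled in, so the binary neither links the component library nor advertises the mode.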

Usage

Use the busybox-style architecture when:

  • A project produces multiple closely related command-line tools that share a large common codebase.
  • Deployment simplicity is valued (single binary, single update path).
  • The combined binary size is acceptable (shared code deduplication often makes the monolithic binary smaller than the sum of separate executables).
  • Version consistency between tools is critical (for example, a database server and its client must speak the same protocol version).

Theoretical Basis

The busybox dispatch pattern works as follows:

1. COMPILATION PHASE:
   For each program component P in {server, client, local, keeper, ...}:
       sources = CLICKHOUSE_{P_UPPERCASE}_SOURCES
       library = clickhouse-{P}-lib
       Compile sources into static library

2. LINKING PHASE:
   Link all component libraries + dbms + common into single binary "clickhouse"

3. INSTALLATION PHASE:
   Create symlinks: clickhouse-server -> clickhouse
                    clickhouse-client -> clickhouse
                    clickhouse-local  -> clickhouse
                    clickhouse-keeper -> clickhouse
                    ...

4. RUNTIME DISPATCH:
   main(argc, argv):
       app_name = basename(argv[0])  # e.g., "clickhouse-server"
       for each (name, entry_func) in dispatch_table:
           if app_name matches name:
               return entry_func(argc, argv)
       # Also check argv[1]: "clickhouse server" dispatches to server mode
       # Special case: bare "clickhouse" defaults to local mode
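
The runtime dispatch step above can be sketched in C++ roughly as follows. This is an illustration only, not the actual ClickHouse source: the entry points mainServer, mainClient, and mainLocal, their return values, and the helper names are assumptions. Shifting the argument vector for the subcommand form and falling back to local mode for any unmatched invocation are simplifications made by this sketch.

```cpp
#include <string>
#include <string_view>
#include <utility>

// Hypothetical entry points; in a real build each would come from its own
// per-component static library. The return values are arbitrary markers.
int mainServer(int, char **) { return 10; }
int mainClient(int, char **) { return 20; }
int mainLocal(int, char **)  { return 30; }

using MainFunc = int (*)(int, char **);

// Dispatch table: mode name -> entry point.
static const std::pair<std::string_view, MainFunc> dispatch_table[] = {
    {"server", mainServer},
    {"client", mainClient},
    {"local",  mainLocal},
};

// Strip any directory prefix from argv[0].
std::string_view baseName(std::string_view path)
{
    auto pos = path.rfind('/');
    return pos == std::string_view::npos ? path : path.substr(pos + 1);
}

int dispatch(int argc, char ** argv)
{
    std::string_view app = argc > 0 ? baseName(argv[0]) : std::string_view{};

    // 1. Symlink-style invocation: "clickhouse-server ..."
    for (const auto & [name, func] : dispatch_table)
        if (app == std::string("clickhouse-") + std::string(name))
            return func(argc, argv);

    // 2. Subcommand-style invocation: "clickhouse server ..."
    //    (this sketch shifts argv so the entry point sees the mode as argv[0])
    if (argc > 1)
        for (const auto & [name, func] : dispatch_table)
            if (name == std::string_view(argv[1]))
                return func(argc - 1, argv + 1);

    // 3. Nothing matched: fall back to local mode (simplified special case).
    return mainLocal(argc, argv);
}
```

A real main function would simply return dispatch(argc, argv). Adding a mode then amounts to adding one row to the table and linking one more component library.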

The macro system uses CMake variable scoping to propagate source lists and link dependencies from subdirectories to the parent scope. Each subdirectory (e.g., programs/server/) sets CLICKHOUSE_SERVER_SOURCES and CLICKHOUSE_SERVER_LINK variables, which are then read by the parent programs/CMakeLists.txt when creating the library.

The uppercase transformation (string(TOUPPER)) and hyphen-to-underscore replacement (string(REPLACE "-" "_")) ensure that component names like keeper-client map to valid CMake variable names like CLICKHOUSE_KEEPER_CLIENT_SOURCES.
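
To make the name mapping concrete, the same two transformations can be mirrored in C++. This is only a sketch of the string logic, assuming ASCII component names; the function names toVariableStem, sourcesVariable, and libraryTarget are hypothetical, and the real work happens in CMake rather than in C++.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Mirrors string(TOUPPER ...) followed by string(REPLACE "-" "_" ...).
std::string toVariableStem(std::string name)
{
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    std::replace(name.begin(), name.end(), '-', '_');
    return name;
}

// Variable that holds the component's source list.
std::string sourcesVariable(const std::string & name)
{
    return "CLICKHOUSE_" + toVariableStem(name) + "_SOURCES";
}

// Library target name; the hyphen is kept here, as in clickhouse-{name}-lib.
std::string libraryTarget(const std::string & name)
{
    return "clickhouse-" + name + "-lib";
}
```

For the component keeper-client, sourcesVariable yields CLICKHOUSE_KEEPER_CLIENT_SOURCES, while libraryTarget yields clickhouse-keeper-client-lib.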
