Principle:ClickHouse ClickHouse Debug Symbol Splitting
| Knowledge Sources | |
|---|---|
| Domains | Packaging, Distribution |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Debug symbol splitting is the practice of separating debugging information from release binaries so that smaller, production-ready executables can be distributed independently from the larger debug data needed for post-mortem analysis.
Description
When a C or C++ program is compiled with debug information (e.g., using the -g flag), the resulting binary contains DWARF debug sections that map machine code back to source lines, variable names, type layouts, and other metadata. These sections can be extremely large, often exceeding the size of the executable code itself. For a complex application like ClickHouse, the debug symbols can be several gigabytes in size.
Debug symbol splitting addresses this by extracting these DWARF sections into a separate .debug file and then stripping the original binary. The GNU toolchain provides a well-established workflow for this:
- Extract debug info: Use objcopy --only-keep-debug to create a standalone file containing only the debug sections.
- Strip the binary: Use strip --strip-debug to remove debug sections from the original binary, producing a lean executable suitable for deployment.
- Link them together: Use objcopy --add-gnu-debuglink to embed a reference (filename and CRC32 checksum) in the stripped binary that points back to the separate debug file.
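The three steps can be sketched in Python as a list of binutils invocations run in order. The helper names split_debug_commands and split_debug_symbols are hypothetical, and the sketch assumes GNU objcopy and strip are on PATH. The order matters: the debug file must exist before --add-gnu-debuglink runs, because objcopy computes the CRC32 over that file's contents.

```python
import subprocess
from typing import Optional


def split_debug_commands(binary: str, debug_file: str) -> list[list[str]]:
    """Return, in order, the binutils argument vectors that move the debug
    sections of `binary` into `debug_file` and link the two together."""
    return [
        # 1. Copy only the debug sections into a standalone file.
        ["objcopy", "--only-keep-debug", binary, debug_file],
        # 2. Remove the debug sections from the binary in place.
        ["strip", "--strip-debug", binary],
        # 3. Store the debug file's base name and CRC32 in .gnu_debuglink.
        ["objcopy", f"--add-gnu-debuglink={debug_file}", binary],
    ]


def split_debug_symbols(binary: str, debug_file: Optional[str] = None) -> None:
    """Run the three steps; raises CalledProcessError if any tool fails."""
    debug_file = debug_file or binary + ".debug"
    for argv in split_debug_commands(binary, debug_file):
        subprocess.run(argv, check=True)
```

For example, split_debug_symbols("clickhouse") would leave a stripped clickhouse and a clickhouse.debug file next to it.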
When a debugger such as GDB loads the stripped binary, it automatically searches for the linked .debug file in standard locations (e.g., /usr/lib/debug/) and loads the symbols transparently.
This approach follows the Debian stripping policy, which requires shipped binaries to be stripped of debug symbols and their .comment and .note sections to be removed. ClickHouse's implementation additionally preserves the .clickhouse.hash section, which is used for binary integrity verification, and retains the static symbol table so that stack traces remain meaningful even without full debug info.
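A strip invocation that combines these requirements might look like the following sketch. The exact flag set is an assumption for illustration, not a copy of ClickHouse's build scripts, and policy_strip_argv is a hypothetical helper. The key choice is --strip-debug rather than --strip-all: the former removes only debugging information and keeps .symtab.

```python
def policy_strip_argv(binary: str,
                      keep_sections: tuple[str, ...] = (".clickhouse.hash",)) -> list[str]:
    """Build a strip(1) argument vector for the policy described above.

    Illustrative assumption only; the real build may use different flags.
    """
    argv = [
        "strip",
        # Drop DWARF sections but, unlike --strip-all, keep .symtab.
        "--strip-debug",
        # Debian policy: drop compiler version strings and the .note section.
        "--remove-section=.comment",
        "--remove-section=.note",
    ]
    # Explicitly protect sections that must survive stripping.
    argv += [f"--keep-section={name}" for name in keep_sections]
    argv.append(binary)
    return argv
```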
Usage
Debug symbol splitting should be used whenever building ClickHouse binaries intended for distribution as Linux packages (DEB or RPM). It is a prerequisite for the packaging workflow because:
- Production deployments benefit from smaller binary sizes, reducing download times and disk usage.
- Debug packages (e.g., clickhouse-common-static-dbg) can be installed optionally on systems where post-mortem analysis is needed.
- Crash analysis remains possible by installing the matching debug package or by manually placing the .debug file alongside the stripped binary.
Theoretical Basis
The theoretical foundation for debug symbol splitting rests on the separation of concerns between runtime execution and diagnostic analysis.
ELF Binary Structure
An ELF (Executable and Linkable Format) binary contains multiple sections. Some are essential for execution (e.g., .text for code, .data for initialized data, .rodata for read-only data), while others exist solely for debugging (e.g., .debug_info, .debug_line, .debug_abbrev, .debug_str). Debug sections are not marked as allocated (they lack the SHF_ALLOC flag), so they belong to no loadable segment: neither the kernel nor the dynamic linker maps them into memory at runtime, which makes them safe to remove without affecting program behavior.
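To make the runtime/debug distinction concrete, here is a minimal Python sketch that reads the section header table of an ELF image with the standard struct module and separates allocated (runtime) sections from .debug_* sections. The helper names read_sections and split_by_role are hypothetical. The sketch assumes a little-endian ELF64 file and ignores 32-bit and big-endian ELF as well as extended section numbering, all of which real tools such as readelf -S handle.

```python
import struct

# Little-endian ELF64 layouts: Elf64_Ehdr and Elf64_Shdr are 64 bytes each.
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")
SHDR = struct.Struct("<IIQQQQIIQQ")
SHF_ALLOC = 0x2  # section occupies memory while the program runs


def read_sections(image: bytes) -> list[tuple[str, int]]:
    """Return (name, sh_flags) for each section of a little-endian ELF64 image."""
    if image[:4] != b"\x7fELF" or image[4] != 2 or image[5] != 1:
        raise ValueError("not a little-endian ELF64 image")
    hdr = EHDR.unpack_from(image, 0)
    shoff, shentsize, shnum, shstrndx = hdr[6], hdr[11], hdr[12], hdr[13]
    headers = [SHDR.unpack_from(image, shoff + i * shentsize) for i in range(shnum)]
    # Section names live in the string table section indexed by e_shstrndx.
    str_off, str_size = headers[shstrndx][4], headers[shstrndx][5]
    strtab = image[str_off:str_off + str_size]

    def name_at(offset: int) -> str:
        return strtab[offset:strtab.index(b"\0", offset)].decode()

    # h[0] is sh_name (offset into strtab), h[2] is sh_flags.
    return [(name_at(h[0]), h[2]) for h in headers]


def split_by_role(sections: list[tuple[str, int]]) -> tuple[list[str], list[str]]:
    """Separate allocated (runtime) sections from DWARF (.debug_*) sections."""
    runtime = [name for name, flags in sections if flags & SHF_ALLOC]
    debug = [name for name, flags in sections if name.startswith(".debug_")]
    return runtime, debug
```

Applied to an unstripped binary, the debug list should contain the .debug_* sections; after stripping it should be empty while the runtime list stays unchanged.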
The GNU Debug Link Mechanism
The .gnu_debuglink section is a lightweight pointer embedded in a stripped binary. It contains:
- The filename of the debug file (e.g., clickhouse.debug).
- A CRC32 checksum to verify that the debug file matches the binary.
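The on-disk layout of the section is small enough to encode and decode by hand. Following the GDB manual's description of separate debug files, it holds the file's base name, a NUL byte, zero padding up to the next four-byte boundary, and a four-byte CRC stored in the executable's byte order; the checksum is the standard CRC-32, which is what zlib.crc32 computes. encode_debuglink and decode_debuglink are hypothetical helpers, and the little-endian default is an assumption.

```python
import struct
import zlib


def debuglink_crc32(data: bytes) -> int:
    """Standard CRC-32 over the entire debug file's contents."""
    return zlib.crc32(data) & 0xFFFFFFFF


def encode_debuglink(filename: str, debug_file_contents: bytes, byteorder: str = "<") -> bytes:
    """Build .gnu_debuglink section contents ("<" = little-endian target)."""
    name = filename.encode() + b"\0"
    name += b"\0" * (-len(name) % 4)  # pad to a four-byte boundary
    return name + struct.pack(byteorder + "I", debuglink_crc32(debug_file_contents))


def decode_debuglink(section: bytes, byteorder: str = "<") -> tuple[str, int]:
    """Return (debug file name, CRC32) from .gnu_debuglink section contents."""
    name = section[:section.index(b"\0")].decode()
    (crc,) = struct.unpack(byteorder + "I", section[-4:])
    return name, crc
```

A debugger that finds a candidate file compares debuglink_crc32 of its contents with the stored value and rejects the file on mismatch.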
Debuggers use a well-defined search algorithm to locate the debug file:
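- The directory containing the executable.
- A .debug subdirectory under that directory.
- The global debug directory, typically /usr/lib/debug/, followed by the directory part of the executable's absolute path (for /usr/bin/clickhouse, this yields /usr/lib/debug/usr/bin/clickhouse.debug).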
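The search order above can be sketched as a function that lists candidate paths for the debug-link method. debuglink_candidates is a hypothetical name; the sketch only builds paths (it neither checks that files exist nor verifies the CRC), and it ignores the build-ID lookup that GDB also supports.

```python
import posixpath


def debuglink_candidates(executable: str, debuglink_name: str,
                         global_debug_dir: str = "/usr/lib/debug") -> list[str]:
    """Candidate debug-file paths, in the order described above."""
    exe_dir = posixpath.dirname(posixpath.abspath(executable))
    return [
        # 1. Next to the executable.
        posixpath.join(exe_dir, debuglink_name),
        # 2. In a .debug subdirectory next to the executable.
        posixpath.join(exe_dir, ".debug", debuglink_name),
        # 3. Global directory + executable's directory; the leading "/" is
        #    stripped so that join appends instead of restarting at the root.
        posixpath.join(global_debug_dir, exe_dir.lstrip("/"), debuglink_name),
    ]
```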
Stripping Policy
The Debian stripping policy specifies which sections to remove:
- .comment: Contains compiler version strings, not needed at runtime.
- .note: Contains auxiliary information that is not required for execution.
- Debug sections: All DWARF sections (.debug_*).
ClickHouse preserves the static symbol table (.symtab) even after stripping to maintain readable stack traces, which is important for operational monitoring and log analysis.
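The stripping rules described on this page can be expressed as a check over a binary's section names, as listed by readelf -S for example. stripping_violations is a hypothetical helper, not part of ClickHouse. Because --remove-section=.note removes only a section named exactly .note, the sketch deliberately does not flag sections such as .note.gnu.build-id.

```python
def stripping_violations(section_names: list[str]) -> list[str]:
    """Return human-readable problems for a supposedly stripped binary."""
    names = set(section_names)
    # Any remaining DWARF section means the binary was not stripped.
    problems = [f"debug section still present: {n}"
                for n in sorted(names) if n.startswith(".debug_")]
    # Sections the Debian policy removes (exact names, not prefixes).
    for n in (".comment", ".note"):
        if n in names:
            problems.append(f"section should be removed: {n}")
    # Sections ClickHouse keeps: symbols for stack traces, integrity hash.
    for n in (".symtab", ".clickhouse.hash"):
        if n not in names:
            problems.append(f"section should be kept: {n}")
    return problems
```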
Size Impact
For ClickHouse, the debug symbols can be 2 to 4 times the size of the stripped binary. Splitting enables a deployment model where:
- The base package (clickhouse-common-static) ships the stripped binary (typically hundreds of megabytes).
- The debug package (clickhouse-common-static-dbg) ships the debug file (potentially multiple gigabytes).
- Only operators who need to debug core dumps or attach debuggers install the debug package.
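If the debug file is k times the size of a stripped binary of size S, the base package carries S out of S + kS bytes, a share of 1/(1 + k). For the 2 to 4 range above, that is between 20% and about 33% of the combined size. The hypothetical helper stripped_share makes the arithmetic explicit:

```python
def stripped_share(ratio: float) -> float:
    """Share of the combined (stripped binary + debug file) size that the
    base package ships, when the debug file is `ratio` times the binary."""
    if ratio < 0:
        raise ValueError("ratio must be non-negative")
    return 1.0 / (1.0 + ratio)


for k in (2, 4):
    print(f"debug = {k}x stripped: base package ships {stripped_share(k):.0%} of the bytes")
```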