Principle:ClickHouse ClickHouse Banned Function Enforcement
| Knowledge Sources | |
|---|---|
| Domains | Development_Process, Code_Quality |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Runtime detection of unsafe libc function usage in debug and sanitizer builds enforces coding standards by immediately terminating the program when a banned C library function is called.
Description
ClickHouse maintains a strict policy against using certain C library functions that are non-thread-safe, insecure, or otherwise harmful to a high-performance, multi-threaded database engine. Rather than relying solely on code review or static analysis to catch violations, the project employs a link-time redeclaration strategy that replaces approximately 200 libc function symbols with trap implementations. When any of these banned functions is called during execution, the program immediately aborts after printing the offending function name to standard error.
This approach is grounded in several key observations:
- Thread safety: Many traditional C library functions (e.g., `gmtime`, `localtime`, `strtok`, `rand`, `asctime`, `ctime`) return pointers to internal static buffers or keep hidden global state, making them inherently non-thread-safe. In a multi-threaded database server, calling them risks data races and silent data corruption.
- Security: Functions like `system`, `getpass`, `tmpnam`, and `encrypt` pose security risks ranging from command injection to predictable temporary file names.
- Correctness: Functions like `sleep` can mask race conditions rather than properly solving them, while functions such as `setenv`/`unsetenv` are not thread-safe and can corrupt the process environment.
- Sanitizer compatibility: C11 threading primitives (`thrd_create`, `mtx_lock`, `cnd_wait`, etc.) are not supported by ThreadSanitizer, so their use must be prevented in sanitizer builds.
The enforcement is conditional: it only activates in debug or sanitizer builds (controlled by the `DEBUG_OR_SANITIZER_BUILD` preprocessor flag). In release builds, the trap library compiles to an empty translation unit, so it adds no overhead in production. Developers therefore catch violations early during development and testing, while production binaries remain unaffected.
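The conditional gate can be illustrated with a minimal sketch. It assumes only what is stated above, namely that the build system defines `DEBUG_OR_SANITIZER_BUILD` for debug and sanitizer builds; the macro `BANNED_TRAPS_COMPILED_IN` and the function `banned_traps_compiled_in` are hypothetical names introduced for illustration and are not part of ClickHouse.

```c
#include <stdbool.h>

/* Assumption: DEBUG_OR_SANITIZER_BUILD is supplied by the build system
   (for example via -DDEBUG_OR_SANITIZER_BUILD) in debug and sanitizer builds. */
#if defined(DEBUG_OR_SANITIZER_BUILD)
    /* Trap definitions would be emitted in this branch. */
    #define BANNED_TRAPS_COMPILED_IN 1   /* hypothetical marker macro */
#else
    /* Release build: nothing is emitted, so the library is empty. */
    #define BANNED_TRAPS_COMPILED_IN 0   /* hypothetical marker macro */
#endif

/* Hypothetical helper that reports which branch of the gate was compiled. */
bool banned_traps_compiled_in(void)
{
    return BANNED_TRAPS_COMPILED_IN != 0;
}
```

Compiling the same file with and without `-DDEBUG_OR_SANITIZER_BUILD` selects one branch or the other, which is how a single source file can serve both build types.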
Usage
This principle applies whenever a developer writes or modifies C/C++ code in the ClickHouse codebase. Because detection happens at run time, a call to a banned function is caught when the code path containing it actually executes in one of the following:
- Debug builds: Local development builds compiled with debug flags.
- Sanitizer builds: Builds using AddressSanitizer (ASan), ThreadSanitizer (TSan), MemorySanitizer (MSan), or UndefinedBehaviorSanitizer (UBSan).
- CI pipeline runs: Automated CI jobs that build and run tests under debug and sanitizer configurations.
When a violation is detected, the developer must replace the banned function call with a thread-safe or ClickHouse-approved alternative. For example:
- Use `gmtime_r` instead of `gmtime`
- Use `localtime_r` instead of `localtime`
- Use a proper thread-safe random number generator instead of `rand`
- Use `strtok_r` or a C++ string parsing approach instead of `strtok`
- Never use `sleep` in C++ code to address race conditions
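As an illustration of the first and fourth replacements, here is a minimal C sketch using the POSIX reentrant variants `gmtime_r` and `strtok_r`. The helper names `format_utc_date` and `count_tokens` are hypothetical and exist only for this example; ClickHouse code may well prefer its own date and string utilities.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* request POSIX declarations under strict -std modes */
#endif
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Format a timestamp as a UTC date (YYYY-MM-DD).
   gmtime_r writes into a caller-owned struct tm instead of a shared static buffer.
   Returns 0 on success, -1 on failure. */
int format_utc_date(time_t t, char *out, size_t out_size)
{
    struct tm parts;
    if (gmtime_r(&t, &parts) == NULL)
        return -1;
    return strftime(out, out_size, "%Y-%m-%d", &parts) == 0 ? -1 : 0;
}

/* Count the non-empty tokens in text, splitting on any character in delims.
   strtok_r keeps its position in the caller-owned `save` pointer rather than
   in hidden global state. Note that text is modified in place. */
size_t count_tokens(char *text, const char *delims)
{
    char *save = NULL;
    size_t count = 0;
    for (char *tok = strtok_r(text, delims, &save);
         tok != NULL;
         tok = strtok_r(NULL, delims, &save))
    {
        ++count;
    }
    return count;
}
```

Both helpers keep all intermediate state on the caller's stack, so concurrent calls from different threads do not interfere with each other.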
Theoretical Basis
The banned function enforcement mechanism relies on the principle of symbol interposition at link time. In the C/C++ compilation model, calls to external functions are bound to definitions by symbol name when the program is linked (and, for shared libraries, loaded). When a translation unit linked into the binary defines a symbol with the same name as a libc function, calls bind to that local (trap) definition rather than to the system library definition.
The theoretical model works as follows:
1. Conditional compilation gate: A preprocessor guard checks whether the build is a debug or sanitizer build. If the condition is false, the entire trap library compiles to an empty translation unit.
2. Function redeclaration: For each banned function, a new function definition is emitted with the same name and a void return type. The linker matches symbols by name only and carries no type information, so the mismatch with the real libc signature does not prevent the replacement from binding. The compiler diagnostic for incompatible library redeclaration is suppressed, and because the trap never returns to its caller, the mismatched signature has no observable effect.
3. Trap behavior: Each replacement function performs exactly two operations:
   - Writes the function name to standard error (file descriptor 2) so the developer can identify which banned function was called.
   - Executes an architecture-specific trap instruction that immediately terminates the program with a signal (typically SIGILL or SIGTRAP). If core dumps are enabled on the system, this also produces a core dump for debugging.
4. Platform constraints: The trap library is linked only on specific platform configurations (AMD64 architecture on Linux, excluding Android) to avoid compatibility issues with platforms where symbol interposition may behave differently.
5. Selective exceptions: Some functions that are technically non-thread-safe are commented out in the trap list because they are used by essential third-party libraries (e.g., `strerror` by RocksDB, `mbtowc` by the C++ standard library, `setlocale` by replxx, `mallopt` by TSan). This pragmatic approach balances safety with real-world library dependencies.
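Steps 2 and 3 can be sketched as a single macro. This is an illustration under stated assumptions, not the ClickHouse source: the `TRAP` macro shape, the function name `demo_banned_function`, and the helper `signal_raised_by` are hypothetical. The sketch deliberately traps a made-up name rather than a real libc symbol so that it can be run safely, and it uses `__builtin_trap`, a GCC and Clang builtin, as the trap instruction.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical macro: define a function named `func` that writes its own
   name to file descriptor 2 and then executes a trap instruction. */
#define TRAP(func)                                                   \
    void func(void)                                                  \
    {                                                                \
        ssize_t ignored = write(2, #func "\n", sizeof(#func "\n") - 1); \
        (void)ignored;                                               \
        __builtin_trap(); /* never returns */                        \
    }

/* A stand-in for a banned libc function (hypothetical name). */
TRAP(demo_banned_function)

/* Demo helper: call fn in a child process and return the number of the
   signal that killed it, 0 if it exited normally, or -1 on error.
   Forking keeps the trap from terminating the calling process. */
int signal_raised_by(void (*fn)(void))
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
    {
        fn();
        _exit(0); /* reached only if fn returned, i.e. no trap fired */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling `signal_raised_by(demo_banned_function)` prints the function name on standard error from the child and returns the terminating signal. Which signal that is depends on the architecture, so the sketch does not assume a specific value.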
The overall effect is a fail-fast discipline: rather than allowing unsafe function calls to silently produce incorrect behavior (which may only manifest under specific concurrency conditions or in production), the system forces immediate and visible failure during development, which greatly reduces the chance that such bugs reach production.