Heuristic:ClickHouse ClickHouse Test Writing Conventions
| Knowledge Sources | |
|---|---|
| Domains | Testing, Code_Quality |
| Last Updated | 2026-02-08 18:00 GMT |
Overview
ClickHouse test writing conventions: prefer new test files over extending existing ones, use `default` for database names in reference files, avoid `no-*` tags unless strictly necessary, never use `sleep` for race conditions, and follow the reference comparison pattern.
Description
ClickHouse's stateless test framework uses a reference comparison approach: each test has a `.sql` file and a corresponding `.reference` file with the expected output. The test runner creates a temporary database with a random name for each test and replaces that name with `default` in the test output before comparing it with the reference. Tests should be atomic, isolated, and parallelizable by default. These conventions have evolved from years of maintaining a suite of more than 5,000 tests and represent hard-won lessons about test reliability and maintainability.
Usage
Apply this heuristic when writing new tests in `tests/queries/0_stateless/`, fixing failing tests, or reviewing PRs that add tests. Following these conventions prevents flaky tests and CI pipeline issues.
The Insight (Rule of Thumb)
- Action 1 (New Tests): Always create a new `.sql` file for each logical test case instead of appending to existing test files.
  - Value: Improves test isolation, makes failures easier to diagnose, and simplifies bisection.
  - Trade-off: More files in the test directory, but ClickHouse has 5,000+ tests and this is the established pattern.
- Action 2 (Database Names): Hardcode `default` as the database name in `.reference` files. Never use `${CLICKHOUSE_DATABASE}` or actual random names.
  - Value: The test runner automatically normalizes random database names to `default` before comparison. A `.reference` file that contains `${CLICKHOUSE_DATABASE}` or a random name will not match the normalized output, so the test fails even though the behavior is correct.
- Action 3 (No "no-*" Tags): Do not add `no-parallel`, `no-fasttest`, `no-random-settings`, or similar exclusion tags unless strictly necessary.
  - Value: Every exclusion tag slows down CI feedback. Tests should work correctly under parallel execution, fast test configurations, and random settings.
  - Trade-off: Sometimes a test genuinely requires global state changes and cannot be parallelized, but this should be the exception.
- Action 4 (No Sleep for Race Conditions): Never use `sleep` in C++ code to fix race conditions. Use proper synchronization primitives (mutexes, condition variables, atomics).
  - Value: Sleep-based "fixes" are non-deterministic, slow, and mask the actual bug. They make tests flaky.
- Action 5 (Test Review): Existing tests must not be deleted or relaxed in PRs. Only additions are allowed. If a test needs updating due to a behavior change, the change must be justified.
  - Value: Prevents regression of previously verified behavior.
- Action 6 (Changelog): Write changelog entries for users, not developers. Focus on "what changed for the user" using 1-5 plain English sentences with backticks for SQL constructs.
  - Value: Changelog entries are release notes. Users need to understand impact, not implementation details.
Reasoning
New test preference from `.claude/CLAUDE.md`:
When writing tests in tests/queries, prefer adding a new test instead of extending existing ones.
Database name normalization from `.claude/instructions.md:33-36`:
The test runner creates a temporary database with a random name (e.g., test_abc123)
for each test. After test execution, the random database name is replaced with default
in stdout/stderr files before comparison with .reference. This means .reference files
should use default for database names, NOT ${CLICKHOUSE_DATABASE} or the actual random name.
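The normalization described above can be modeled in a few lines of C++. This is only an illustrative sketch of the behavior, not the test runner's actual code: the helper names `normalizeDatabaseName` and `matchesReference` are hypothetical, and the real runner may apply further processing that is not shown here.

```cpp
#include <string>

// Hypothetical helper (not part of ClickHouse): replaces every occurrence of
// the random database name in the captured output with the literal "default".
std::string normalizeDatabaseName(std::string output, const std::string & random_db)
{
    const std::string replacement = "default";
    if (random_db.empty())
        return output;

    std::string::size_type pos = 0;
    while ((pos = output.find(random_db, pos)) != std::string::npos)
    {
        output.replace(pos, random_db.size(), replacement);
        pos += replacement.size(); // continue after the inserted text
    }
    return output;
}

// Hypothetical helper: in this model a test passes only when the normalized
// output is byte-for-byte equal to the contents of the .reference file.
bool matchesReference(
    const std::string & raw_output,
    const std::string & random_db,
    const std::string & reference)
{
    return normalizeDatabaseName(raw_output, random_db) == reference;
}
```

Under this model, output such as `CREATE TABLE test_abc123.t` matches a reference line `CREATE TABLE default.t`, while a reference line containing the literal text `${CLICKHOUSE_DATABASE}` or `test_abc123` does not match, because the reference file itself is compared as written.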
No exclusion tags from `.claude/instructions.md:40`:
When writing tests, do not add "no-*" tags (like "no-parallel") unless strictly necessarily.
No sleep for race conditions from `.claude/CLAUDE.md`:
Never use sleep in C++ code to fix race conditions - this is stupid and not acceptable!
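As an illustration of the alternative to `sleep`, the sketch below waits on a condition variable until another thread signals that its work is done. It uses only the C++ standard library; the class `ReadySignal` and the function `produceAndConsume` are hypothetical names chosen for this example and do not come from the ClickHouse code base.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical example class: a one-shot "ready" flag guarded by a mutex.
class ReadySignal
{
public:
    void notifyReady()
    {
        {
            std::lock_guard<std::mutex> lock(mutex);
            ready = true;
        }
        cv.notify_all();
    }

    void waitUntilReady()
    {
        std::unique_lock<std::mutex> lock(mutex);
        // The predicate handles spurious wakeups and the case where
        // notifyReady() has already run before the wait starts.
        cv.wait(lock, [this] { return ready; });
    }

private:
    std::mutex mutex;
    std::condition_variable cv;
    bool ready = false;
};

// Starts a producer thread, waits for its signal, and returns the value it published.
int produceAndConsume()
{
    ReadySignal signal;
    int value = 0;

    std::thread producer([&]
    {
        value = 42;           // written before the signal is raised
        signal.notifyReady();
    });

    signal.waitUntilReady();  // blocks exactly as long as needed, no sleep
    int observed = value;     // ordered after the write through the mutex
    producer.join();
    return observed;
}
```

Replacing the wait with a fixed `sleep` would make the result depend on thread scheduling: too short a delay reads `value` before it is written, and too long a delay only slows the code down without removing the race.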
Test protection from `.github/copilot-instructions.md`:
Existing tests must not be deleted or relaxed; only additions allowed.
Changelog guidelines from `docs/changelog_entry_guidelines.md:7-26`:
Focus on "what", "why", "how" from user perspective, not implementation details.
Aim for between 1-5 sentences. Use backticks for settings, functions, SQL, formats.