Principle: ClickHouse CI Pipeline Execution
| Knowledge Sources | |
|---|---|
| Domains | Development_Process, Code_Quality |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
A comprehensive CI/CD pipeline validates code changes across multiple build configurations and test suites using a custom Python-based CI framework, ensuring correctness, performance, and compatibility before merge.
Description
The ClickHouse CI pipeline is a multi-stage validation system that automatically builds, tests, and reports on every pull request. Unlike simple CI setups that run a single build and test suite, ClickHouse's pipeline compiles the project under multiple configurations and runs a diverse collection of test suites against each, reflecting the project's need to guarantee correctness across different runtime environments, sanitizer modes, and optimization levels.
The pipeline is orchestrated by Praktika, a custom Python-based CI framework developed by the ClickHouse team. Praktika provides generic CI functionality (workflow orchestration, job scheduling, artifact management, reporting) while allowing project-specific customization through a settings override mechanism. This design separates reusable CI infrastructure from ClickHouse-specific logic.
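The settings-override mechanism can be sketched as follows. This is an illustrative pattern, not Praktika's actual API: the class and function names (`FrameworkSettings`, `apply_overrides`) and all field values are assumptions chosen for demonstration.

```python
# Sketch of a settings-override pattern: generic defaults live in the
# framework, and the project replaces only the fields it cares about.
# Names and values here are illustrative, not real Praktika identifiers.
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class FrameworkSettings:
    report_bucket: str = "praktika-default"   # generic framework default
    max_parallel_jobs: int = 4
    docker_registry: str = "docker.io"

def apply_overrides(base: FrameworkSettings, **overrides) -> FrameworkSettings:
    """Return a new settings object with project-specific fields replaced."""
    return replace(base, **overrides)

# Project-specific customization overrides only what it needs;
# untouched fields keep their framework defaults.
settings = apply_overrides(
    FrameworkSettings(),
    report_bucket="clickhouse-ci-reports",  # hypothetical bucket name
    max_parallel_jobs=16,
)
print(settings.report_bucket)    # clickhouse-ci-reports
print(settings.docker_registry)  # docker.io (default retained)
```

The frozen dataclass plus `dataclasses.replace` keeps the base configuration immutable, so overrides are explicit and auditable rather than scattered mutations.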
The CI pipeline validates changes across several dimensions:
Build configurations:
- Release: The production-optimized build, exercising real-world performance characteristics and behavior.
- Debug: A build with assertions enabled, debug symbols, and the banned-function enforcement library linked in, catching logic errors and forbidden API usage.
- ASan (AddressSanitizer): Detects memory errors including buffer overflows, use-after-free, and memory leaks.
- TSan (ThreadSanitizer): Detects data races, deadlocks, and other concurrency issues in the multi-threaded codebase.
- MSan (MemorySanitizer): Detects use of uninitialized memory.
- UBSan (UndefinedBehaviorSanitizer): Detects undefined behavior in C/C++ code.
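The configuration list above can be expressed as a small data-driven matrix. The sanitizer flags (`-fsanitize=...`) are the standard Clang/GCC options; everything else, including the optimization flags per configuration, is a simplified assumption, since the real CI derives its flags from the project's CMake setup.

```python
# Illustrative build-configuration matrix for the six configurations
# described above. Flags are a simplified sketch, not ClickHouse's
# actual CMake-derived compiler invocations.
BUILD_CONFIGS = {
    "release": ["-O3", "-DNDEBUG"],
    "debug":   ["-O0", "-g"],                 # assertions stay enabled (no NDEBUG)
    "asan":    ["-fsanitize=address"],        # memory errors, leaks
    "tsan":    ["-fsanitize=thread"],         # data races, deadlocks
    "msan":    ["-fsanitize=memory"],         # uninitialized reads
    "ubsan":   ["-fsanitize=undefined"],      # undefined behavior
}

def build_jobs(configs=BUILD_CONFIGS):
    """Expand the matrix into one build job per configuration."""
    return [{"name": f"build-{name}", "cxxflags": flags}
            for name, flags in configs.items()]

for job in build_jobs():
    print(job["name"], " ".join(job["cxxflags"]))
```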
Test suites:
- Stateless tests: Thousands of SQL-based functional tests that verify individual query behaviors and features.
- Stateful tests: Tests that operate on persistent datasets to verify data integrity across operations.
- Integration tests: Docker-based tests that verify ClickHouse's interaction with external systems (Kafka, S3, MySQL, PostgreSQL, etc.).
- Unit tests: C++ unit tests for individual components.
- Performance tests: Benchmarks that detect performance regressions by comparing query execution times against baselines.
- Stress tests: High-concurrency and high-load tests designed to expose race conditions and resource exhaustion issues.
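Since the suites above run against each build configuration, the overall job set is roughly a cross product of builds and suites. The pruning rule below (benchmarks only on the Release build) is an assumption for illustration; the real pipeline prunes its matrix according to its own job definitions.

```python
# Sketch of the build x suite cross product implied by the text, with
# one assumed pruning rule: performance benchmarks are only meaningful
# on optimized builds, so they are skipped for debug/sanitizer builds.
import itertools

BUILDS = ["release", "debug", "asan", "tsan", "msan", "ubsan"]
SUITES = ["stateless", "stateful", "integration", "unit", "performance", "stress"]

def test_matrix(builds=BUILDS, suites=SUITES):
    jobs = []
    for build, suite in itertools.product(builds, suites):
        if suite == "performance" and build != "release":
            continue  # assumed pruning rule, not a documented CI policy
        jobs.append(f"{suite} ({build})")
    return jobs

matrix = test_matrix()
print(len(matrix))  # 31 jobs after pruning
```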
Reporting: The CI system produces detailed reports accessible via Praktika report links posted as comments on pull requests. These reports include per-test pass/fail status, logs, and artifact links. A CI robot posts summary comments with links to the reports, making it easy for contributors and reviewers to assess the state of a pull request.
Usage
The CI pipeline executes automatically on every pull request to the ClickHouse repository. Contributors and reviewers interact with it in the following ways:
- Contributors: Push code to a feature branch and create a pull request. The CI pipeline starts automatically. Monitor the CI status through the robot's comment with Praktika report links. Fix any failures and push new commits (never amend or rebase).
- Reviewers: Check the CI status before approving. Pay attention to the Praktika reports first (they contain more information than GitHub Actions logs). Look for failures in sanitizer builds, which may indicate subtle bugs not caught by the Release build.
- Local validation: Contributors can run individual test suites locally before pushing, using `python -m ci.praktika run "integration" --test <selectors>` for integration tests, or `./tests/clickhouse-test <test_name>` for stateless tests.
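The two local commands above can be wrapped in a small pre-push helper. This sketch only composes the argv lists rather than executing them, and the selector and test-name arguments (`test_storage_kafka`, `02500_example`) are placeholders, not specific tests recommended by the source.

```python
# Compose (but do not execute) the two local-validation commands
# mentioned above. Arguments are placeholders for real test selectors.
def local_validation_commands(integration_selector: str, stateless_test: str):
    """Return argv lists for the integration and stateless test runners."""
    return [
        # Integration tests via the Praktika runner:
        ["python", "-m", "ci.praktika", "run", "integration",
         "--test", integration_selector],
        # Stateless tests via the functional test runner:
        ["./tests/clickhouse-test", stateless_test],
    ]

for cmd in local_validation_commands("test_storage_kafka", "02500_example"):
    print(" ".join(cmd))
```

Running these before pushing catches most failures without waiting for a full CI round trip.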
Theoretical Basis
The ClickHouse CI pipeline is designed around the principle of multi-dimensional validation: the idea that a single build-and-test pass is insufficient for a complex C/C++ database engine, and that changes must be validated across multiple independent axes to achieve high confidence in correctness.
1. Build configuration matrix: Each build configuration exposes a different class of bugs:
- Debug builds with assertions catch logical invariant violations; in release builds those assertions are compiled out, so the same violations pass silently.
- ASan builds catch memory safety violations that may not manifest as visible bugs in normal execution but can cause data corruption or security vulnerabilities.
- TSan builds catch concurrency bugs that are non-deterministic and may only manifest under specific thread scheduling conditions.
- The combination of these configurations provides overlapping but complementary coverage.
2. Test suite diversity: Different test suites target different abstraction levels:
- Unit tests verify individual function and class behavior in isolation.
- Stateless SQL tests verify query-level semantics and feature correctness.
- Stateful tests verify data persistence and consistency guarantees.
- Integration tests verify system-level behavior including interactions with external dependencies.
- Performance tests guard against regressions that affect user-visible query latency and throughput.
- Stress tests explore the state space under high concurrency, increasing the probability of exposing race conditions.
3. Fail-fast with comprehensive reporting: The CI system is designed to fail fast on obvious errors (compilation failures, basic test failures) while still running the full suite to provide comprehensive feedback. The Praktika framework generates detailed reports with per-test results, logs, and artifact links, enabling efficient diagnosis of failures without requiring local reproduction.
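The fail-fast-with-full-reporting idea can be sketched in a few lines. This is a minimal model, not Praktika code: the pipeline hard-gates on the build step, but once a build exists it runs every test job to completion, even past failures, and aggregates everything into one report.

```python
# Minimal model of fail-fast orchestration: abort immediately if the
# build fails (nothing to test), but never stop early on test failures,
# so the final report covers the whole suite.
def run_pipeline(build, test_jobs):
    if not build():                       # fail fast: no build, no tests
        return {"build": "FAIL", "tests": {}}
    results = {}
    for name, job in test_jobs.items():   # keep going past failures
        results[name] = "OK" if job() else "FAIL"
    return {"build": "OK", "tests": results}

report = run_pipeline(
    build=lambda: True,
    test_jobs={
        "stateless": lambda: True,
        "integration": lambda: False,     # one failure does not stop the rest
        "stress": lambda: True,
    },
)
print(report)
# {'build': 'OK', 'tests': {'stateless': 'OK', 'integration': 'FAIL', 'stress': 'OK'}}
```

The payoff is diagnostic completeness: a contributor sees every failing suite in one CI run instead of discovering them one push at a time.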
4. Separation of concerns in CI architecture: The Praktika framework separates generic CI functionality (workflow orchestration, job scheduling, reporting) from project-specific logic (build configurations, test suites, Docker images). This separation is achieved through:
- A core framework module (`ci/praktika/`) that provides reusable CI primitives.
- Workflow definitions (`ci/workflows/`) that declare the structure of CI pipelines.
- Project-specific settings (`ci/settings/`) that override framework defaults.
- Individual job scripts (`ci/jobs/`) that implement specific build and test tasks.
- Docker configurations (`ci/docker/`) that define reproducible build and test environments.
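The layering above can be illustrated with a declarative workflow sketch. The class shapes, field names, and Docker image names below are illustrative assumptions, not Praktika's real schema; the point is that a workflow in `ci/workflows/` only wires together job scripts from `ci/jobs/` and images from `ci/docker/`.

```python
# Hypothetical, simplified workflow declaration in the spirit of the
# layered layout described above. Not Praktika's actual API.
from dataclasses import dataclass, field

@dataclass
class Job:
    name: str
    script: str          # implementation lives under ci/jobs/
    docker_image: str    # environment defined under ci/docker/

@dataclass
class Workflow:
    name: str
    jobs: list = field(default_factory=list)

# A workflow file under ci/workflows/ would declare structure only;
# image names below are placeholders.
pr_workflow = Workflow(
    name="pull_request",
    jobs=[
        Job("build-asan", "ci/jobs/build.py", "clickhouse/binary-builder"),
        Job("stateless-asan", "ci/jobs/functional_tests.py",
            "clickhouse/stateless-test"),
    ],
)
```

Keeping the workflow purely declarative means the framework core can schedule, retry, and report on jobs without knowing anything ClickHouse-specific.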
5. Reproducibility through containerization: All CI jobs run inside Docker containers defined in the repository. This ensures that CI results are reproducible regardless of the underlying CI infrastructure, and that contributors can reproduce failures locally using the same Docker images.
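Local reproduction of a CI job might look like the following sketch, which composes (but does not execute) a `docker run` invocation. The image name, mount layout, and test name are assumptions for illustration, not the CI's documented invocation.

```python
# Hedged sketch of reproducing a CI job in the same container the CI
# uses. Only composes the argv list; image name and paths are assumed.
def docker_command(image, repo_path, job_cmd):
    """Compose a `docker run` argv mounting the repo at /repo."""
    return [
        "docker", "run", "--rm",
        "-v", f"{repo_path}:/repo",   # mount the checkout into the container
        "-w", "/repo",                # run the job from the repo root
        image, *job_cmd,
    ]

cmd = docker_command(
    "clickhouse/stateless-test",                   # assumed image name
    "/home/me/ClickHouse",                         # placeholder checkout path
    ["./tests/clickhouse-test", "02500_example"],  # placeholder test
)
print(" ".join(cmd))
```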