Principle:Vespa engine Vespa CMake Configuration
Overview
CMake configuration generates a platform-specific build system from declarative CMakeLists.txt files. It resolves compiler toolchains, library dependencies, and build options. In the Vespa CI/CD pipeline, CMake configuration is the bridge between the Java bootstrap phase and C++ compilation, translating high-level build declarations into Makefiles that make can execute.
Motivation
Vespa's C++ codebase consists of hundreds of libraries and executables that depend on both internal and external libraries. Manually writing Makefiles for this scale of project would be unmaintainable. CMake provides:
- Declarative dependency management: Each module declares its dependencies in CMakeLists.txt, and CMake resolves the full transitive dependency graph.
- Platform abstraction: CMake detects the compiler, linker, and system libraries automatically, allowing the same build definitions to work across different Linux distributions.
- Build option configuration: Options such as sanitizer support, ccache integration, and Valgrind testing can be toggled through CMake variables without modifying build files.
How It Works
CMake Invocation
The CMake configuration step is a single invocation that processes the top-level CMakeLists.txt and recursively processes all subdirectories:
cmake -DVESPA_UNPRIVILEGED=no \
-DVALGRIND_UNIT_TESTS="$VALGRIND_UNIT_TESTS" \
"$VESPA_CMAKE_SANITIZERS_OPTION" \
"$VESPA_CMAKE_CCACHE_OPTION" \
"$SOURCE_DIR"
This invocation runs from a separate build directory (out-of-source build), keeping generated files separate from the source tree.
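As a rough sketch of how such an out-of-source invocation could be assembled, the function below creates the build directory, appends the optional -D flags only when they are non-empty, and runs cmake from inside the build directory. The function name run_cmake_configure and the DRY_RUN switch are hypothetical and exist only for illustration; the remaining variable names follow the invocation above.

```shell
# Hypothetical helper: configure an out-of-source build.
# Usage: run_cmake_configure <build_dir> <source_dir>
run_cmake_configure() {
    build_dir="$1"
    source_dir="$2"
    mkdir -p "$build_dir"

    # Fixed arguments first.
    set -- cmake -DVESPA_UNPRIVILEGED=no \
        "-DVALGRIND_UNIT_TESTS=${VALGRIND_UNIT_TESTS:-false}"

    # Optional flags are appended only when set, so cmake never
    # receives an empty argument.
    if [ -n "${VESPA_CMAKE_SANITIZERS_OPTION:-}" ]; then
        set -- "$@" "$VESPA_CMAKE_SANITIZERS_OPTION"
    fi
    if [ -n "${VESPA_CMAKE_CCACHE_OPTION:-}" ]; then
        set -- "$@" "$VESPA_CMAKE_CCACHE_OPTION"
    fi
    set -- "$@" "$source_dir"

    if [ "${DRY_RUN:-false}" = "true" ]; then
        # Assumption: DRY_RUN is an illustrative switch that prints the command.
        printf '%s\n' "$*"
    else
        # Run from the build directory so generated files stay out of the source tree.
        (cd "$build_dir" && "$@")
    fi
}
```

Calling run_cmake_configure twice with different build directories and the same source directory would give two independent configurations of one checkout.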
Key CMake Variables
| Variable | Type | Description | Default |
|---|---|---|---|
| VESPA_UNPRIVILEGED | Boolean | Whether to build for unprivileged (non-root) installation | no |
| VALGRIND_UNIT_TESTS | Boolean | Whether to run unit tests under Valgrind for memory checking | true (unless PR or sanitizer build) |
| VESPA_USE_SANITIZER | String | Address/thread/undefined behavior sanitizer to enable | null (none) |
| VESPA_USE_CCACHE | Boolean | Whether to use ccache for compilation caching | true (unless sanitizer is active) |
Sanitizer and Ccache Interaction
When a sanitizer is enabled, ccache is automatically disabled. Sanitizer-instrumented object code differs from the output of a regular build, so caching it would fill the cache with entries that non-sanitizer builds cannot reuse, and sanitizer builds may also invalidate cached entries differently. The same step decides whether Valgrind tests run. The logic is:
- If VESPA_USE_SANITIZER is set to a value other than null, enable the sanitizer CMake option and force ccache off.
- If the build is for a pull request (BUILDKITE_PULL_REQUEST != "false"), disable Valgrind tests to speed up the PR validation cycle.
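The two rules above, together with the defaults from the variable table, can be sketched as a small shell function. This is a minimal illustration, not the actual bootstrap-cmake.sh: the function name resolve_build_options and the exact -D option spellings and values are assumptions, while the environment variable names come from this page.

```shell
# Hypothetical helper: derive the cmake option strings from the CI environment.
resolve_build_options() {
    sanitizer="${VESPA_USE_SANITIZER:-null}"
    pull_request="${BUILDKITE_PULL_REQUEST:-false}"

    # Defaults: Valgrind on, ccache on, no sanitizer.
    VALGRIND_UNIT_TESTS=true
    VESPA_CMAKE_SANITIZERS_OPTION=""
    # Assumption: the option spelling mirrors the VESPA_USE_CCACHE variable name.
    VESPA_CMAKE_CCACHE_OPTION="-DVESPA_USE_CCACHE=true"

    # Rule 1: an active sanitizer forces ccache off and makes Valgrind redundant.
    if [ "$sanitizer" != "null" ]; then
        VESPA_CMAKE_SANITIZERS_OPTION="-DVESPA_USE_SANITIZER=$sanitizer"
        VESPA_CMAKE_CCACHE_OPTION="-DVESPA_USE_CCACHE=false"
        VALGRIND_UNIT_TESTS=false
    fi

    # Rule 2: pull request builds skip Valgrind for faster feedback.
    if [ "$pull_request" != "false" ]; then
        VALGRIND_UNIT_TESTS=false
    fi
}
```

After the function returns, the three result variables can be passed to the cmake invocation shown earlier.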
Valgrind Unit Tests
Valgrind is a memory error detector that catches use-after-free, buffer overflows, and memory leaks at runtime. Enabling Valgrind for unit tests significantly increases test execution time (typically 10-50x slower), so it is conditionally disabled for:
- Pull request builds: Fast feedback is more valuable than exhaustive memory checking.
- Sanitizer builds: AddressSanitizer (ASan) detects most of the same memory errors with less overhead, and ThreadSanitizer (TSan) targets data races rather than memory errors, so running the instrumented tests under Valgrind as well adds little.
Environment Variables
| Variable | Description | Example Value |
|---|---|---|
| VESPA_USE_SANITIZER | Sanitizer type to enable, or null to disable | address, thread, or null |
| BUILDKITE_PULL_REQUEST | Pull request number, or "false" if not a PR build | 35801 or false |
| VALGRIND_UNIT_TESTS | Whether to enable Valgrind for unit tests | true or false |
| SOURCE_DIR | Root directory of the Vespa source checkout | /vespa |
Design Considerations
Out-of-source builds: CMake is invoked from a build directory that is separate from the source tree. This keeps generated Makefiles, object files, and binaries out of the source directory, allowing the same source tree to support multiple build configurations simultaneously.
GCC toolset activation: The configuration script sources /etc/profile.d/enable-gcc-toolset.sh to ensure the correct GCC version is available. Vespa requires C++20 support, which necessitates GCC 12 or later (or Clang 16+).
Dependency path setup: The script prepends /opt/vespa-deps/bin to the PATH. This directory contains pre-built Vespa dependency binaries (e.g., protobuf, abseil, gRPC) that CMake's find_package must locate during configuration.
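The environment preparation described above might look roughly like the following sketch. The function name prepare_build_env, the readability check before sourcing, and the guard against duplicate PATH entries are assumptions added for illustration; the two default paths are the ones named on this page.

```shell
# Hypothetical helper: activate the GCC toolset and expose the dependency binaries.
# Usage: prepare_build_env [toolset_script] [deps_bin_dir]
prepare_build_env() {
    toolset_script="${1:-/etc/profile.d/enable-gcc-toolset.sh}"
    deps_bin="${2:-/opt/vespa-deps/bin}"

    # Source the toolset script only if it is present on this image.
    if [ -r "$toolset_script" ]; then
        . "$toolset_script"
    fi

    # Prepend the dependency directory unless PATH already contains it.
    case ":$PATH:" in
        *":$deps_bin:"*) ;;
        *) PATH="$deps_bin:$PATH" ;;
    esac
    export PATH
}
```

Prepending rather than appending means binaries in the dependency directory take precedence over same-named system binaries.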
What CMake Produces
After configuration completes, the build directory contains:
- Makefiles: Generated build rules for every target in the project.
- CMakeCache.txt: A cache file recording all resolved variables and paths.
- cmake_install.cmake: Installation rules for make install.
- config.h / vespa-config.h: Generated header files with platform-specific defines.
These files are consumed by the subsequent C++ compilation stage, which runs make -j against the generated Makefiles.
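One way to confirm that a configuration run picked up the intended options is to read them back from CMakeCache.txt, whose entries have the form NAME:TYPE=VALUE. The helper below is a small sketch; the function name cmake_cache_value is hypothetical.

```shell
# Hypothetical helper: print the cached value of one CMake variable.
# Usage: cmake_cache_value <path/to/CMakeCache.txt> <VARIABLE_NAME>
cmake_cache_value() {
    cache_file="$1"
    name="$2"
    # Strip the "NAME:TYPE=" prefix and print only matching lines.
    sed -n "s/^${name}:[A-Z]*=//p" "$cache_file"
}
```

For example, cmake_cache_value build/CMakeCache.txt VESPA_UNPRIVILEGED would print the value recorded for that variable, assuming the build directory is named build.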
Relationship to Other Build Stages
CMake configuration depends on the Java bootstrap completing first (because some CMakeLists.txt files reference Java JAR paths). Its output feeds directly into C++ compilation:
Java Bootstrap --> CMake Configuration --> C++ Compilation
                                                  |
                                                  v
                                         RPM Package Creation
See Also
- CMake Configuration Implementation -- The implementation details of the bootstrap-cmake.sh script.
- C++ Compilation -- The next stage that consumes the generated Makefiles.
- Java Bootstrap and Maven Build -- The preceding stage that produces JAR dependencies.
- Source: bootstrap-cmake.sh