Principle:Vespa engine Vespa Java Bootstrap and Maven Build
Overview
The Java bootstrap and Maven build stage compiles all Java modules, resolves dependencies from Maven Central, and produces JAR artifacts. The bootstrap phase sets up the Maven wrapper and builds the dependency-version POMs first; the main build then compiles the remaining modules using parallel threads. In the Vespa build pipeline, this stage produces the Java artifacts that are later packaged into RPMs and container images.
Motivation
Vespa is a large-scale distributed system with hundreds of Java modules organized in a multi-module Maven project. Compiling this codebase requires careful orchestration:
- Plugin resolution order: Maven cannot resolve references to a plugin if the same reactor build also builds that plugin. Therefore, plugins must be built first in a separate pass.
- Dependency hierarchy: Parent POMs and dependency-version POMs must be installed into the local repository before child modules can resolve their dependencies.
- Build performance: With hundreds of modules, sequential compilation would be prohibitively slow. Parallel thread pools are essential to achieve reasonable build times.
- C++ test support: Some C++ unit tests require Java JAR files. The bootstrap phase collects these JARs into a dedicated directory.
How It Works
The Java build proceeds in two distinct phases: bootstrap and main build.
Phase 1: Bootstrap
The bootstrap phase is handled by the root-level bootstrap.sh script, which supports multiple modes:
| Mode | Description | When Used |
|---|---|---|
| wrapper | Only set up the Maven wrapper | Minimal setup |
| java | Build only Maven plugins | Plugin-only builds |
| default | Build plugins and minimal modules needed by CMake | Standard C++ builds |
| full | Build plugins and all modules needed by C++ tests | Full CI builds |
The bootstrap sequence for the full mode is:
- Maven wrapper setup: Installs the Maven wrapper (mvnw) using Maven 3.9.12. This ensures all builds use the same Maven version regardless of what is installed on the build host.
- Parent POM installation: Builds and installs the dependency-versions, container-dependency-versions, and parent POMs.
- Root POM installation: Installs the root POM (-N flag for non-recursive).
- Plugin build: Builds all custom Maven plugins under maven-plugins/.
- C++ test dependencies: Builds the jrt, linguistics, and messagebus modules with tests and Javadoc skipped for speed.
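The ordering of the full-mode steps can be sketched as a dry run. The script below only prints the commands it would issue; the `run` helper, the POM paths, and the exact flags (`-f`, `-pl`, `-DskipTests`, `-Dmaven.javadoc.skip=true`, and the `wrapper:wrapper` invocation) are assumptions chosen for illustration and are not copied from bootstrap.sh.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the "full" bootstrap order. Nothing is built: each step
# is only counted and printed. Paths and flags are illustrative assumptions.

STEP_COUNT=0
LAST_CMD=""

run() {                         # hypothetical helper: record instead of execute
    STEP_COUNT=$((STEP_COUNT + 1))
    LAST_CMD="$*"
    printf '+ %s\n' "$*"
}

bootstrap_full() {
    # 1. Pin the Maven version through the wrapper.
    run mvn wrapper:wrapper -Dmaven=3.9.12

    # 2. Install the dependency-version and parent POMs.
    for pom in dependency-versions container-dependency-versions parent; do
        run ./mvnw install -f "$pom/pom.xml"
    done

    # 3. Install only the root POM (-N = non-recursive).
    run ./mvnw install -N

    # 4. Build the custom Maven plugins.
    run ./mvnw install -f maven-plugins/pom.xml

    # 5. Build the modules needed by C++ tests, skipping tests and Javadoc.
    run ./mvnw install -pl jrt,linguistics,messagebus \
        -DskipTests -Dmaven.javadoc.skip=true
}

bootstrap_full
```

Replacing the body of `run` with `"$@"` would turn the dry run into real invocations, which is a convenient way to review the order before committing to a long build.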
Phase 2: Main Maven Build
The main build is handled by .buildkite/java.sh, which invokes the Maven wrapper with parallel threads:
./mvnw -T "$NUM_MVN_THREADS" "${MVN_EXTRA_OPTS[@]}" "$VESPA_MAVEN_TARGET"
The -T flag sets the size of Maven's parallel build thread pool. Maven schedules modules from the reactor dependency graph: a module starts building only after the modules it depends on have finished, while modules with no dependency between them can build concurrently on separate threads.
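As a rough illustration of how that command line is assembled, the sketch below builds it as a bash array and prints it instead of running Maven. The fallback values and the contents of `MVN_EXTRA_OPTS` are assumptions for the example, not the values used by .buildkite/java.sh.

```shell
#!/usr/bin/env bash
# Sketch: assemble the main-build command line without running it.
# The fallback values below are assumptions for illustration.

NUM_MVN_THREADS="${NUM_MVN_THREADS:-4}"            # a count such as 4, or 1C for one thread per core
VESPA_MAVEN_TARGET="${VESPA_MAVEN_TARGET:-install}"

# An array keeps each option as a single word, even if it contains spaces.
MVN_EXTRA_OPTS=(--batch-mode -Dmaven.test.skip=true)

MVN_CMD=(./mvnw -T "$NUM_MVN_THREADS" "${MVN_EXTRA_OPTS[@]}" "$VESPA_MAVEN_TARGET")

# Print the command one argument per line instead of executing it.
printf '%s\n' "${MVN_CMD[@]}"
```

Keeping the options in an array rather than a plain string avoids accidental word splitting when an option value contains whitespace.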
The mvn_install Function
Both phases use a shared mvn_install function that provides consistent Maven invocation:
mvn_install() {
    ${MAVEN_CMD} --batch-mode --no-snapshot-updates \
        -Dmaven.wagon.http.retryHandler.count=5 \
        clean "${MAVEN_TARGET}" ${MAVEN_EXTRA_OPTS} "$@"
}
Key flags:
- --batch-mode: Disables interactive prompts and produces output suitable for CI logs.
- --no-snapshot-updates: Prevents Maven from checking remote repositories for updated SNAPSHOT artifacts, since the version preparation stage already replaced all SNAPSHOTs.
- -Dmaven.wagon.http.retryHandler.count=5: Retries failed HTTP requests up to 5 times, improving resilience against transient network failures when downloading dependencies.
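To see how the shared flags combine with caller-supplied arguments, the function can be exercised with a stub in place of Maven. In this sketch `show_args` is a hypothetical stand-in for the Maven command, and the values assigned to `MAVEN_TARGET` and `MAVEN_EXTRA_OPTS` are example assumptions.

```shell
#!/usr/bin/env bash
# Dry run of mvn_install: MAVEN_CMD points at a stub that prints its
# arguments, so the final command line is visible without invoking Maven.

show_args() { printf '%s\n' "$*"; }   # hypothetical stand-in for ./mvnw

MAVEN_CMD="show_args"
MAVEN_TARGET="install"
MAVEN_EXTRA_OPTS="-Dmaven.test.skip=true"

mvn_install() {
    ${MAVEN_CMD} --batch-mode --no-snapshot-updates \
        -Dmaven.wagon.http.retryHandler.count=5 \
        clean "${MAVEN_TARGET}" ${MAVEN_EXTRA_OPTS} "$@"
}

# Extra arguments passed by the caller are appended after the shared flags.
RESULT="$(mvn_install -N)"
printf '%s\n' "$RESULT"
# prints: --batch-mode --no-snapshot-updates -Dmaven.wagon.http.retryHandler.count=5 clean install -Dmaven.test.skip=true -N
```

Note that `MAVEN_EXTRA_OPTS` is expanded unquoted on purpose, so a value holding several space-separated options is split into separate arguments.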
Environment Variables
| Variable | Description | Example Value |
|---|---|---|
| SOURCE_DIR | Root directory of the Vespa source checkout | /vespa |
| VESPA_CPP_TEST_JARS | Directory where JAR files for C++ tests are collected | /tmp/vespa-build/test-jars |
| NUM_MVN_THREADS | Number of parallel Maven threads | 4 |
| VESPA_MAVEN_TARGET | Maven lifecycle target to execute | install |
| VESPA_MAVEN_EXTRA_OPTS | Additional Maven options (e.g., -Dmaven.test.skip=true) | -Dmaven.test.skip=true |
| VESPA_MAVEN_COMMAND | Override for the Maven command (defaults to ./mvnw) | ./mvnw |
| MAVEN_OPTS | JVM options for the Maven process | -Xms256m -Xmx2g |
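One common way to consume such variables is to give each a fallback so the script still runs when nothing is exported. The sketch below uses the example values from the table as fallbacks; apart from the ./mvnw default for VESPA_MAVEN_COMMAND, these are illustrative assumptions and not necessarily the defaults the real scripts apply.

```shell
#!/usr/bin/env bash
# Sketch: assign a fallback to each variable only when it is unset or empty.
# The fallbacks mirror the example column above and are assumptions.

: "${SOURCE_DIR:=/vespa}"
: "${VESPA_CPP_TEST_JARS:=/tmp/vespa-build/test-jars}"
: "${NUM_MVN_THREADS:=4}"
: "${VESPA_MAVEN_TARGET:=install}"
: "${VESPA_MAVEN_EXTRA_OPTS:=-Dmaven.test.skip=true}"
: "${VESPA_MAVEN_COMMAND:=./mvnw}"
: "${MAVEN_OPTS:=-Xms256m -Xmx2g}"

# Maven reads MAVEN_OPTS from the environment, so it has to be exported.
export MAVEN_OPTS

printf 'command=%s threads=%s target=%s\n' \
    "$VESPA_MAVEN_COMMAND" "$NUM_MVN_THREADS" "$VESPA_MAVEN_TARGET"
```

A caller can still override any value for a single run, for example by prefixing the invocation with `NUM_MVN_THREADS=8`.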
Design Considerations
Two-phase build: The separation into bootstrap and main build follows from Maven's inability to resolve a plugin reference when the plugin is built in the same reactor build that uses it. This is a well-known Maven limitation that affects projects with custom plugins.
GCC toolset activation: The Java build sources /etc/profile.d/enable-gcc-toolset.sh before running Maven. This is because some Java modules include JNI (Java Native Interface) code that requires a C++ compiler. The GCC toolset ensures that the correct compiler version is available.
JAR collection for C++ tests: The bootstrap script uses a find-and-xargs pipeline to copy all JAR files from Maven target/ directories into a single flat directory. This allows C++ test binaries to locate Java dependencies without understanding the Maven module structure.
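A minimal sketch of such a find-and-xargs pipeline, run against a throwaway directory tree so it can be tried safely. The module layout and JAR names are made up for the demo, and the exact options used in bootstrap.sh may differ.

```shell
#!/usr/bin/env bash
# Sketch of the JAR collection step on a temporary directory tree.
# Module names and JAR file names are made up for the demo.

SRC="$(mktemp -d)"                  # stands in for SOURCE_DIR
DEST="$(mktemp -d)"                 # stands in for VESPA_CPP_TEST_JARS

mkdir -p "$SRC/jrt/target" "$SRC/messagebus/target" "$SRC/jrt/src"
touch "$SRC/jrt/target/jrt.jar" "$SRC/messagebus/target/messagebus.jar"
touch "$SRC/jrt/src/not-a-build-output.jar"

# Copy every JAR that sits under a target/ directory into one flat directory.
# -print0 and -0 keep file names containing spaces intact.
find "$SRC" -type f -path '*/target/*' -name '*.jar' -print0 |
    xargs -0 -I{} cp {} "$DEST/"

ls "$DEST"
```

Because the destination is flat, two modules that produce a JAR with the same file name would overwrite each other; the sketch does not guard against that.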
Relationship to Other Build Stages
The Java bootstrap and Maven build stage depends on version preparation completing first. Its outputs feed into:
- C++ Compilation: C++ tests require JAR files collected during bootstrap.
- RPM Package Creation: RPM packages include compiled Java artifacts.
- Container Image Building: Container images include the Maven local repository.
See Also
- Bootstrap and Java Build Implementation -- Detailed script-level documentation.
- Version Preparation -- The preceding pipeline stage.
- Source: .buildkite/bootstrap.sh
- Source: .buildkite/java.sh
- Source: bootstrap.sh