Principle:Vespa engine Vespa Version Preparation
Overview
Version preparation ensures all build artifacts share a consistent version string by replacing snapshot/development versions in Maven POM files with a release version. This is a standard practice in CI/CD pipelines to produce deterministic, versioned builds. In the Vespa build pipeline, version preparation is the first stage executed before any compilation occurs, guaranteeing that every downstream artifact -- JARs, RPMs, and container images -- carries the same version identifier.
Motivation
During development, Maven modules use SNAPSHOT version suffixes to indicate that the code is in flux and not yet released. When a CI/CD pipeline produces a release build, all of these snapshot references must be replaced with the target version number. Without this step, artifacts would carry inconsistent or non-deterministic version strings, making it impossible to trace a deployed binary back to a specific build.
Version preparation addresses three core concerns:
- Reproducibility: A given version string always corresponds to the same set of source code and build inputs.
- Traceability: Operators and developers can map a running binary back to its exact build by inspecting the embedded version.
- Dependency consistency: Internal module references within a multi-module Maven project must all resolve to the same version to avoid classpath conflicts.
How It Works
The version preparation phase operates through two scripts that run sequentially:
Step 1: POM Version Replacement
The script replace-vespa-version-in-poms.sh traverses the entire source tree using find and applies sed substitutions to every pom.xml file. Three distinct patterns are matched and replaced:
| Pattern | Purpose | Example Before | Example After |
|---|---|---|---|
| `<version>.*SNAPSHOT.*</version>` | Module version | `<version>8.999.1-SNAPSHOT</version>` | `<version>8.432.17</version>` |
| `<vespaversion>.*project.version.*</vespaversion>` | Vespa dependency version | `<vespaversion>${project.version}</vespaversion>` | `<vespaversion>8.432.17</vespaversion>` |
| `<test-framework.version>.*project.version.*</test-framework.version>` | Test framework version | `<test-framework.version>${project.version}</test-framework.version>` | `<test-framework.version>8.432.17</test-framework.version>` |
The script follows symbolic links (find -L) to ensure all POM files are discovered, including those in symlinked module directories. It also handles platform differences by detecting whether it is running on macOS (BSD sed) or Linux (GNU sed), whose in-place editing option (-i) takes different arguments.
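The replacement step can be sketched roughly as follows. This is a minimal illustration built from the patterns in the table above, not the actual contents of replace-vespa-version-in-poms.sh; the function name replace_pom_versions and its argument order are assumptions made for the example.

```shell
# Hypothetical helper: replace_pom_versions <source_dir> <version>
# Assumes the version contains no ',', '&', or '\' characters, since it is
# interpolated directly into the sed replacement text.
replace_pom_versions() {
    local source_dir="$1"
    local version="$2"

    # BSD sed (macOS) requires an explicit backup-suffix argument after -i
    # (empty here, meaning no backup); GNU sed takes -i on its own.
    if [ "$(uname)" = "Darwin" ]; then
        set -- sed -i ''
    else
        set -- sed -i
    fi

    # -L follows symbolic links so POMs in symlinked module directories are found.
    find -L "$source_dir" -name pom.xml -exec "$@" \
        -e "s,<version>.*SNAPSHOT.*</version>,<version>${version}</version>," \
        -e "s,<vespaversion>.*project.version.*</vespaversion>,<vespaversion>${version}</vespaversion>," \
        -e "s,<test-framework.version>.*project.version.*</test-framework.version>,<test-framework.version>${version}</test-framework.version>," \
        {} +
}

# Example call (commented out so the sketch has no side effects):
# replace_pom_versions "$SOURCE_DIR" "$VESPA_VERSION"
```

Using a comma as the sed delimiter avoids having to escape the slashes in the closing XML tags.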
Step 2: Artifact Directory Creation
After POM replacement, the prepare.sh script creates the directory structure needed by subsequent build stages:
    $WORKDIR/
        artifacts/
            $ARCH/
                rpms/        # Will hold built RPM packages
                maven-repo/  # Will hold Maven repository artifacts
The $ARCH variable (e.g., x86_64 or aarch64) ensures that artifacts for different CPU architectures are kept separate, which is essential for the multi-architecture build matrix.
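A minimal sketch of the directory creation, assuming the layout shown above. The function name create_artifact_dirs is hypothetical and is not taken from prepare.sh.

```shell
# Hypothetical helper: create_artifact_dirs <workdir> <arch>
create_artifact_dirs() {
    local workdir="$1"
    local arch="$2"

    # -p creates missing parent directories and succeeds if they already
    # exist, so the call is safe to repeat.
    mkdir -p \
        "$workdir/artifacts/$arch/rpms" \
        "$workdir/artifacts/$arch/maven-repo"
}

# Example call (commented out so the sketch has no side effects):
# create_artifact_dirs "$WORKDIR" "$ARCH"
```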
Environment Variables
| Variable | Description | Example Value |
|---|---|---|
| `VESPA_VERSION` | The target version string to inject into all POM files | `8.432.17` |
| `SOURCE_DIR` | Root directory of the Vespa source checkout | `/vespa` |
| `WORKDIR` | Working directory for build outputs and intermediate files | `/tmp/vespa-build` |
| `ARCH` | CPU architecture label for artifact partitioning | `x86_64` |
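A script that depends on these variables might check them up front so that a missing value stops the build with a clear message. The sketch below is an assumption about how such a check could look; the function name check_build_env and the message wording are hypothetical and do not come from the Vespa scripts.

```shell
# Hypothetical helper: aborts the current shell (or subshell) with an error
# message if any required variable is unset or empty.
check_build_env() {
    : "${VESPA_VERSION:?must be set to the target version, e.g. 8.432.17}"
    : "${SOURCE_DIR:?must be set to the root of the source checkout}"
    : "${WORKDIR:?must be set to the build working directory}"
    : "${ARCH:?must be set to the CPU architecture label, e.g. x86_64}"
}

# Example call (commented out because it exits when a variable is missing):
# check_build_env
```

The `${VAR:?message}` expansion is standard POSIX shell: it prints the message to standard error and exits a non-interactive shell when the variable is unset or empty.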
Design Considerations
In-place modification vs. copy-on-write: The Vespa pipeline modifies POM files in-place using sed -i. This is acceptable because CI/CD builds operate on a fresh checkout that is discarded after the build. An alternative approach would be to copy the source tree and modify the copy, but this would double disk usage for a large repository like Vespa (thousands of POM files).
Fail-fast behavior: Both scripts use set -o errexit, set -o nounset, and set -o pipefail to terminate immediately on any error. This is critical for version preparation because a partial replacement could produce a build with mixed versions -- a failure mode that would be difficult to diagnose downstream.
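Idempotency: Running the replacement twice with the same version produces the same result. The regex .*SNAPSHOT.* matches any existing version string containing "SNAPSHOT", and the replacement is a literal version string. After the first run, no version element contains "SNAPSHOT" and no matched property still references project.version, so none of the patterns match on a second run and the files are left unchanged. By the same logic, a second run with a different version is also a no-op, so changing the target version requires a fresh checkout.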
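The idempotency property can be checked with a small experiment on a throwaway file. This sketch covers only the module-version pattern and writes through a temporary file rather than using sed -i, to sidestep the GNU/BSD difference; the helper name apply_version is hypothetical.

```shell
# Hypothetical helper: apply_version <file> <version>
apply_version() {
    local file="$1"
    local version="$2"
    local out
    out="$(mktemp)"
    sed -e "s,<version>.*SNAPSHOT.*</version>,<version>${version}</version>," \
        "$file" > "$out"
    mv "$out" "$file"
}

# Throwaway input containing a single snapshot version line.
pom="$(mktemp)"
printf '%s\n' '<version>8.999.1-SNAPSHOT</version>' > "$pom"

# First run performs the replacement; keep a copy of the result.
apply_version "$pom" 8.432.17
cp "$pom" "$pom.first"

# Second run finds nothing left to match, so the file is unchanged.
apply_version "$pom" 8.432.17
cmp -s "$pom" "$pom.first" && echo "idempotent"
```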
Relationship to Other Build Stages
Version preparation must complete before any other build stage begins. The pipeline dependency graph is:
Version Preparation
|
+-- Java Bootstrap and Maven Build
|
+-- CMake Configuration --> C++ Compilation
|
+-- RPM Package Creation --> Container Image Building --> Artifact Signing and Publishing
See Also
- Prepare.sh Implementation -- The implementation details of the prepare.sh and replace-vespa-version-in-poms.sh scripts.
- Java Bootstrap and Maven Build -- The next stage in the pipeline that consumes the version-prepared POM files.
- Source: prepare.sh
- Source: replace-vespa-version-in-poms.sh