Implementation:Apache Spark Docker Image Tool
| Metadata | Value |
|---|---|
| Source | Repo: Apache Spark |
| Domains | Kubernetes, Containerization |
| Type | API Doc |
| Related | Principle:Apache_Spark_Container_Image_Build |
Overview
Shell script that builds and pushes Docker images for Spark, PySpark, and SparkR on Kubernetes.
Description
bin/docker-image-tool.sh builds Docker images from the Spark distribution. It supports building a base Spark image, a PySpark image (via the -p flag with a Python Dockerfile), and a SparkR image (via the -R flag). The -X flag enables cross-platform builds using docker buildx, which automatically pushes images as part of the build process.
For development builds (when no RELEASE file is present), the script creates a temporary build context directory at $SPARK_HOME/target/tmp/docker to avoid uploading the entire source tree to the Docker daemon. This context is automatically cleaned up on exit via a trap handler.
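The temporary-context-plus-trap pattern described above can be sketched as follows. This is a minimal illustration, not the script's actual code; the directory layout and file names here are assumptions.

```shell
#!/usr/bin/env bash
# Sketch of the dev build-context pattern: copy only what the image build
# needs into a temporary directory, and remove it on exit via a trap.
# Paths and names below are illustrative, not the script's exact layout.

CTX_DIR="$(mktemp -d)"

cleanup() {
  rm -rf "$CTX_DIR"   # runs on any exit, including errors
}
trap cleanup EXIT

# Copy just the pieces the Dockerfile references into the context,
# instead of sending the whole source tree to the Docker daemon.
mkdir -p "$CTX_DIR/jars"
echo "placeholder" > "$CTX_DIR/jars/example.jar"

# A real build would now run something like:
#   docker build -t spark:dev "$CTX_DIR"
echo "context ready at $CTX_DIR"
```

Because the trap fires on EXIT, the context directory is removed even when the build fails partway through.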
The build() function (lines 137-216) performs the following steps:
- Optionally creates the dev build context
- Verifies the Docker image content directory exists
- Verifies Spark JARs are present
- Builds the base Spark image
- Optionally builds PySpark and SparkR images using the base as a parent
- For cross-platform builds, uses docker buildx build with --push
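The sequence above can be sketched as a dry-run shell function. It echoes the docker commands instead of running them; the repository, tag, and Dockerfile paths are illustrative, and the real build() composes its invocations differently.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the build() flow described above.
# Echoes docker commands rather than executing them; names are illustrative.

REPO="myregistry"
TAG="v3.5.0"
CROSS_PLATFORM=false

docker_cmd() { echo "docker $*"; }   # stand-in for the real docker CLI

build_sketch() {
  # Steps 1-2 (verify content directory and jars) are stubbed out here.
  # Step 3: build the base Spark image.
  if [ "$CROSS_PLATFORM" = true ]; then
    docker_cmd buildx build --push -t "$REPO/spark:$TAG" .
  else
    docker_cmd build -t "$REPO/spark:$TAG" .
  fi
  # Step 4: layer a language binding on top of the base image.
  docker_cmd build --build-arg base_img="$REPO/spark:$TAG" \
    -t "$REPO/spark-py:$TAG" .
}

build_sketch
```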
The push() function (lines 218-222) pushes pre-built images for all three image types (spark, spark-py, spark-r).
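A push over the three image types can be sketched as a short loop (dry-run, echoing instead of pushing; REPO and TAG values are illustrative):

```shell
#!/usr/bin/env bash
# Dry-run sketch of push(): one push per image type.
# The real function runs docker push; this echoes the command instead.

REPO="myregistry"
TAG="v3.5.0"

push_sketch() {
  local image
  for image in spark spark-py spark-r; do
    echo "docker push $REPO/$image:$TAG"
  done
}

push_sketch
```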
Usage
Run from SPARK_HOME after building a Spark distribution or from a release download.
Code Reference
| Item | Reference |
|---|---|
| Script | bin/docker-image-tool.sh (L1-337) |
| build() function | Lines 137-216 |
| push() function | Lines 218-222 |
Signature
bin/docker-image-tool.sh [-r <repo>] [-t <tag>] [-f <dockerfile>] [-p <py-dockerfile>] [-R <r-dockerfile>] [-u <uid>] [-m] [-n] [-X] [-b <build-arg>] build|push
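An option string of this shape is typically handled with getopts. The sketch below parses a subset of the flags; it is an assumption about the parsing style, not the script's actual code.

```shell
#!/usr/bin/env bash
# Minimal getopts sketch for a subset of the flags above.
# Illustrative only; the real script handles the full flag set.

REPO=""
TAG="latest"
NO_CACHE=""

parse_opts() {
  local OPTIND opt
  while getopts "r:t:n" opt; do
    case "$opt" in
      r) REPO="$OPTARG" ;;
      t) TAG="$OPTARG" ;;
      n) NO_CACHE="--no-cache" ;;
      *) echo "usage: $0 [-r repo] [-t tag] [-n] build|push" >&2; return 1 ;;
    esac
  done
  shift $((OPTIND - 1))
  echo "command=$1 repo=$REPO tag=$TAG $NO_CACHE"
}

parse_opts -r myregistry -t v3.5.0 -n build
```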
Key Parameters
| Flag | Parameter | Description |
|---|---|---|
| -r | repo | Registry prefix (e.g., docker.io/myrepo) |
| -t | tag | Image tag (e.g., v3.5.0) |
| -f | dockerfile | Custom base Dockerfile for JVM jobs |
| -p | py-dockerfile | PySpark Dockerfile (enables PySpark image build) |
| -R | r-dockerfile | SparkR Dockerfile (enables SparkR image build) |
| -u | uid | Container UID (default: 185) |
| -m | (none) | Use Minikube's Docker daemon |
| -n | (none) | Build with --no-cache |
| -X | (none) | Cross-platform build via docker buildx (auto-pushes) |
| -b | build-arg | Additional Docker build argument |
Inputs and Outputs
| Direction | Description |
|---|---|
| Inputs | Spark distribution, Dockerfiles, Docker daemon |
| Outputs | Docker images: spark:<tag>, spark-py:<tag>, spark-r:<tag> |
Examples
Basic build
./bin/docker-image-tool.sh -r myregistry -t v3.5.0 build
Push to registry
./bin/docker-image-tool.sh -r myregistry -t v3.5.0 push
Build with PySpark
./bin/docker-image-tool.sh \
-r myregistry \
-t v3.5.0 \
-p kubernetes/dockerfiles/spark/bindings/python/Dockerfile \
build
Build in Minikube
eval $(minikube docker-env)
./bin/docker-image-tool.sh -m -t testing build
Cross-platform build
./bin/docker-image-tool.sh -r myregistry -t v3.5.0 -X build
# Note: -X uses buildx which pushes during build; no separate push step needed
Build with Java 17 base
./bin/docker-image-tool.sh -r myregistry -t v3.5.0 -b java_image_tag=17 build