
Implementation:Apache Spark Docker Image Tool

From Leeroopedia


Metadata Value
Source Repo Apache Spark
Domains Kubernetes, Containerization
Type API Doc
Related Principle:Apache_Spark_Container_Image_Build

Overview

Shell script that builds and pushes Docker images for Spark, PySpark, and SparkR on Kubernetes.

Description

bin/docker-image-tool.sh builds Docker images from the Spark distribution. It supports building a base Spark image, a PySpark image (via the -p flag with a Python Dockerfile), and a SparkR image (via the -R flag). The -X flag enables cross-platform builds using docker buildx, which automatically pushes images as part of the build process.

For development builds (when no RELEASE file is present), the script creates a temporary build context directory at $SPARK_HOME/target/tmp/docker to avoid uploading the entire source tree to the Docker daemon. This context is automatically cleaned up on exit via a trap handler.
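The temporary-context-plus-trap pattern described above can be sketched as follows. This is an illustrative stand-in, not the script's exact code: the directory name and the placeholder file are hypothetical, with `mktemp -d` standing in for `$SPARK_HOME/target/tmp/docker`.

```shell
# Illustrative sketch of the dev build-context pattern: a throwaway
# directory is removed on exit by a trap handler, so the Docker daemon
# never sees the full source tree and nothing is left behind.
BUILD_CTX="$(mktemp -d)"            # stand-in for $SPARK_HOME/target/tmp/docker

cleanup() {
  rm -rf "$BUILD_CTX"
}
trap cleanup EXIT                   # fires on normal exit, errors, and Ctrl-C

# Copy only what the Dockerfile needs into the slim context
mkdir -p "$BUILD_CTX/jars"
: > "$BUILD_CTX/jars/example.jar"   # placeholder file for this sketch

ls "$BUILD_CTX/jars"
```

The trap on EXIT is what makes the cleanup reliable: it runs whether the build succeeds, fails, or is interrupted.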

The build() function (lines 137-216) performs the following steps:

  1. Optionally creates the dev build context
  2. Verifies the Docker image content directory exists
  3. Verifies Spark JARs are present
  4. Builds the base Spark image
  5. Optionally builds PySpark and SparkR images using the base as a parent
  6. For cross-platform builds, uses docker buildx build with --push
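The steps above can be condensed into a dry-run sketch that prints the docker commands the script would issue rather than executing them. Repository name, tag, and context path here are illustrative; the real logic lives in build() (lines 137-216) and handles many more options.

```shell
# Hypothetical dry-run sketch of the build() flow; prints the docker
# commands instead of running them.
REPO=myregistry
TAG=v3.5.0
CTX=kubernetes/dockerfiles/spark    # image content directory (step 2)

print_build_cmds() {
  # Step 4: base Spark image
  echo "docker build -t $REPO/spark:$TAG -f $CTX/Dockerfile ."
  # Step 5: PySpark image, using the base image as its parent
  echo "docker build -t $REPO/spark-py:$TAG --build-arg base_img=$REPO/spark:$TAG -f $CTX/bindings/python/Dockerfile ."
}

print_build_cmds
```

Passing the base image via a build arg is what lets the PySpark and SparkR Dockerfiles layer their language bindings on top of the image built in step 4.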

The push() function (lines 218-222) pushes pre-built images for all three image types (spark, spark-py, spark-r).
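A minimal sketch of that push loop, again as a dry run (the real function runs `docker push` directly; the repo and tag values here are illustrative):

```shell
# Dry-run sketch of push(): echo the docker push command for each of the
# three pre-built image types instead of executing it.
REPO=myregistry
TAG=v3.5.0

push_cmds() {
  for image in spark spark-py spark-r; do
    echo "docker push $REPO/$image:$TAG"
  done
}

push_cmds
```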

Usage

Run the script from the root of a Spark distribution ($SPARK_HOME), either after building Spark from source or inside an unpacked release download.

Code Reference

Item Reference
Script bin/docker-image-tool.sh (L1-337)
build() function Lines 137-216
push() function Lines 218-222

Signature

bin/docker-image-tool.sh [-r <repo>] [-t <tag>] [-f <dockerfile>] [-p <py-dockerfile>] [-R <r-dockerfile>] [-u <uid>] [-m] [-n] [-X] [-b <build-arg>] build|push

Key Parameters

Flag Parameter Description
-r repo Registry prefix (e.g., docker.io/myrepo)
-t tag Image tag (e.g., v3.5.0)
-f dockerfile Custom base Dockerfile for JVM jobs
-p py-dockerfile PySpark Dockerfile (enables PySpark image build)
-R r-dockerfile SparkR Dockerfile (enables SparkR image build)
-u uid Container UID (default: 185)
-m (none) Use Minikube's Docker daemon
-n (none) Build with --no-cache
-X (none) Cross-platform build via docker buildx (auto-pushes)
-b build-arg Additional Docker build argument

Inputs and Outputs

Direction Description
Inputs Spark distribution, Dockerfiles, Docker daemon
Outputs Docker images: spark:<tag>, spark-py:<tag>, spark-r:<tag>

Examples

Basic build

./bin/docker-image-tool.sh -r myregistry -t v3.5.0 build

Push to registry

./bin/docker-image-tool.sh -r myregistry -t v3.5.0 push

Build with PySpark

./bin/docker-image-tool.sh \
  -r myregistry \
  -t v3.5.0 \
  -p kubernetes/dockerfiles/spark/bindings/python/Dockerfile \
  build

Build in Minikube

eval $(minikube docker-env)
./bin/docker-image-tool.sh -m -t testing build

Cross-platform build

./bin/docker-image-tool.sh -r myregistry -t v3.5.0 -X build
# Note: -X uses buildx which pushes during build; no separate push step needed

Build with Java 17 base

./bin/docker-image-tool.sh -r myregistry -t v3.5.0 -b java_image_tag=17 build

Related

Principle:Apache_Spark_Container_Image_Build