Environment: Apache Spark Kubernetes Runtime
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Kubernetes, Container_Orchestration |
| Last Updated | 2026-02-08 22:00 GMT |
Overview
A Kubernetes 1.33+ cluster with RBAC-scoped kubectl access, Docker with Buildx for image building, and tini for in-container process management, required to run Spark on Kubernetes.
Description
This environment defines the infrastructure requirements for deploying and running Apache Spark applications on Kubernetes. It requires a running Kubernetes cluster with RBAC permissions for pod lifecycle management (create, list, get, watch, delete), a container runtime for building Spark images, and the spark-submit client configured with a `k8s://` master URL. Container images are based on Zulu OpenJDK 21 and include tini as PID 1 for proper signal handling. The bundled kubernetes-client library (version 7.5.2) must be compatible with the target cluster version.
Usage
Use this environment for all Kubernetes deployment workflows, including building container images, submitting Spark applications to Kubernetes, and managing the container lifecycle. It is the mandatory prerequisite for running the Kubectl_Auth_Check, Docker_Image_Tool, K8s_Config_Properties, Spark_Submit_K8s, and K8s_Entrypoint implementations.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Kubernetes Cluster | Version >= 1.33 | Check kubernetes-client compatibility |
| kubectl | Matching cluster version | Must have RBAC permissions for pods |
| Docker | Docker with Buildx support | For multi-platform image builds (linux/amd64, linux/arm64) |
| Container Base Image | azul/zulu-openjdk:21 | Default JDK 21 base image |
| tini | Installed in container | PID 1 process manager for signal hygiene |
Dependencies
System Packages (Build Host)
- `docker` with Buildx plugin (for multi-arch builds)
- `kubectl` (matching cluster version)
- JDK 17+ (for spark-submit client)
Container Dependencies
- Zulu OpenJDK 21 (default base image)
- `tini` (PID 1 init system)
- `bash` (for entrypoint.sh)
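These container dependencies could be layered roughly as follows. This is a hypothetical minimal sketch, not the project's actual Dockerfile (which lives under `resource-managers/kubernetes/docker/src/main/dockerfiles/spark/`); the entrypoint path is a placeholder:

```dockerfile
# Hypothetical sketch only, not the official Spark Dockerfile
FROM azul/zulu-openjdk:21
# tini provides signal forwarding and zombie reaping as PID 1; bash runs the entrypoint
RUN apt-get update && \
    apt-get install -y --no-install-recommends tini bash && \
    rm -rf /var/lib/apt/lists/*
ENTRYPOINT ["/usr/bin/tini", "-s", "--", "/opt/entrypoint.sh"]
```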
Kubernetes RBAC Permissions
The following RBAC permissions are required:
- `pods`: create, list, get, watch, delete
- `services`: create, list, get, delete (for headless services)
- `configmaps`: create, list, get, delete (for executor configs)
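A minimal Role granting these verbs might look like the following sketch. The names (`spark` ServiceAccount, `spark-role`, namespace `default`) are placeholders, not from the source:

```yaml
# Hypothetical RBAC sketch; all names and the namespace are assumptions
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: spark-role
  namespace: default
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["create", "list", "get", "watch", "delete"]
- apiGroups: [""]
  resources: ["services", "configmaps"]
  verbs: ["create", "list", "get", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-role-binding
  namespace: default
subjects:
- kind: ServiceAccount
  name: spark
  namespace: default
roleRef:
  kind: Role
  name: spark-role
  apiGroup: rbac.authorization.k8s.io
```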
Library Version
- kubernetes-client 7.5.2 (from pom.xml)
Credentials
The following must be configured for Kubernetes deployment:
- KUBECONFIG: Path to kubectl configuration file (or default `~/.kube/config`)
- Kubernetes ServiceAccount: With appropriate RBAC roles bound
- Container Registry Credentials: For pushing/pulling Spark images (if using private registry)
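These credentials are typically wired into spark-submit through configuration. A hedged sketch in spark-defaults style; the ServiceAccount name `spark` and secret name `registry-secret` are placeholders:

```properties
# Sketch: placeholder names, not from the source
spark.kubernetes.authenticate.driver.serviceAccountName  spark
spark.kubernetes.container.image.pullSecrets             registry-secret
```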
Quick Install
```shell
# Build Spark Docker images
./bin/docker-image-tool.sh -r <registry> -t <tag> build

# Push images to registry
./bin/docker-image-tool.sh -r <registry> -t <tag> push

# Verify kubectl access
kubectl auth can-i create pods
kubectl auth can-i list pods
kubectl auth can-i delete pods

# Submit to Kubernetes
./bin/spark-submit \
  --master k8s://https://<k8s-apiserver>:443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=<image> \
  local:///opt/spark/examples/jars/spark-examples.jar
```
Code Evidence
Container entrypoint anonymous UID handling from `resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh:22-37`:
```bash
myuid=$(id -u)
mygid=$(id -g)
set +e
uidentry=$(getent passwd $myuid)
set -e
if [ -z "$uidentry" ] ; then
    if [ -w /etc/passwd ] ; then
        echo "$myuid:x:$myuid:$mygid:${SPARK_USER_NAME:-anonymous uid}:$SPARK_HOME:/bin/false" >> /etc/passwd
    else
        echo "Container ENTRYPOINT failed to add passwd entry for anonymous UID"
    fi
fi
```
JAVA_HOME auto-detection in container from `entrypoint.sh:39-41`:
```bash
if [ -z "$JAVA_HOME" ]; then
  JAVA_HOME=$(java -XshowSettings:properties -version 2>&1 > /dev/null | grep 'java.home' | awk '{print $3}')
fi
```
Process management with tini from `entrypoint.sh:118`:
```bash
exec /usr/bin/tini -s -- "${CMD[@]}"
```
Classpath construction with SPARK-43540 fix from `entrypoint.sh:78-79`:
```bash
# SPARK-43540: add current working directory into executor classpath
SPARK_CLASSPATH="$SPARK_CLASSPATH:$PWD"
```
Multi-platform Docker build support from `bin/docker-image-tool.sh:175`:
```bash
local ARCHS=${ARCHS:-"--platform linux/amd64,linux/arm64"}
```
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `Container ENTRYPOINT failed to add passwd entry for anonymous UID` | /etc/passwd is read-only in container | Ensure container securityContext allows writing to /etc/passwd |
| `kubectl auth can-i` returns `no` | Missing RBAC permissions | Apply spark-rbac.yaml or create appropriate ClusterRoleBinding |
| Pod stuck in `ImagePullBackOff` | Container image not accessible | Push image to accessible registry or configure imagePullSecrets |
| Executor pod fails to connect to driver | Network policy or firewall blocking | Ensure pods can communicate within the namespace |
Compatibility Notes
- Kubernetes 1.33+: Minimum supported version. Check kubernetes-client 7.5.2 compatibility with your cluster version.
- Multi-arch Images: Default builds for both linux/amd64 and linux/arm64 via Docker Buildx.
- OpenShift: May run pods with arbitrary UIDs; the entrypoint handles this gracefully.
- Volcano Scheduler: Optional integration for advanced scheduling (PodGroup templates).
- Default UID: Spark containers run as UID 185 by default (configurable via `spark_uid` build arg).
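To override the default UID at build time, the image tool's `-u` flag can be used. This is a sketch mirroring the Quick Install commands; verify that your copy of docker-image-tool.sh supports `-u` via its usage output:

```shell
# Build images whose main process runs as a custom UID instead of the default 185
./bin/docker-image-tool.sh -r <registry> -t <tag> -u 1000 build
```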