
Environment:Apache Spark Kubernetes Runtime

From Leeroopedia


Knowledge Sources
Domains: Infrastructure, Kubernetes, Container_Orchestration
Last Updated: 2026-02-08 22:00 GMT

Overview

Running Spark on Kubernetes requires a Kubernetes 1.33+ cluster with kubectl and the necessary RBAC permissions, Docker with Buildx for image building, and tini for in-container process management.

Description

This environment defines the infrastructure requirements for deploying and running Apache Spark applications on Kubernetes. It requires a running Kubernetes cluster with proper RBAC permissions for pod lifecycle management (create, list, edit, delete), a container runtime for building Spark images, and the spark-submit client configured with the `k8s://` master URL. The container images are based on Zulu OpenJDK 21 and include tini as PID 1 for proper signal handling. The kubernetes-client library version 7.5.2 must be compatible with the target cluster version.

Usage

Use this environment for all Kubernetes Deployment workflows, including building container images, submitting Spark applications to Kubernetes, and managing the container lifecycle. It is the mandatory prerequisite for running the Kubectl_Auth_Check, Docker_Image_Tool, K8s_Config_Properties, Spark_Submit_K8s, and K8s_Entrypoint implementations.

System Requirements

| Category | Requirement | Notes |
| --- | --- | --- |
| Kubernetes Cluster | Version >= 1.33 | Check kubernetes-client compatibility |
| kubectl | Matching cluster version | Must have RBAC permissions for pods |
| Docker | Docker with Buildx support | For multi-platform image builds (linux/amd64, linux/arm64) |
| Container Base Image | azul/zulu-openjdk:21 | Default JDK 21 base image |
| tini | Installed in container | PID 1 process manager for signal hygiene |
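A quick way to sanity-check the build-host side of these requirements is a small shell script. This is a sketch: the tool list (`docker`, `kubectl`, `java`) is an assumption drawn from the table above, so adjust it for your setup.

```shell
# Sketch: verify the build host provides the tools listed above.
missing=0
for tool in docker kubectl java; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: MISSING"
    missing=1
  fi
done
# missing=0 means every tool resolved on PATH.
echo "missing=$missing"
```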

Dependencies

System Packages (Build Host)

  • `docker` with Buildx plugin (for multi-arch builds)
  • `kubectl` (matching cluster version)
  • JDK 17+ (for spark-submit client)

Container Dependencies

  • Zulu OpenJDK 21 (default base image)
  • `tini` (PID 1 init system)
  • `bash` (for entrypoint.sh)

Kubernetes RBAC Permissions

The following RBAC permissions are required:

  • `pods`: create, list, get, watch, delete
  • `services`: create, list, get, delete (for headless services)
  • `configmaps`: create, list, get, delete (for executor configs)
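These permissions are typically granted through a namespaced Role bound to the driver's ServiceAccount. A minimal sketch of such a manifest follows; the names `spark`, `spark-role`, `spark-role-binding`, and the `default` namespace are placeholders, not part of the Spark distribution.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: spark-role        # placeholder name
  namespace: default      # placeholder namespace
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["create", "list", "get", "watch", "delete"]
  - apiGroups: [""]
    resources: ["services", "configmaps"]
    verbs: ["create", "list", "get", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-role-binding
  namespace: default
subjects:
  - kind: ServiceAccount
    name: spark           # placeholder ServiceAccount
    namespace: default
roleRef:
  kind: Role
  name: spark-role
  apiGroup: rbac.authorization.k8s.io
```

Apply it with `kubectl apply -f spark-rbac.yaml`, then point the driver at the ServiceAccount via `spark.kubernetes.authenticate.driver.serviceAccountName`.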

Library Version

  • kubernetes-client 7.5.2 (from pom.xml)

Credentials

The following must be configured for Kubernetes deployment:

  • KUBECONFIG: Path to kubectl configuration file (or default `~/.kube/config`)
  • Kubernetes ServiceAccount: With appropriate RBAC roles bound
  • Container Registry Credentials: For pushing/pulling Spark images (if using private registry)
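For illustration, the kubeconfig fallback can be resolved the same way kubectl does it: honor `KUBECONFIG` if set, otherwise use the default location. `KUBECONFIG_PATH` is just a local variable for demonstration, and the commented secret name is a placeholder.

```shell
# Resolve the kubeconfig path, mirroring kubectl's fallback behavior.
KUBECONFIG_PATH="${KUBECONFIG:-$HOME/.kube/config}"
echo "kubeconfig: $KUBECONFIG_PATH"

# For a private registry, images are pulled with an imagePullSecret;
# "spark-registry-cred" below is a placeholder name.
# kubectl create secret docker-registry spark-registry-cred \
#   --docker-server=<registry> --docker-username=<user> --docker-password=<pass>
```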

Quick Install

# Build Spark Docker images
./bin/docker-image-tool.sh -r <registry> -t <tag> build

# Push images to registry
./bin/docker-image-tool.sh -r <registry> -t <tag> push

# Verify kubectl access
kubectl auth can-i create pods
kubectl auth can-i list pods
kubectl auth can-i delete pods

# Submit to Kubernetes
./bin/spark-submit \
  --master k8s://https://<k8s-apiserver>:443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=<image> \
  local:///opt/spark/examples/jars/spark-examples.jar

Code Evidence

Container entrypoint anonymous UID handling from `resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh:22-37`:

myuid=$(id -u)
mygid=$(id -g)
set +e
uidentry=$(getent passwd $myuid)
set -e

if [ -z "$uidentry" ] ; then
    if [ -w /etc/passwd ] ; then
        echo "$myuid:x:$myuid:$mygid:${SPARK_USER_NAME:-anonymous uid}:$SPARK_HOME:/bin/false" >> /etc/passwd
    else
        echo "Container ENTRYPOINT failed to add passwd entry for anonymous UID"
    fi
fi
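The snippet's behavior can be reproduced outside a container against a scratch file. In this sketch, `PASSWD_FILE` stands in for `/etc/passwd`, and the missing-entry branch is forced so the write path always runs.

```shell
PASSWD_FILE=$(mktemp)   # stand-in for /etc/passwd
myuid=$(id -u)
mygid=$(id -g)
uidentry=""             # force the "no passwd entry" branch for demonstration
if [ -z "$uidentry" ]; then
  if [ -w "$PASSWD_FILE" ]; then
    # Same field layout the entrypoint writes: name:passwd:uid:gid:gecos:home:shell
    echo "$myuid:x:$myuid:$mygid:${SPARK_USER_NAME:-anonymous uid}:${SPARK_HOME:-/opt/spark}:/bin/false" >> "$PASSWD_FILE"
  else
    echo "failed to add passwd entry"
  fi
fi
cat "$PASSWD_FILE"
```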

JAVA_HOME auto-detection in container from `entrypoint.sh:39-41`:

if [ -z "$JAVA_HOME" ]; then
  JAVA_HOME=$(java -XshowSettings:properties -version 2>&1 > /dev/null | grep 'java.home' | awk '{print $3}')
fi

Process management with tini from `entrypoint.sh:118`:

exec /usr/bin/tini -s -- "${CMD[@]}"

Classpath construction with SPARK-43540 fix from `entrypoint.sh:78-79`:

# SPARK-43540: add current working directory into executor classpath
SPARK_CLASSPATH="$SPARK_CLASSPATH:$PWD"
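The effect of the SPARK-43540 line is easy to see in isolation; the initial classpath value here is an arbitrary example.

```shell
SPARK_CLASSPATH="/opt/spark/jars/*"     # arbitrary starting classpath
SPARK_CLASSPATH="$SPARK_CLASSPATH:$PWD" # SPARK-43540: append the working directory
echo "$SPARK_CLASSPATH"
```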

Multi-platform Docker build support from `bin/docker-image-tool.sh:175`:

local ARCHS=${ARCHS:-"--platform linux/amd64,linux/arm64"}

Common Errors

| Error Message | Cause | Solution |
| --- | --- | --- |
| `Container ENTRYPOINT failed to add passwd entry for anonymous UID` | /etc/passwd is read-only in container | Ensure container securityContext allows writing to /etc/passwd |
| `kubectl auth can-i` returns `no` | Missing RBAC permissions | Apply spark-rbac.yaml or create an appropriate ClusterRoleBinding |
| Pod stuck in `ImagePullBackOff` | Container image not accessible | Push image to an accessible registry or configure imagePullSecrets |
| Executor pod fails to connect to driver | Network policy or firewall blocking traffic | Ensure pods can communicate within the namespace |

Compatibility Notes

  • Kubernetes 1.33+: Minimum supported version. Check kubernetes-client 7.5.2 compatibility with your cluster version.
  • Multi-arch Images: Default builds for both linux/amd64 and linux/arm64 via Docker Buildx.
  • OpenShift: May run pods with arbitrary UIDs; the entrypoint handles this gracefully.
  • Volcano Scheduler: Optional integration for advanced scheduling (PodGroup templates).
  • Default UID: Spark containers run as UID 185 by default (configurable via `spark_uid` build arg).
