
Implementation:Apache Spark Spark Submit K8s

From Leeroopedia


Metadata Value
Source Doc: Running on K8s
Domains Kubernetes
Type External Tool Doc
Related Principle:Apache_Spark_K8s_Application_Submission

Overview

External tool documentation for submitting Spark applications to Kubernetes using spark-submit with the k8s:// master URL.

Description

bin/spark-submit with --master k8s://https://<host>:<port> submits applications to Kubernetes. The API server port must always be specified, even when it is the standard HTTPS port 443. If no scheme is given in the master URL, it defaults to https.
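The always-specify-the-port rule can be enforced before submission. The helper below is a hypothetical sketch (not part of Spark) that appends :443 when an https URL carries no explicit port:

```shell
# Hypothetical helper: ensure the API server URL carries an explicit port,
# since spark-submit requires one even for the default HTTPS port 443.
with_default_port() {
  case "$1" in
    https://*:[0-9]*) echo "$1" ;;        # port already present, keep as-is
    https://*)        echo "$1:443" ;;    # no port given, default to 443
    *)                echo "$1" ;;        # other schemes left untouched
  esac
}

echo "k8s://$(with_default_port https://apiserver)"       # → k8s://https://apiserver:443
echo "k8s://$(with_default_port https://apiserver:6443)"  # → k8s://https://apiserver:6443
```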

In cluster mode, a driver pod is created in the Kubernetes cluster; the driver then requests executor pods from the API server. In client mode, the driver runs locally and only executor pods are created in the cluster.
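The examples later on this page all use cluster mode. A client-mode submission looks similar; the sketch below is not from the source page, and the host name, API server address, and jar path are placeholders. Because the driver runs in the submitting shell, spark.driver.host must resolve to an address that the executor pods can reach:

```shell
# Sketch only: client mode keeps the driver in the local shell, so the
# executors in the cluster must be able to connect back to it.
# <driver-host> is a placeholder for an address routable from the pods.
./bin/spark-submit \
  --master k8s://https://apiserver:6443 \
  --deploy-mode client \
  --name spark-pi-client \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.container.image=spark:latest \
  --conf spark.driver.host=<driver-host> \
  examples/jars/spark-examples_2.12-3.5.0.jar
```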

Application JARs that are pre-mounted in the container image are referenced with the local:// URI scheme. Remote JARs can be referenced via hdfs://, s3a://, or http:// schemes. file:// paths on the submitting machine additionally require spark.kubernetes.file.upload.path in cluster mode so the dependency can be staged where the driver pod can fetch it.
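The scheme determines where Spark expects the dependency to live. A hypothetical helper (not part of Spark) makes the mapping explicit:

```shell
# Hypothetical helper: report how a dependency URI will be resolved,
# per the scheme rules described above.
dep_location() {
  case "$1" in
    local://*)                            echo "inside container image" ;;
    hdfs://*|s3a://*|http://*|https://*)  echo "fetched remotely" ;;
    file://*)                             echo "on the submitting machine" ;;
    *)                                    echo "unknown scheme" ;;
  esac
}

dep_location "local:///opt/spark/app.jar"  # → inside container image
dep_location "s3a://bucket/app.jar"        # → fetched remotely
```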

The application name (specified via spark.app.name or --name) must consist of lowercase alphanumeric characters, -, and ., and must start and end with an alphanumeric character. This is required for valid Kubernetes resource naming.
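The naming rule can be checked before submitting rather than letting the Kubernetes API server reject the resource. This helper is a sketch, not part of Spark:

```shell
# Hypothetical pre-flight check: app name must be lowercase alphanumerics,
# '-' and '.', and must start and end with an alphanumeric character.
is_valid_app_name() {
  echo "$1" | grep -Eq '^[a-z0-9]([a-z0-9.-]*[a-z0-9])?$'
}

is_valid_app_name "spark-pi" && echo "spark-pi: ok"
is_valid_app_name "Spark_Pi" || echo "Spark_Pi: invalid"
```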

The Kubernetes API server URL can be discovered using:

kubectl cluster-info

Alternatively, kubectl proxy can be used to proxy API server communication through localhost.

Usage

Use spark-submit after building the container images and configuring the required Kubernetes properties. The minimum required configuration is:

  • --master k8s://https://<host>:<port>
  • --deploy-mode cluster (or client)
  • --conf spark.kubernetes.container.image=<image>
  • Application JAR or Python file

Code Reference

Item Reference
Source documentation docs/running-on-kubernetes.md (L83-206)
Command bin/spark-submit --master k8s://https://<api-server>:<port> ...

Command Signature

bin/spark-submit \
  --master k8s://https://<api-server>:<port> \
  --deploy-mode cluster|client \
  --name <app-name> \
  --class <main-class> \
  --conf spark.kubernetes.container.image=<image> \
  --conf spark.executor.instances=<N> \
  [additional --conf flags] \
  <application-jar-or-file>

Inputs and Outputs

Direction Description
Inputs --master k8s:// URL, --deploy-mode cluster or client, --conf spark.kubernetes.container.image (required), application JAR (local:// for files in the container image, other schemes for remote dependencies)
Outputs Driver pod created (cluster mode) or local driver process (client mode); executor pods spawned; application executed

Examples

Basic Spark Pi in cluster mode

./bin/spark-submit \
  --master k8s://https://apiserver:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=5 \
  --conf spark.kubernetes.container.image=spark:latest \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.5.0.jar

PySpark application

./bin/spark-submit \
  --master k8s://https://apiserver:6443 \
  --deploy-mode cluster \
  --name pyspark-pi \
  --conf spark.kubernetes.container.image=spark-py:latest \
  local:///opt/spark/examples/src/main/python/pi.py

Using kubectl proxy

# In a separate terminal:
kubectl proxy

# Then submit using the proxy:
./bin/spark-submit \
  --master k8s://http://127.0.0.1:8001 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.container.image=spark:latest \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.5.0.jar

With custom namespace and service account

./bin/spark-submit \
  --master k8s://https://apiserver:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.container.image=spark:latest \
  --conf spark.kubernetes.namespace=spark \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-sa \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.5.0.jar
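The namespace and service account referenced above must exist and have permission to manage pods before submission. A setup sketch with the same placeholder names (the clusterrolebinding name is arbitrary):

```shell
# One-time setup for the example above; names are placeholders.
kubectl create namespace spark
kubectl create serviceaccount spark-sa -n spark
# Grant the account rights to create and watch executor pods.
kubectl create clusterrolebinding spark-sa-edit \
  --clusterrole=edit \
  --serviceaccount=spark:spark-sa
```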

Application with S3 dependency upload

./bin/spark-submit \
  --master k8s://https://apiserver:6443 \
  --deploy-mode cluster \
  --name my-app \
  --packages org.apache.hadoop:hadoop-aws:3.4.1 \
  --conf spark.kubernetes.container.image=spark:latest \
  --conf spark.kubernetes.file.upload.path=s3a://my-bucket/spark-uploads \
  --conf spark.hadoop.fs.s3a.access.key=ACCESSKEY \
  --conf spark.hadoop.fs.s3a.secret.key=SECRETKEY \
  --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
  file:///local/path/to/app.jar
