Implementation:Apache Spark Spark Submit K8s
| Metadata | Value |
|---|---|
| Source | Doc: Running on K8s |
| Domains | Kubernetes |
| Type | External Tool Doc |
| Related | Principle:Apache_Spark_K8s_Application_Submission |
Overview
External tool documentation for submitting Spark applications to Kubernetes using spark-submit with the k8s:// master URL.
Description
bin/spark-submit with --master k8s://https://<host>:<port> submits applications to Kubernetes. The API server port must always be specified, even if it is the HTTPS port 443. If no HTTP protocol is specified in the URL, it defaults to https.
In cluster mode, a driver pod is created in the Kubernetes cluster; the driver then requests executor pods from the API server. In client mode, the driver runs locally and only executor pods are created in the cluster.
Application JARs that are pre-mounted in the container image use the local:// URI scheme. Remote JARs can be referenced via hdfs://, s3a://, http://, or file:// schemes.
The application name (specified via spark.app.name or --name) must consist of lowercase alphanumeric characters, -, and ., and must start and end with an alphanumeric character. This is required for valid Kubernetes resource naming.
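The naming rule above can be checked before submission. A minimal sketch using grep; the regex is an illustration of the stated rule (lowercase alphanumerics, -, and ., starting and ending with an alphanumeric), not a check taken from Spark itself:

```shell
# Validate an app name against the stated Kubernetes naming rule:
# lowercase alphanumerics, '-' and '.', must start and end alphanumeric.
valid_app_name() {
  printf '%s' "$1" | grep -Eq '^[a-z0-9]([a-z0-9.-]*[a-z0-9])?$'
}

valid_app_name "spark-pi" && echo "spark-pi: ok"
valid_app_name "Spark_Pi" || echo "Spark_Pi: rejected"
```

Names that fail this check (e.g. containing uppercase letters or underscores) cause invalid Kubernetes resource names at submission time.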
The Kubernetes API server URL can be discovered using:
kubectl cluster-info
Alternatively, kubectl proxy can be used to proxy API server communication through localhost.
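Putting the two together, the --master value is just the discovered API server address prefixed with k8s://. A sketch; the address below is illustrative:

```shell
# Hypothetical API server address, as reported by `kubectl cluster-info`
# (or extracted with:
#   kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}')
server="https://192.168.49.2:8443"

# Prefix with k8s:// to form the spark-submit master URL;
# note the port is kept, as required.
master="k8s://${server}"
echo "$master"   # k8s://https://192.168.49.2:8443
```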
Usage
Use after building Docker images and configuring Kubernetes properties. The minimum required configuration is:
- --master k8s://https://<host>:<port>
- --deploy-mode cluster (or client)
- --conf spark.kubernetes.container.image=<image>
- Application JAR or Python file
Code Reference
| Item | Reference |
|---|---|
| Source documentation | docs/running-on-kubernetes.md (L83-206) |
| Command | bin/spark-submit --master k8s://https://<api-server>:<port> ... |
Command Signature
bin/spark-submit \
--master k8s://https://<api-server>:<port> \
--deploy-mode cluster|client \
--name <app-name> \
--class <main-class> \
--conf spark.kubernetes.container.image=<image> \
--conf spark.executor.instances=<N> \
[additional --conf flags] \
<application-jar-or-file>
Inputs and Outputs
| Direction | Description |
|---|---|
| Inputs | --master k8s:// URL, --deploy-mode (cluster or client), --conf spark.kubernetes.container.image (required), application JAR (local:// for container-local, other schemes for remote) |
| Outputs | Driver pod created (cluster mode) or local driver process (client mode), executor pods spawned, application execution |
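In cluster mode, the resulting pods can be inspected with kubectl. A sketch assuming the spark-role labels that Spark applies to the pods it creates; the pod name is a placeholder:

```shell
# List driver and executor pods created by a cluster-mode submission
kubectl get pods -l spark-role=driver
kubectl get pods -l spark-role=executor

# Follow the driver log (substitute the actual driver pod name)
kubectl logs -f <driver-pod-name>
```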
Examples
Basic Spark Pi in cluster mode
./bin/spark-submit \
--master k8s://https://apiserver:6443 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=spark:latest \
local:///opt/spark/examples/jars/spark-examples_2.12-3.5.0.jar
PySpark application
./bin/spark-submit \
--master k8s://https://apiserver:6443 \
--deploy-mode cluster \
--name pyspark-pi \
--conf spark.kubernetes.container.image=spark-py:latest \
local:///opt/spark/examples/src/main/python/pi.py
Using kubectl proxy
# In a separate terminal:
kubectl proxy
# Then submit using the proxy:
./bin/spark-submit \
--master k8s://http://127.0.0.1:8001 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.kubernetes.container.image=spark:latest \
local:///opt/spark/examples/jars/spark-examples_2.12-3.5.0.jar
With custom namespace and service account
./bin/spark-submit \
--master k8s://https://apiserver:6443 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.kubernetes.container.image=spark:latest \
--conf spark.kubernetes.namespace=spark \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-sa \
local:///opt/spark/examples/jars/spark-examples_2.12-3.5.0.jar
Application with S3 dependency upload
./bin/spark-submit \
--master k8s://https://apiserver:6443 \
--deploy-mode cluster \
--name my-app \
--packages org.apache.hadoop:hadoop-aws:3.4.1 \
--conf spark.kubernetes.container.image=spark:latest \
--conf spark.kubernetes.file.upload.path=s3a://my-bucket/spark-uploads \
--conf spark.hadoop.fs.s3a.access.key=ACCESSKEY \
--conf spark.hadoop.fs.s3a.secret.key=SECRETKEY \
--conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
file:///local/path/to/app.jar
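A note on the last example: with spark.kubernetes.file.upload.path set, spark-submit uploads file:// dependencies to a generated subdirectory under that path before the driver starts, and the driver downloads them from there. The upload can be verified afterwards; a sketch assuming the AWS CLI and the bucket used above:

```shell
# List uploaded artifacts; Spark places them under a generated
# spark-upload-<uuid> subdirectory of the configured upload path.
aws s3 ls --recursive s3://my-bucket/spark-uploads/
```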