
Principle:Apache Spark K8s Application Submission

From Leeroopedia


Metadata | Value
Domains | Kubernetes, Deployment
Type | Principle
Related | Implementation:Apache_Spark_Spark_Submit_K8s

Overview

A container-orchestrated application submission pattern that creates driver and executor pods in Kubernetes using the k8s:// master URL scheme.

Description

Spark on Kubernetes submission creates the driver as a Kubernetes pod, which then requests executor pods from the Kubernetes API server. The k8s:// master URL scheme tells Spark to use the Kubernetes cluster manager instead of Standalone, YARN, or Mesos.

The submission workflow operates as follows:

  1. The submission client (spark-submit) contacts the Kubernetes API server at the URL specified by --master k8s://https://<host>:<port>.
  2. In cluster mode, the submission client creates a driver pod in the Kubernetes cluster. The driver process runs inside this pod.
  3. The driver pod requests executor pods from the Kubernetes API server based on spark.executor.instances.
  4. Executors register with the driver and begin executing tasks.
  5. When the application completes, executor pods are terminated and cleaned up. The driver pod remains in "completed" state for log inspection until it is garbage collected or manually deleted.
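The workflow above can be sketched as a single spark-submit invocation. This is a minimal cluster-mode example; the API server address, container image, service account, and jar path are placeholders for your environment, not values from this page.

```shell
# Sketch of a cluster-mode submission (assumed endpoint and image names).
spark-submit \
  --master k8s://https://k8s-apiserver.example.com:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=3 \
  --conf spark.kubernetes.container.image=my-registry/spark:3.5.0 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.5.0.jar
```

Because the deploy mode is cluster, spark-submit exits after the driver pod is created; spark.executor.instances drives step 3 of the workflow.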

Two deployment modes are supported:

  • Cluster mode (recommended for production) -- The driver runs as a Kubernetes pod. The submission client exits after creating the driver pod.
  • Client mode (useful for debugging) -- The driver runs locally on the submission machine. Only executors are created as Kubernetes pods. The driver must be network-reachable from executor pods.
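The two modes differ only in a few flags. A client-mode sketch, assuming the driver runs in a pod fronted by a hypothetical headless service named spark-driver-svc in the default namespace:

```shell
# Sketch of a client-mode submission from inside the cluster; the
# service name, namespace, and image are assumptions for illustration.
spark-submit \
  --master k8s://https://kubernetes.default.svc:443 \
  --deploy-mode client \
  --name spark-pi-debug \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.driver.host=spark-driver-svc.default.svc.cluster.local \
  --conf spark.driver.port=7078 \
  --conf spark.kubernetes.container.image=my-registry/spark:3.5.0 \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.5.0.jar
```

Setting spark.driver.host to a stable, resolvable address is what satisfies the requirement that executors can reach the driver.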

The local:// JAR scheme is a Kubernetes-specific feature that references JARs already present inside the container image, avoiding the need to transfer JARs at submission time.

Usage

Use this pattern to submit Spark applications to Kubernetes clusters:

  • Production workloads -- Use cluster mode so the driver lifecycle is managed by Kubernetes.
  • Interactive debugging -- Use client mode to run the driver locally with direct access to driver logs and the Spark UI.
  • Pre-packaged applications -- Use local:// URIs to reference application JARs baked into the container image.
  • Remote dependencies -- Use hdfs://, s3a://, or http:// URIs for JARs hosted externally.
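For the remote-dependency case, the application jar URI simply points at external storage instead of the image. A hedged sketch with an assumed S3 bucket, class name, and credentials provider:

```shell
# Sketch: application jar fetched from S3 at submission time; bucket,
# class, credentials provider, and image are placeholders.
spark-submit \
  --master k8s://https://k8s-apiserver.example.com:6443 \
  --deploy-mode cluster \
  --name spark-etl \
  --class com.example.Etl \
  --conf spark.kubernetes.container.image=my-registry/spark:3.5.0 \
  --conf spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.InstanceProfileCredentialsProvider \
  s3a://my-bucket/jars/etl-assembly.jar
```

The pods, not the submission machine, download the jar, so the driver and executor pods must have network access and credentials for the remote store.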

Theoretical Basis

The pod-based execution model follows a create-request-register-execute lifecycle:

submit(driver_pod)
  -> driver_pod.request(executor_pods, N)
    -> executors.register(driver)
      -> execute_tasks
        -> cleanup(executor_pods)

Key constraints of this model:

  • The port must always be specified in the master URL, even if it is the standard HTTPS port 443.
  • The application name must be lowercase alphanumeric (with - and . allowed) because it is used to name Kubernetes resources, which have strict naming requirements.
  • In client mode, the driver must be network-routable from executor pods, which may require a headless Kubernetes service.
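The naming constraint above can be checked before submission. A small sketch that validates a candidate application name against the lowercase-alphanumeric rule (with - and . allowed, and an alphanumeric first and last character, per Kubernetes resource naming); the function name is my own:

```shell
# Validate a Spark app name against Kubernetes resource naming rules:
# lowercase alphanumeric plus '-' and '.', starting and ending with
# an alphanumeric character.
is_valid_k8s_name() {
  echo "$1" | grep -Eq '^[a-z0-9]([a-z0-9.-]*[a-z0-9])?$'
}

is_valid_k8s_name "spark-pi" && echo "spark-pi: ok"
is_valid_k8s_name "Spark_Pi" || echo "Spark_Pi: rejected"
```

Names that fail this check (uppercase letters, underscores, a trailing hyphen) cause pod and service creation to be rejected by the API server.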

Deployment Mode Comparison

Aspect | Cluster Mode | Client Mode
Driver location | Kubernetes pod | Local machine or external pod
Executor location | Kubernetes pods | Kubernetes pods
Recommended for | Production | Debugging, interactive use
Driver lifecycle | Managed by Kubernetes | Managed by user
Network requirement | API server reachable from client | Driver reachable from executor pods
Spark UI access | Via port-forward or ingress | Direct on local machine

Related

  • Implementation:Apache_Spark_Spark_Submit_K8s