Principle:Apache Spark K8s Resource Configuration

Metadata	Value
Domains	Kubernetes, Configuration
Type	Principle
Related	Implementation:Apache_Spark_K8s_Config_Properties

Overview

A Kubernetes-native configuration pattern that maps Spark application requirements to Kubernetes resources through spark.kubernetes.* properties, pod templates, and RBAC resources.

Description

Running Spark on Kubernetes requires translating Spark concepts (driver, executors, memory, cores) into Kubernetes primitives (pods, services, secrets, volumes). The configuration pattern provides multiple layers of customization, each serving a different level of specificity:

spark.kubernetes.* properties -- The primary configuration mechanism for common Kubernetes-specific settings. These properties are passed via --conf flags to spark-submit and control the container image, namespace, resource requests, secret mounts, volume mounts, and other Kubernetes-native behaviors.
Pod templates -- YAML files that provide fine-grained control over pod specifications beyond what spark.kubernetes.* properties expose. Pod templates allow specifying node affinity, tolerations, init containers, sidecar containers, custom volume types, and other advanced Kubernetes features without modifying Spark code.
RBAC resources -- Kubernetes ServiceAccount, Role, and RoleBinding resources that control the security context under which Spark pods operate. The service account used by the driver must have permissions to create and manage executor pods.

These layers are combined when Spark builds the driver and executor pod specifications. For fields that Spark manages itself (such as the container image and the CPU and memory resources), the Spark-supplied values take precedence over the template values.
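To make the first layer concrete, here is a minimal sketch of how a set of spark.kubernetes.* properties turns into --conf flags on a spark-submit command line. The helper `build_submit_command`, the API server URL, the image name, the namespace, and the service account name are all hypothetical placeholders; only the property keys and the spark-submit flags come from Spark itself.

```python
import shlex


def build_submit_command(api_server, app_resource, conf, main_class=None):
    """Return a spark-submit argv list for a cluster-mode Kubernetes submission.

    Hypothetical helper for illustration; it only assembles arguments and
    does not invoke spark-submit.
    """
    argv = ["spark-submit", "--master", f"k8s://{api_server}", "--deploy-mode", "cluster"]
    if main_class:
        argv += ["--class", main_class]
    # Each property becomes one "--conf key=value" pair; sorted for stable output.
    for key in sorted(conf):
        argv += ["--conf", f"{key}={conf[key]}"]
    argv.append(app_resource)
    return argv


# Placeholder values: substitute your own image, namespace, and service account.
minimal_conf = {
    "spark.kubernetes.container.image": "example.registry/spark:3.5.0",
    "spark.kubernetes.namespace": "spark-jobs",
    "spark.kubernetes.authenticate.driver.serviceAccountName": "spark",
}

command = build_submit_command(
    "https://k8s.example.com:6443",
    "local:///opt/spark/examples/jars/spark-examples.jar",
    minimal_conf,
    main_class="org.apache.spark.examples.SparkPi",
)
print(shlex.join(command))
```

Keeping the properties in a dictionary until the last step makes it easy to layer further settings on top of the minimal set without touching the submission logic.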

Usage

Use when configuring Spark applications for Kubernetes deployment:

Minimal configuration -- Start with spark.kubernetes.container.image as the only required Kubernetes-specific property.
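Resource tuning -- Set driver and executor CPU requests via spark.kubernetes.driver.request.cores and spark.kubernetes.executor.request.cores, CPU limits via spark.kubernetes.driver.limit.cores and spark.kubernetes.executor.limit.cores, and memory via the standard spark.driver.memory and spark.executor.memory properties.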
Advanced pod customization -- Use spark.kubernetes.driver.podTemplateFile and spark.kubernetes.executor.podTemplateFile for requirements like GPU scheduling, node affinity, tolerations, or init containers.
Secret management -- Mount Kubernetes secrets into pods via spark.kubernetes.driver.secrets.[Name] and spark.kubernetes.executor.secrets.[Name].
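Volume management -- Attach hostPath, emptyDir, NFS, or PersistentVolumeClaim volumes via spark.kubernetes.driver.volumes.[Type].[Name].mount.path and the matching spark.kubernetes.executor.volumes.* properties. Type-specific settings, such as the claim name of a PersistentVolumeClaim, go under the .options.[OptionName] suffix.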
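The usage steps above can be sketched as a property set that grows from the minimal configuration. The helper functions `resource_conf`, `secret_conf`, and `volume_conf` are hypothetical, as are the image, secret, claim, and template file names; the property key patterns follow the ones listed above.

```python
def resource_conf(driver_cores, executor_cores, executor_memory):
    """CPU requests use Kubernetes-specific keys; memory uses the generic Spark key."""
    return {
        "spark.kubernetes.driver.request.cores": str(driver_cores),
        "spark.kubernetes.executor.request.cores": str(executor_cores),
        "spark.executor.memory": executor_memory,
    }


def secret_conf(role, secret_name, mount_path):
    """Mount an existing Kubernetes secret into the driver or executor pods."""
    return {f"spark.kubernetes.{role}.secrets.{secret_name}": mount_path}


def volume_conf(role, volume_type, volume_name, mount_path, options=None):
    """Build the mount path property plus any type-specific options for one volume."""
    prefix = f"spark.kubernetes.{role}.volumes.{volume_type}.{volume_name}"
    conf = {f"{prefix}.mount.path": mount_path}
    for option, value in (options or {}).items():
        conf[f"{prefix}.options.{option}"] = value
    return conf


# Start minimal, then add one concern at a time (all values are placeholders).
conf = {"spark.kubernetes.container.image": "example.registry/spark:3.5.0"}
conf.update(resource_conf("500m", 2, "4g"))
conf["spark.kubernetes.executor.podTemplateFile"] = "/path/to/executor-template.yaml"
conf.update(secret_conf("driver", "my-secret", "/etc/secrets"))
conf.update(
    volume_conf(
        "executor",
        "persistentVolumeClaim",
        "data",
        "/data",
        options={"claimName": "spark-data"},
    )
)

for key in sorted(conf):
    print(f"--conf {key}={conf[key]}")
```

Each helper returns a plain dictionary, so any step can be dropped without affecting the others, which mirrors the progressive nature of the configuration.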

Theoretical Basis

The configuration follows a layered merge strategy:

spark_properties(container.image, namespace, resources)
  -> pod_templates(driver_template, executor_template)
    -> rbac(service_account, roles, role_bindings)
      -> merge_and_submit

During the merge:

Spark properties override corresponding pod template values for Spark-managed fields.
Pod templates provide additive configuration for fields that Spark does not manage.
RBAC resources are external to the merge and must be pre-applied to the cluster.

This layering enables progressive complexity: simple deployments use only spark.kubernetes.* properties, while complex deployments layer on pod templates and custom RBAC without changing the core submission workflow.
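The merge rules can be illustrated with a toy model. This is not Spark's actual implementation: the functions `merge_container` and `merge_pod` are hypothetical, and the set of Spark-managed fields is reduced to the image and resources purely for illustration. The sketch assumes the first container in the template is the Spark container.

```python
import copy

# Fields this sketch treats as Spark-managed (a simplifying assumption).
SPARK_MANAGED_CONTAINER_FIELDS = ("image", "resources")


def merge_container(template_container, spark_values):
    """Spark-supplied values replace template values for managed fields only."""
    merged = copy.deepcopy(template_container)
    for field in SPARK_MANAGED_CONTAINER_FIELDS:
        if field in spark_values:
            merged[field] = copy.deepcopy(spark_values[field])
    return merged


def merge_pod(template_pod, spark_values):
    """Keep every template field; override managed fields on the first container."""
    merged = copy.deepcopy(template_pod)
    containers = merged.setdefault("spec", {}).setdefault("containers", [])
    if not containers:
        containers.append({})
    containers[0] = merge_container(containers[0], spark_values)
    return merged


# Placeholder template: tolerations and env are fields the sketch leaves alone.
template = {
    "apiVersion": "v1",
    "kind": "Pod",
    "spec": {
        "tolerations": [
            {"key": "dedicated", "operator": "Equal", "value": "spark", "effect": "NoSchedule"}
        ],
        "containers": [
            {
                "name": "spark",
                "image": "template-image:ignored",
                "env": [{"name": "TZ", "value": "UTC"}],
            }
        ],
    },
}

# Values that would be derived from Spark properties (placeholders).
spark_values = {
    "image": "example.registry/spark:3.5.0",
    "resources": {"requests": {"cpu": "1", "memory": "4g"}},
}

merged = merge_pod(template, spark_values)
print(merged["spec"]["containers"][0]["image"])
print(merged["spec"]["tolerations"])
```

In this model the template image is discarded, while the tolerations and environment variables pass through untouched, which is the additive behavior described above.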

Configuration Layers

Layer	Mechanism	Scope	Example
Spark properties	`--conf spark.kubernetes.*`	Common K8s settings	`spark.kubernetes.container.image=spark:latest`
Pod templates	YAML files referenced by properties	Advanced pod specs	Node affinity, tolerations, init containers
RBAC	`kubectl apply` pre-submission	Cluster security	ServiceAccount, Role or ClusterRole, RoleBinding or ClusterRoleBinding
Secrets	Property-based mount	Credentials	`spark.kubernetes.driver.secrets.my-secret=/etc/secrets`
Volumes	Property-based mount	Storage	`spark.kubernetes.driver.volumes.hostPath.logs.mount.path=/var/log`
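Because the RBAC layer must exist before submission, it can help to generate its manifests alongside the Spark properties. The following sketch emits a ServiceAccount and a RoleBinding as a JSON list that could be piped to `kubectl apply -f -`. The function `rbac_manifests`, the namespace, and the account name are hypothetical, and binding to the built-in `edit` ClusterRole is an assumption chosen for brevity; a narrower custom Role may suit production clusters better.

```python
import json


def rbac_manifests(namespace, service_account):
    """Return a Kubernetes List holding a ServiceAccount and a namespaced RoleBinding.

    Hypothetical helper; the binding grants the built-in "edit" ClusterRole
    within the given namespace only.
    """
    account = {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": service_account, "namespace": namespace},
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{service_account}-edit", "namespace": namespace},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": "edit",
        },
        "subjects": [
            {"kind": "ServiceAccount", "name": service_account, "namespace": namespace}
        ],
    }
    return {"apiVersion": "v1", "kind": "List", "items": [account, binding]}


# Placeholder namespace and account name.
manifests = rbac_manifests("spark-jobs", "spark")
print(json.dumps(manifests, indent=2))
```

The service account name used here is the value that would then be passed as spark.kubernetes.authenticate.driver.serviceAccountName at submission time.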
