Principle:Apache Spark K8s Resource Configuration
| Metadata | Value |
|---|---|
| Domains | Kubernetes, Configuration |
| Type | Principle |
| Related | Implementation:Apache_Spark_K8s_Config_Properties |
Overview
A Kubernetes-native configuration pattern that maps Spark application requirements to Kubernetes resources through spark.kubernetes.* properties, pod templates, and RBAC resources.
Description
Running Spark on Kubernetes requires translating Spark concepts (driver, executors, memory, cores) into Kubernetes primitives (pods, services, secrets, volumes). The configuration pattern provides multiple layers of customization, each serving a different level of specificity:
- spark.kubernetes.* properties -- The primary configuration mechanism for common Kubernetes-specific settings. These properties are passed via --conf flags to spark-submit and control the container image, namespace, resource requests, secret mounts, volume mounts, and other Kubernetes-native behaviors.
- Pod templates -- YAML files that provide fine-grained control over pod specifications beyond what spark.kubernetes.* properties expose. Pod templates allow specifying node affinity, tolerations, init containers, sidecar containers, custom volume types, and other advanced Kubernetes features without modifying Spark code.
- RBAC resources -- Kubernetes ServiceAccount, Role, and RoleBinding resources that control the security context under which Spark pods operate. The service account used by the driver must have permissions to create and manage executor pods.
These layers are merged at submission time, with Spark-managed values taking precedence over template values for properties that Spark is opinionated about (such as the main container name, image, and resource requests).
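The RBAC layer can be illustrated with a small generator for the three resources named above. This is a minimal sketch, not a recommended policy: the function name `spark_rbac_manifests`, the namespace `spark-jobs`, the account name `spark-driver`, and the exact resource and verb lists in the Role are all assumptions chosen for illustration, and a real cluster may need a broader or narrower rule set.

```python
import json


def spark_rbac_manifests(namespace="spark-jobs", account="spark-driver"):
    """Return ServiceAccount, Role, and RoleBinding manifests as plain dicts.

    The names and the rule set are hypothetical; adjust them to the cluster's policy.
    """
    role_name = f"{account}-role"
    service_account = {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": account, "namespace": namespace},
    }
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": role_name, "namespace": namespace},
        "rules": [
            {
                "apiGroups": [""],  # "" selects the core API group
                # Assumed list: resources a driver typically touches for executors.
                "resources": ["pods", "services", "configmaps", "persistentvolumeclaims"],
                "verbs": ["create", "get", "list", "watch", "delete"],
            }
        ],
    }
    role_binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{account}-binding", "namespace": namespace},
        "subjects": [{"kind": "ServiceAccount", "name": account, "namespace": namespace}],
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": role_name,
        },
    }
    return [service_account, role, role_binding]


if __name__ == "__main__":
    # Wrap the manifests in a List so they can be applied in one step.
    manifest_list = {"apiVersion": "v1", "kind": "List", "items": spark_rbac_manifests()}
    print(json.dumps(manifest_list, indent=2))
```

The JSON output could be piped to kubectl apply -f - before submission. The driver would then be pointed at the account with spark.kubernetes.authenticate.driver.serviceAccountName=spark-driver.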
Usage
Use when configuring Spark applications for Kubernetes deployment:
- Minimal configuration -- Start with spark.kubernetes.container.image as the only required Kubernetes-specific property.
- Resource tuning -- Add driver and executor CPU requests via spark.kubernetes.driver.request.cores and spark.kubernetes.executor.request.cores, CPU limits via spark.kubernetes.driver.limit.cores and spark.kubernetes.executor.limit.cores, and memory via the standard Spark memory properties.
- Advanced pod customization -- Use spark.kubernetes.driver.podTemplateFile and spark.kubernetes.executor.podTemplateFile for requirements like GPU scheduling, node affinity, tolerations, or init containers.
- Secret management -- Mount Kubernetes secrets into pods via spark.kubernetes.driver.secrets.[Name] and spark.kubernetes.executor.secrets.[Name].
- Volume management -- Attach hostPath, emptyDir, NFS, or PersistentVolumeClaim volumes via spark.kubernetes.driver.volumes.[Type].[Name].mount.path.
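The property-based options above can be collected in one place and turned into a spark-submit command line. The following is a minimal sketch: the helper `build_spark_submit`, the API server URL, the image name, the template file names, the jar path, and the class name are all hypothetical placeholders, and the property values are examples rather than recommendations.

```python
import shlex


def build_spark_submit(master, app_resource, conf, main_class=None):
    """Build a spark-submit argument list from a dict of Spark properties."""
    argv = ["spark-submit", "--master", master, "--deploy-mode", "cluster"]
    if main_class:
        argv += ["--class", main_class]
    for key, value in conf.items():
        # Each property becomes its own --conf key=value pair.
        argv += ["--conf", f"{key}={value}"]
    argv.append(app_resource)
    return argv


# Example values only; every name below is a placeholder.
conf = {
    "spark.kubernetes.container.image": "registry.example.com/spark:latest",
    "spark.kubernetes.namespace": "spark-jobs",
    "spark.kubernetes.driver.request.cores": "1",
    "spark.kubernetes.executor.request.cores": "2",
    "spark.executor.memory": "4g",
    "spark.kubernetes.driver.podTemplateFile": "driver-template.yaml",
    "spark.kubernetes.executor.podTemplateFile": "executor-template.yaml",
    "spark.kubernetes.driver.secrets.my-secret": "/etc/secrets",
    "spark.kubernetes.driver.volumes.hostPath.logs.mount.path": "/var/log",
    # hostPath volumes also need the path on the node itself.
    "spark.kubernetes.driver.volumes.hostPath.logs.options.path": "/var/log/spark",
}

argv = build_spark_submit(
    master="k8s://https://kubernetes.example.com:6443",
    app_resource="local:///opt/spark/app/app.jar",
    conf=conf,
    main_class="org.example.App",
)
print(shlex.join(argv))  # shell-quoted command, ready to inspect or run
```

Keeping the properties in a dict makes it easy to start from the single image property and add the resource, template, secret, and volume entries only as a deployment needs them.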
Theoretical Basis
The configuration follows a layered merge strategy:
spark_properties(container.image, namespace, resources)
-> pod_templates(driver_template, executor_template)
-> rbac(service_account, roles, role_bindings)
-> merge_and_submit
During the merge:
- Spark properties override corresponding pod template values for Spark-managed fields.
- Pod templates provide additive configuration for fields that Spark does not manage.
- RBAC resources are external to the merge and must be pre-applied to the cluster.
This layering enables progressive complexity: simple deployments use only spark.kubernetes.* properties, while complex deployments layer on pod templates and custom RBAC without changing the core submission workflow.
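The precedence rules can be modeled with a toy merge over a single container spec. This is only an illustration of the override-versus-additive behavior described above, not Spark's actual merge code: the function `merge_container`, the `SPARK_MANAGED_FIELDS` tuple (an illustrative subset of the fields the text calls Spark-managed), and all sample values are assumptions.

```python
import copy

# Illustrative subset of fields that Spark is opinionated about.
SPARK_MANAGED_FIELDS = ("image", "resources")


def merge_container(template_container, spark_values):
    """Overlay Spark-managed fields onto a container spec from a pod template.

    Fields outside SPARK_MANAGED_FIELDS pass through from the template unchanged.
    """
    merged = copy.deepcopy(template_container)
    for field in SPARK_MANAGED_FIELDS:
        if field in spark_values:
            # A Spark-derived value replaces whatever the template declared.
            merged[field] = copy.deepcopy(spark_values[field])
    return merged


# Container as a pod template might declare it (hypothetical values).
template = {
    "image": "template-image:1.0",
    "env": [{"name": "LOG_LEVEL", "value": "debug"}],
}

# Values derived from spark.kubernetes.* properties (hypothetical values).
spark_values = {
    "image": "registry.example.com/spark:latest",
    "resources": {"requests": {"cpu": "1"}},
}

merged = merge_container(template, spark_values)
print(merged["image"])  # the Spark-provided image, not the template's
print(merged["env"])    # template-only field, carried over as-is
```

In this model the template's image is discarded while its env entries survive, which mirrors the two merge rules: override for Spark-managed fields, additive for everything else.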
Configuration Layers
| Layer | Mechanism | Scope | Example |
|---|---|---|---|
| Spark properties | --conf spark.kubernetes.* | Common K8s settings | spark.kubernetes.container.image=spark:latest |
| Pod templates | YAML files referenced by properties | Advanced pod specs | Node affinity, tolerations, init containers |
| RBAC | kubectl apply pre-submission | Cluster security | ServiceAccount, ClusterRole, ClusterRoleBinding |
| Secrets | Property-based mount | Credentials | spark.kubernetes.driver.secrets.my-secret=/etc/secrets |
| Volumes | Property-based mount | Storage | spark.kubernetes.driver.volumes.hostPath.logs.mount.path=/var/log |