
Implementation:Apache Spark K8s Config Properties

From Leeroopedia


Metadata

Source Doc: Running on K8s
Domains: Kubernetes, Configuration
Type: Pattern Doc
Related: Principle:Apache_Spark_K8s_Resource_Configuration

Overview

Pattern documentation for spark.kubernetes.* configuration properties, pod templates, and RBAC used to configure Spark on Kubernetes.

Description

Spark on Kubernetes is configured through three mechanisms:

  • spark.kubernetes.* properties -- Passed via --conf flags to spark-submit. These control the container image, namespace, resource requests and limits, secret mounts, volume mounts, environment variables, and labels/annotations applied to Spark pods.
  • Pod template YAML files -- Referenced via spark.kubernetes.driver.podTemplateFile and spark.kubernetes.executor.podTemplateFile. These provide complete pod specifications that Spark uses as a starting point, overlaying its own required configuration on top.
  • Kubernetes RBAC resources -- ServiceAccount, Role/ClusterRole, and RoleBinding/ClusterRoleBinding resources that must be applied to the cluster before Spark submission. The spark-rbac.yaml reference file provides a complete example.
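In the simplest case, the RBAC prerequisites can be created directly with kubectl. The service account and binding names below are illustrative; spark-rbac.yaml remains the authoritative reference:

# Create a service account for the Spark driver (names are examples)
kubectl create serviceaccount spark --namespace=default

# Grant it permission to create and manage executor pods
kubectl create clusterrolebinding spark-role \
  --clusterrole=edit \
  --serviceaccount=default:spark \
  --namespace=default

The driver is then pointed at the service account at submission time with --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark.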

Key properties include:

  • spark.kubernetes.container.image -- Required. The Docker image to use for driver and executor containers.
  • spark.kubernetes.namespace -- The Kubernetes namespace to run in (default: default).
  • spark.kubernetes.driver.podTemplateFile -- Path to a driver pod template YAML.
  • spark.kubernetes.executor.podTemplateFile -- Path to an executor pod template YAML.
  • spark.kubernetes.driver.secrets.[SecretName] -- Mount path for a named Kubernetes secret in the driver pod.
  • spark.kubernetes.executor.secrets.[SecretName] -- Mount path for a named Kubernetes secret in executor pods.
  • spark.kubernetes.driver.volumes.[Type].[Name].mount.path -- Mount path for a volume in the driver pod.
  • spark.kubernetes.executor.volumes.[Type].[Name].mount.path -- Mount path for a volume in executor pods.

Spark is opinionated about certain pod template values and will always override them. By default Spark treats the first container in the template as the driver or executor container; spark.kubernetes.driver.podTemplateContainerName and spark.kubernetes.executor.podTemplateContainerName select a different one by name. Users should consult the pod template properties documentation to understand which values Spark manages.

Usage

Set required properties (at minimum spark.kubernetes.container.image) before submission. Use pod templates for advanced requirements such as node affinity, GPU scheduling, tolerations, or sidecar containers.
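A complete submission looks like the sketch below; the API server address and container image are placeholders, and the SparkPi example jar path is illustrative (it varies with how the image was built):

./bin/spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=<spark-image> \
  local:///opt/spark/examples/jars/spark-examples.jar

The local:// scheme indicates the jar is already present inside the container image rather than uploaded from the client.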

Code Reference

Item Reference
Configuration documentation docs/running-on-kubernetes.md (L207-406)
Driver pod template example driver-template.yml
Executor pod template example executor-template.yml
RBAC configuration resource-managers/kubernetes/integration-tests/dev/spark-rbac.yaml

Key Properties

Property Description Required
spark.kubernetes.container.image Docker image for Spark containers Yes
spark.kubernetes.namespace Kubernetes namespace for pods No (default: default)
spark.kubernetes.driver.podTemplateFile Path to driver pod template YAML No
spark.kubernetes.executor.podTemplateFile Path to executor pod template YAML No
spark.kubernetes.driver.secrets.[Name] Mount path for a secret in driver pod No
spark.kubernetes.executor.secrets.[Name] Mount path for a secret in executor pods No
spark.kubernetes.driver.volumes.[Type].[Name].mount.path Volume mount path for driver No
spark.kubernetes.executor.volumes.[Type].[Name].mount.path Volume mount path for executors No
spark.kubernetes.driver.request.cores CPU request for the driver pod No
spark.kubernetes.executor.request.cores CPU request for executor pods No

Inputs and Outputs

Direction Description
Inputs spark.kubernetes.* properties, pod template YAML files, RBAC YAML resources
Outputs Configured Kubernetes resources (pods, services, secrets, volumes) for Spark submission

Examples

Minimal configuration

--conf spark.kubernetes.container.image=spark:latest

With pod templates

--conf spark.kubernetes.driver.podTemplateFile=driver-template.yml
--conf spark.kubernetes.executor.podTemplateFile=executor-template.yml

With secrets

--conf spark.kubernetes.driver.secrets.my-secret=/etc/secrets
--conf spark.kubernetes.executor.secrets.my-secret=/etc/secrets
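The my-secret referenced above must already exist in the target namespace. A minimal sketch of creating one (the key name and value are examples):

kubectl create secret generic my-secret \
  --from-literal=password=example-value

With the two conf lines above, each key of the secret appears as a file under /etc/secrets in the driver and executor containers.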

With secret environment variables

--conf spark.kubernetes.driver.secretKeyRef.ENV_NAME=name:key
--conf spark.kubernetes.executor.secretKeyRef.ENV_NAME=name:key
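The value uses the name:key format, where name is the Kubernetes secret and key is the entry within it. Concretely, assuming a secret named db-creds with a key password (hypothetical names), the pods would see the environment variable DB_PASSWORD:

--conf spark.kubernetes.driver.secretKeyRef.DB_PASSWORD=db-creds:password
--conf spark.kubernetes.executor.secretKeyRef.DB_PASSWORD=db-creds:password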

With volumes

--conf spark.kubernetes.driver.volumes.hostPath.spark-logs.mount.path=/var/log/spark
--conf spark.kubernetes.driver.volumes.hostPath.spark-logs.options.path=/tmp/spark-logs
--conf spark.kubernetes.executor.volumes.emptyDir.scratch.mount.path=/tmp/scratch
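Volumes accept further mount.* and options.* sub-keys; for example (readOnly applies to any mount, sizeLimit to emptyDir volumes):

--conf spark.kubernetes.driver.volumes.hostPath.spark-logs.mount.readOnly=false
--conf spark.kubernetes.executor.volumes.emptyDir.scratch.options.sizeLimit=5Gi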

With pod template for node affinity

Example driver-template.yml:

apiVersion: v1
kind: Pod
metadata:
  labels:
    app: spark-driver
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node-type
            operator: In
            values:
            - spark-driver
  tolerations:
  - key: "spark-role"
    operator: "Equal"
    value: "driver"
    effect: "NoSchedule"
  containers:
  - name: spark-driver
    resources:
      requests:
        cpu: "2"
        memory: "4Gi"

Related

Principle:Apache_Spark_K8s_Resource_Configuration