Principle:Apache Spark K8s Resource Configuration

From Leeroopedia


Metadata | Value
Domains | Kubernetes, Configuration
Type | Principle
Related | Implementation:Apache_Spark_K8s_Config_Properties

Overview

A Kubernetes-native configuration pattern that maps Spark application requirements to Kubernetes resources through spark.kubernetes.* properties, pod templates, and RBAC resources.

Description

Running Spark on Kubernetes requires translating Spark concepts (driver, executors, memory, cores) into Kubernetes primitives (pods, services, secrets, volumes). The configuration pattern provides multiple layers of customization, each serving a different level of specificity:

  • spark.kubernetes.* properties -- The primary configuration mechanism for common Kubernetes-specific settings. These properties are passed via --conf flags to spark-submit and control the container image, namespace, resource requests, secret mounts, volume mounts, and other Kubernetes-native behaviors.
  • Pod templates -- YAML files that provide fine-grained control over pod specifications beyond what spark.kubernetes.* properties expose. Pod templates allow specifying node affinity, tolerations, init containers, sidecar containers, custom volume types, and other advanced Kubernetes features without modifying Spark code.
  • RBAC resources -- Kubernetes ServiceAccount, Role, and RoleBinding resources that control the security context under which Spark pods operate. The service account used by the driver must have permissions to create and manage executor pods.

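The property and pod template layers are merged at submission time, with Spark-managed values taking precedence over template values for fields that Spark is opinionated about (such as the namespace, the container image, and CPU and memory resources). RBAC resources are not part of this merge; they are applied to the cluster separately.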
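To make the pod template layer more concrete, here is a minimal Python sketch that writes a driver template to disk and points Spark at it. The node label, taint key, file name, and the write_template helper are hypothetical and chosen only for illustration. The template sets only fields that the properties do not cover (a node selector and a toleration).

```python
import tempfile
from pathlib import Path

# Hypothetical driver pod template: only scheduling hints, no image or resources.
DRIVER_TEMPLATE = """\
apiVersion: v1
kind: Pod
spec:
  nodeSelector:
    workload: spark        # assumed node label
  tolerations:
    - key: dedicated       # assumed taint key
      operator: Equal
      value: spark
      effect: NoSchedule
"""


def write_template(directory: Path, name: str, body: str) -> Path:
    """Write a pod template file and return its path (illustrative helper)."""
    path = directory / name
    path.write_text(body)
    return path


template_dir = Path(tempfile.mkdtemp())
driver_template = write_template(template_dir, "driver-template.yaml", DRIVER_TEMPLATE)

# The template is referenced through a regular Spark property.
template_conf = {"spark.kubernetes.driver.podTemplateFile": str(driver_template)}
print(template_conf)
```

An executor template would be referenced the same way through spark.kubernetes.executor.podTemplateFile.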

Usage

Use this pattern when configuring Spark applications for Kubernetes deployment:

  • Minimal configuration -- Start with spark.kubernetes.container.image as the only required Kubernetes-specific property.
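  • Resource tuning -- Set driver and executor CPU requests via spark.kubernetes.driver.request.cores and spark.kubernetes.executor.request.cores, CPU limits via spark.kubernetes.driver.limit.cores and spark.kubernetes.executor.limit.cores, and memory via spark.driver.memory and spark.executor.memory.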
  • Advanced pod customization -- Use spark.kubernetes.driver.podTemplateFile and spark.kubernetes.executor.podTemplateFile for requirements like GPU scheduling, node affinity, tolerations, or init containers.
  • Secret management -- Mount Kubernetes secrets into pods via spark.kubernetes.driver.secrets.[Name] and spark.kubernetes.executor.secrets.[Name].
  • Volume management -- Attach hostPath, emptyDir, NFS, or PersistentVolumeClaim volumes via spark.kubernetes.driver.volumes.[Type].[Name].mount.path.
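As a rough illustration of how the properties above combine, the following Python sketch assembles a spark-submit argument list. The helper name build_spark_submit_args, the API server address, image name, secret name, claim name, and application path are placeholders assumed for this example. Only the property keys are taken from Spark's Kubernetes configuration.

```python
from typing import Dict, List


def build_spark_submit_args(master: str, app: str, conf: Dict[str, str]) -> List[str]:
    """Assemble a spark-submit argv list with one --conf flag per property.

    Hypothetical helper: it only builds the list and does not run spark-submit.
    """
    args = ["spark-submit", "--master", master, "--deploy-mode", "cluster"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


# All values below are placeholders, not recommended settings.
conf = {
    # Minimal configuration
    "spark.kubernetes.container.image": "registry.example.com/spark:example-tag",
    # Resource tuning
    "spark.kubernetes.driver.request.cores": "1",
    "spark.kubernetes.executor.request.cores": "2",
    "spark.executor.memory": "4g",
    # Secret management: secret name -> mount path
    "spark.kubernetes.driver.secrets.my-secret": "/etc/secrets",
    # Volume management: mount path plus the claim backing the volume
    "spark.kubernetes.driver.volumes.persistentVolumeClaim.data.mount.path": "/data",
    "spark.kubernetes.driver.volumes.persistentVolumeClaim.data.options.claimName": "example-claim",
}

args = build_spark_submit_args(
    master="k8s://https://kubernetes.example.com:6443",
    app="local:///opt/spark/examples/jars/example-app.jar",
    conf=conf,
)
print(" ".join(args))
```

Building the arguments as a list rather than one long string keeps each property separate, which avoids shell quoting problems if the list is later passed to a process launcher.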

Theoretical Basis

The configuration follows a layered merge strategy:

spark_properties(container.image, namespace, resources)
  -> pod_templates(driver_template, executor_template)
    -> rbac(service_account, roles, role_bindings)
      -> merge_and_submit

During the merge:

  • Spark properties override corresponding pod template values for Spark-managed fields.
  • Pod templates provide additive configuration for fields that Spark does not manage.
  • RBAC resources are external to the merge and must be pre-applied to the cluster.
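The merge rules above can be modeled with a small Python sketch. This is a conceptual model only, not Spark's actual implementation: the flat dictionaries, the field names (image, cpu_request, tolerations, node_selector), and the merge_pod_fields helper are all assumptions made for illustration.

```python
import copy
from typing import Any, Dict


def merge_pod_fields(template: Dict[str, Any], spark_managed: Dict[str, Any]) -> Dict[str, Any]:
    """Conceptual merge: Spark-managed fields win, other template fields survive."""
    merged = copy.deepcopy(template)             # template values form the base layer
    merged.update(copy.deepcopy(spark_managed))  # Spark-managed values override on conflict
    return merged


# Fields as they might appear in a pod template (hypothetical names and values).
template_fields = {
    "image": "template-image:ignored",
    "tolerations": [{"key": "dedicated", "operator": "Exists"}],
    "node_selector": {"workload": "spark"},
}

# Fields derived from Spark properties (hypothetical names and values).
spark_fields = {
    "image": "registry.example.com/spark:example-tag",
    "cpu_request": "2",
}

merged = merge_pod_fields(template_fields, spark_fields)
print(merged["image"])
print(merged["tolerations"])
```

In this model the image from the template is replaced, while the tolerations and node selector pass through untouched. RBAC does not appear at all, matching its position outside the merge.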

This layering enables progressive complexity: simple deployments use only spark.kubernetes.* properties, while complex deployments layer on pod templates and custom RBAC without changing the core submission workflow.
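Because RBAC resources must exist before submission, a deployment script typically creates them first and then names the service account in the Spark properties. The Python sketch below builds the corresponding kubectl argument lists without running them. The namespace, account name, binding name, and the rbac_commands helper are placeholders, and binding the built-in edit ClusterRole within one namespace is an assumption for this example; a narrower custom Role may be more appropriate in practice.

```python
from typing import List


def rbac_commands(namespace: str, account: str) -> List[List[str]]:
    """Return kubectl argv lists that create a service account and bind a role to it.

    Illustrative helper: it only builds the commands and does not execute them.
    """
    return [
        ["kubectl", "create", "serviceaccount", account, "--namespace", namespace],
        [
            "kubectl", "create", "rolebinding", f"{account}-role",
            "--clusterrole=edit",
            f"--serviceaccount={namespace}:{account}",
            "--namespace", namespace,
        ],
    ]


namespace, account = "spark-jobs", "spark"  # placeholder names
commands = rbac_commands(namespace, account)

# Properties that tell Spark which namespace and service account the driver uses.
rbac_conf = {
    "spark.kubernetes.namespace": namespace,
    "spark.kubernetes.authenticate.driver.serviceAccountName": account,
}

for cmd in commands:
    print(" ".join(cmd))
```

The submission itself is unchanged: the two properties in rbac_conf are passed as ordinary --conf flags alongside the rest of the configuration.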

Configuration Layers

Layer | Mechanism | Scope | Example
Spark properties | --conf spark.kubernetes.* | Common K8s settings | spark.kubernetes.container.image=spark:latest
Pod templates | YAML files referenced by properties | Advanced pod specs | Node affinity, tolerations, init containers
RBAC | kubectl apply pre-submission | Cluster security | ServiceAccount, Role/ClusterRole, RoleBinding/ClusterRoleBinding
Secrets | Property-based mount | Credentials | spark.kubernetes.driver.secrets.my-secret=/etc/secrets
Volumes | Property-based mount | Storage | spark.kubernetes.driver.volumes.hostPath.logs.mount.path=/var/log
