Principle:Apache Spark Cluster Configuration

Metadata

Field	Value
Domains	Configuration, Deployment

Overview

A configuration pattern that specifies the target cluster manager and execution mode for distributed application submission through two standardized spark-submit parameters, --master and --deploy-mode.

Description

Spark applications can run on several cluster managers: Spark's built-in Standalone manager, Hadoop YARN, Kubernetes, and Apache Mesos (deprecated since Spark 3.2). The cluster configuration pattern hides deployment details behind two spark-submit options: the --master URL and the --deploy-mode flag.

The master URL (--master) determines which cluster manager handles resource allocation
The deploy mode (--deploy-mode) determines whether the driver runs on the submission machine (client mode, the default) or inside the cluster (cluster mode)
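The two settings map directly onto spark-submit flags. The following is a minimal Python sketch, not part of Spark, that assembles a spark-submit argument list from a master URL and a deploy mode. The helper name build_submit_args and the file name app.py are hypothetical; the one validation rule mirrors spark-submit's refusal to combine cluster deploy mode with a local master, while Spark itself performs many more checks.

```python
from typing import List, Optional, Sequence

VALID_DEPLOY_MODES = {"client", "cluster"}


def build_submit_args(
    master: str,
    app_resource: str,
    deploy_mode: str = "client",
    app_args: Optional[Sequence[str]] = None,
) -> List[str]:
    """Assemble a spark-submit command line (hypothetical helper, not a Spark API)."""
    if deploy_mode not in VALID_DEPLOY_MODES:
        raise ValueError(f"deploy mode must be 'client' or 'cluster', got {deploy_mode!r}")
    # A local master has no separate cluster to host the driver,
    # so cluster deploy mode is rejected for it.
    if master.startswith("local") and deploy_mode == "cluster":
        raise ValueError(f"cluster deploy mode is not compatible with master {master!r}")
    return [
        "spark-submit",
        "--master", master,            # which cluster manager allocates resources
        "--deploy-mode", deploy_mode,  # where the driver process runs
        app_resource,                  # application JAR or Python file
        *(app_args or []),             # arguments forwarded to the application
    ]


if __name__ == "__main__":
    print(build_submit_args("yarn", "app.py", deploy_mode="cluster", app_args=["--date", "2024-01-01"]))
```

Note that spark-submit options must precede the application resource; everything after it is passed to the application rather than interpreted by spark-submit.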

This abstraction provides several benefits:

Portability: the same application code runs on any supported cluster manager, provided it does not hard-code a master URL (settings made in code take precedence over spark-submit flags)
Separation of concerns: application logic is decoupled from deployment topology
Uniform interface: the same spark-submit tool and option syntax work across all cluster managers
Flexible scaling: moving from local development to a production cluster mainly requires changing the master URL, although resource settings such as executor memory usually need tuning as well
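To illustrate the portability point, the sketch below keeps one hypothetical application, etl_job.py, and varies only the deployment flags per environment. The profile names, the host staging-master, and the input path are placeholders; 7077 is the Standalone master's default port.

```python
from typing import Dict, List, Tuple

# Hypothetical environment profiles: (master URL, deploy mode).
PROFILES: Dict[str, Tuple[str, str]] = {
    "dev": ("local[*]", "client"),                         # all logical cores on this machine
    "staging": ("spark://staging-master:7077", "client"),  # placeholder Standalone master
    "prod": ("yarn", "cluster"),                           # YARN located via HADOOP_CONF_DIR
}

# The application part of the command is identical in every environment.
APP: List[str] = ["etl_job.py", "--input", "/data/events"]


def command_for(profile: str) -> List[str]:
    """Return the spark-submit command for a profile; only the deployment flags vary."""
    master, deploy_mode = PROFILES[profile]
    return ["spark-submit", "--master", master, "--deploy-mode", deploy_mode, *APP]


if __name__ == "__main__":
    for name in PROFILES:
        print(name, " ".join(command_for(name)))
```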

Client Mode vs. Cluster Mode

Aspect	Client Mode	Cluster Mode
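Driver location	Submission machine	Inside the cluster (a worker node, the YARN ApplicationMaster container, or a Kubernetes driver pod)
Console output	Visible locally	Written to the cluster manager's logs
Network dependency	Submission machine must stay connected for the application's lifetime	Submission machine can disconnect after launch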
Use case	Interactive development, debugging	Production jobs, automated pipelines
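When --deploy-mode is omitted, spark-submit falls back to the spark.submit.deployMode property (for example from spark-defaults.conf or --conf) and, if that is also unset, to client mode. The sketch below is a simplified model of that fallback combined with the driver-location row of the table; the function names are hypothetical and other configuration sources are ignored.

```python
from typing import Mapping, Optional

DRIVER_LOCATION = {
    "client": "submission machine",
    "cluster": "node inside the cluster",
}


def resolve_deploy_mode(cli_flag: Optional[str], spark_properties: Mapping[str, str]) -> str:
    """Simplified model: --deploy-mode flag, then spark.submit.deployMode, then 'client'."""
    if cli_flag is not None:
        return cli_flag
    return spark_properties.get("spark.submit.deployMode", "client")


def describe(cli_flag: Optional[str], spark_properties: Mapping[str, str]) -> str:
    """Report the effective deploy mode and where the driver runs."""
    mode = resolve_deploy_mode(cli_flag, spark_properties)
    return f"{mode} mode: driver runs on the {DRIVER_LOCATION[mode]}"


if __name__ == "__main__":
    defaults = {"spark.submit.deployMode": "cluster"}  # e.g. loaded from spark-defaults.conf
    print(describe(None, {}))            # nothing set: falls back to client
    print(describe(None, defaults))      # the property applies
    print(describe("client", defaults))  # an explicit flag wins
```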

Usage

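Use these settings to control where and how a Spark application executes. Client mode suits interactive work and debugging; interactive shells such as spark-shell and pyspark support only client mode. Cluster mode suits production jobs, where the submission machine may disconnect after launching the application.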

Theoretical Basis

The master URL scheme works like a service locator: the format of the URL determines which cluster manager implementation is instantiated.

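Master URL	Cluster Manager
local	Local mode (one worker thread)
local[N]	Local mode (N worker threads; local[*] uses one thread per logical core)
spark://host:port	Spark Standalone cluster manager (default port 7077)
yarn	Hadoop YARN (cluster location read from HADOOP_CONF_DIR or YARN_CONF_DIR)
k8s://host:port	Kubernetes API server (https is assumed if no protocol is given)
mesos://host:port	Apache Mesos (deprecated)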

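This is analogous to JDBC connection strings, where the URL scheme (for example, jdbc:postgresql:) determines which database driver handles the connection. The Spark submission layer parses the master URL and delegates to the appropriate cluster manager backend. Internally, SparkContext handles local and Standalone URLs directly, while backends such as YARN and Kubernetes are discovered through Java's ServiceLoader as implementations of the internal ExternalClusterManager interface, each reporting through canCreate(masterURL) whether it accepts the URL.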
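The service-locator idea can be sketched as a registry of backends, each claiming the master URLs it recognizes, with the caller requiring exactly one match. This is a loose Python model, not Spark's code: the Backend class, the registry, and the function names are hypothetical, and the parsing is deliberately simplified (it does not validate host names or the local[N,F] form).

```python
import os
import re
from dataclasses import dataclass
from typing import Callable, List

LOCAL_PATTERN = re.compile(r"^local(?:\[(\d+|\*)\])?$")


@dataclass(frozen=True)
class Backend:
    """Hypothetical stand-in for a cluster-manager implementation."""
    name: str
    can_create: Callable[[str], bool]  # does this backend understand the URL?


# Each backend claims the URLs it recognizes.
REGISTRY: List[Backend] = [
    Backend("local", lambda url: LOCAL_PATTERN.match(url) is not None),
    Backend("standalone", lambda url: url.startswith("spark://")),
    Backend("yarn", lambda url: url == "yarn"),
    Backend("kubernetes", lambda url: url.startswith("k8s://")),
    Backend("mesos", lambda url: url.startswith("mesos://")),
]


def locate_backend(master_url: str) -> Backend:
    """Return the single backend that accepts master_url; fail on zero or several matches."""
    matches = [b for b in REGISTRY if b.can_create(master_url)]
    if len(matches) != 1:
        raise ValueError(f"could not resolve a unique cluster manager for {master_url!r}")
    return matches[0]


def local_threads(master_url: str) -> int:
    """Worker threads for a local master: 'local' = 1, 'local[*]' = logical cores."""
    m = LOCAL_PATTERN.match(master_url)
    if m is None:
        raise ValueError(f"not a local master URL: {master_url!r}")
    group = m.group(1)
    if group is None:
        return 1
    if group == "*":
        return os.cpu_count() or 1
    return int(group)


if __name__ == "__main__":
    for url in ["local[4]", "spark://master:7077", "yarn", "k8s://https://example.com:6443"]:
        print(url, "->", locate_backend(url).name)
```

Requiring a unique match keeps the dispatch unambiguous: adding a new backend means registering one more entry, without touching the code that submits applications.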
