Principle:Apache Spark Cluster Configuration

From Leeroopedia


Metadata

Field   | Value
Domains | Configuration, Deployment

Overview

A configuration pattern that specifies the target cluster manager and execution mode for distributed application submission through a set of standardized parameters.

Description

Spark applications can run under several cluster managers: Spark Standalone, Hadoop YARN, Kubernetes, and, in older releases, Apache Mesos (deprecated since Spark 3.2). The cluster configuration pattern hides deployment details behind two uniform spark-submit parameters: the --master URL and the --deploy-mode flag.

  • The master URL determines which cluster manager handles resource allocation
  • The deploy mode determines whether the driver runs on the submission machine (client mode) or inside the cluster (cluster mode)
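A minimal sketch of how the two parameters combine into one spark-submit invocation. The helper build_submit_args, the jar name app.jar and the class com.example.Main are hypothetical placeholders; --master, --deploy-mode and --class are real spark-submit options, client is spark-submit's default deploy mode, and the local-master check mirrors spark-submit's refusal to combine cluster deploy mode with a local master.

```python
from typing import List, Optional

VALID_DEPLOY_MODES = {"client", "cluster"}


def build_submit_args(master: str, app_jar: str, main_class: str,
                      deploy_mode: str = "client",
                      app_args: Optional[List[str]] = None) -> List[str]:
    """Assemble a spark-submit argument list (hypothetical helper)."""
    if deploy_mode not in VALID_DEPLOY_MODES:
        raise ValueError(f"unknown deploy mode: {deploy_mode!r}")
    # A local master has no cluster to host the driver, so cluster mode is rejected.
    if master.startswith("local") and deploy_mode == "cluster":
        raise ValueError("cluster deploy mode is not compatible with a local master")
    args = [
        "spark-submit",
        "--master", master,            # which cluster manager allocates resources
        "--deploy-mode", deploy_mode,  # where the driver process runs
        "--class", main_class,
        app_jar,
    ]
    # Application arguments follow the jar and are passed to the main method.
    return args + list(app_args or [])


# Only the deployment parameters change between targets; the application does not.
dev = build_submit_args("local[4]", "app.jar", "com.example.Main")
prod = build_submit_args("yarn", "app.jar", "com.example.Main", deploy_mode="cluster")
print(" ".join(dev))
print(" ".join(prod))
```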

This abstraction provides several benefits:

  • Portability: the same application code runs on any supported cluster manager without modification, provided it does not hardcode a master URL
  • Separation of concerns: application logic is decoupled from deployment topology
  • Uniform interface: one spark-submit entry point works across cluster managers; only manager-specific options (for example a YARN --queue or a Kubernetes spark.kubernetes.container.image) differ
  • Flexible scaling: moving from local development to a production cluster mainly requires changing the master URL and, typically, the deploy mode
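Portability depends on the application leaving its master unset. Spark's configuration documentation gives a precedence order: properties set directly on SparkConf in code win, then flags passed to spark-submit, then spark-defaults.conf. The sketch below models that documented order with plain dictionaries; resolve_setting is hypothetical and not a Spark API. It shows why a master hardcoded in code silently overrides --master.

```python
from typing import Dict, Optional


def resolve_setting(key: str,
                    code_conf: Dict[str, str],
                    submit_flags: Dict[str, str],
                    defaults_file: Dict[str, str]) -> Optional[str]:
    """Return the effective value of key under the documented precedence:
    SparkConf set in code > spark-submit flags > spark-defaults.conf."""
    for source in (code_conf, submit_flags, defaults_file):
        if key in source:
            return source[key]
    return None  # unset everywhere


defaults = {"spark.master": "local[*]"}   # contents of spark-defaults.conf
flags = {"spark.master": "yarn"}          # what `--master yarn` sets

# Portable application: leaves the master unset, so the command line decides.
print(resolve_setting("spark.master", {}, flags, defaults))  # yarn
# Non-portable application: a hardcoded master beats the command line.
print(resolve_setting("spark.master", {"spark.master": "local[2]"}, flags, defaults))  # local[2]
```

In PySpark terms, a portable application builds its session with SparkSession.builder.appName(...).getOrCreate() and leaves .master(...) out, so the same script can be pointed at any cluster by spark-submit.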

Client Mode vs. Cluster Mode

Aspect             | Client Mode                        | Cluster Mode
Driver location    | Submission machine                 | Inside the cluster (a worker node, YARN container, or Kubernetes pod)
Console output     | Visible locally                    | Written to cluster-side driver logs
Network dependency | Submitter must stay connected      | Submitter can disconnect after submission
Use case           | Interactive development, debugging | Production jobs, automated pipelines

Usage

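Use this pattern to configure where and how a Spark application executes. Client mode is preferred for interactive work, and it is the only mode the interactive shells (spark-shell, pyspark) support. Cluster mode is preferred for production jobs where the submission machine may disconnect, and for submissions from machines far from the cluster, because it keeps driver-to-executor traffic inside the cluster network.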
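The client/cluster trade-offs can be condensed into a small decision function. This is a hypothetical illustration, not part of Spark; the one hard rule it encodes is real: Spark shells run the driver locally, so interactive sessions always use client mode.

```python
def recommend_deploy_mode(interactive: bool,
                          submitter_may_disconnect: bool,
                          want_local_console: bool = False) -> str:
    """Suggest a deploy mode from the client/cluster trade-offs (hypothetical helper)."""
    if interactive:
        # spark-shell and pyspark only support client mode.
        return "client"
    if submitter_may_disconnect:
        # The driver must not depend on the submitting machine staying up.
        return "cluster"
    # A stable submitter: keep driver output on this terminal only when debugging.
    return "client" if want_local_console else "cluster"


print(recommend_deploy_mode(interactive=False, submitter_may_disconnect=True))
```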

Theoretical Basis

The master URL acts as a service locator: its format determines which cluster manager implementation is instantiated:

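Master URL        | Cluster Manager
local             | Local mode (one thread)
local[N]          | Local mode (N threads)
local[*]          | Local mode (one thread per logical core)
spark://host:port | Spark Standalone cluster manager
yarn              | Hadoop YARN (cluster located via HADOOP_CONF_DIR or YARN_CONF_DIR)
k8s://host:port   | Kubernetes (host and port of the Kubernetes API server)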
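A minimal sketch of the URL-to-manager mapping in the table, written as a plain parser. describe_master is hypothetical and not Spark's internal code; it omits the local[N,F] form (F = maximum task failures) and the deprecated mesos:// scheme.

```python
import os
import re
from typing import Optional, Tuple, Union

# Matches local[N] and local[*]; local[N,F] is deliberately not handled.
_LOCAL_THREADS = re.compile(r"local\[([0-9]+|\*)\]")

Detail = Optional[Union[int, str]]


def describe_master(url: str) -> Tuple[str, Detail]:
    """Map a master URL to (cluster manager, detail). Hypothetical helper."""
    if url == "local":
        return ("local", 1)  # a single worker thread
    match = _LOCAL_THREADS.fullmatch(url)
    if match:
        n = match.group(1)
        # local[*] uses one thread per logical core.
        threads = (os.cpu_count() or 1) if n == "*" else int(n)
        return ("local", threads)
    if url.startswith("spark://"):
        # A comma-separated host:port list names standby masters (ZooKeeper HA).
        return ("standalone", url[len("spark://"):])
    if url == "yarn":
        # The ResourceManager is found via HADOOP_CONF_DIR or YARN_CONF_DIR.
        return ("yarn", None)
    if url.startswith("k8s://"):
        # The remainder is the Kubernetes API server address.
        return ("kubernetes", url[len("k8s://"):])
    raise ValueError(f"unrecognized master URL: {url!r}")


for u in ["local", "local[4]", "spark://m1:7077,m2:7077", "yarn", "k8s://https://api:6443"]:
    print(u, "->", describe_master(u))
```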

This scheme-based dispatch is analogous to JDBC connection strings, where the URL prefix (for example jdbc:postgresql:) selects the database driver. The Spark submission layer parses the master URL and delegates to the matching cluster manager backend.
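The delegation step can be sketched as a plug-in registry. Spark's internal ExternalClusterManager trait, used for yarn and k8s masters, has a similar shape: each implementation reports through canCreate(masterURL) whether it accepts a URL, and SparkContext uses the single match, failing when none or more than one matches. The Python classes below are hypothetical stand-ins that model only this lookup, not scheduling.

```python
from abc import ABC, abstractmethod
from typing import List


class ClusterManager(ABC):
    """Hypothetical plug-in interface, loosely modeled on canCreate(masterURL)."""

    name: str = "abstract"

    @abstractmethod
    def can_create(self, master_url: str) -> bool:
        """Return True if this manager handles the given master URL."""


class YarnManager(ClusterManager):
    name = "yarn"

    def can_create(self, master_url: str) -> bool:
        return master_url == "yarn"


class KubernetesManager(ClusterManager):
    name = "kubernetes"

    def can_create(self, master_url: str) -> bool:
        return master_url.startswith("k8s://")


def locate(master_url: str, registry: List[ClusterManager]) -> ClusterManager:
    """Return the one registered manager that accepts the URL."""
    matches = [m for m in registry if m.can_create(master_url)]
    if not matches:
        raise ValueError(f"no cluster manager for master URL {master_url!r}")
    if len(matches) > 1:
        raise ValueError(f"multiple cluster managers claim {master_url!r}")
    return matches[0]


registry: List[ClusterManager] = [YarnManager(), KubernetesManager()]
print(locate("k8s://https://api.example:6443", registry).name)
```

The design choice mirrors the JDBC analogy: adding a backend means registering one more class, while locate and the application stay unchanged.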
