Principle:Apache Spark Standalone Job Submission

From Leeroopedia


Field   | Value
Domains | Deployment, Execution
Type    | Principle
Related | Implementation:Apache_Spark_Spark_Submit_Standalone

Overview

A job submission pattern that routes application execution to a Spark standalone cluster by pointing the application at the master's spark://HOST:PORT URL.

Description

Once a standalone cluster is running, applications are submitted to it using the spark:// URL scheme. The submission process involves several stages:

  • Driver connection -- the driver program connects to the master at the spark:// address
  • Resource allocation -- the master allocates resources from registered workers based on application requirements
  • Executor launch -- workers launch executor JVMs that perform the actual computation
  • Task coordination -- the driver distributes tasks to executors and collects results
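The first stage depends on a well-formed master address. As a minimal sketch, the helper below splits a spark:// URL into host and port pairs before any connection is attempted. The function name `parse_master_url` and the host names are hypothetical. It assumes every endpoint carries an explicit port, accepts a comma-separated list of masters (the form used with standby masters), and does not handle IPv6 literals.

```python
from typing import List, Tuple

PREFIX = "spark://"


def parse_master_url(url: str) -> List[Tuple[str, int]]:
    """Split a standalone master URL into (host, port) pairs.

    Illustrative helper only; it is not part of Spark.
    """
    if not url.startswith(PREFIX):
        raise ValueError(f"expected a spark:// URL, got {url!r}")
    endpoints = []
    for part in url[len(PREFIX):].split(","):
        host, sep, port = part.partition(":")
        if not host or not sep:
            raise ValueError(f"expected host:port, got {part!r}")
        # int() raises ValueError on an empty or non-numeric port
        endpoints.append((host, int(port)))
    return endpoints


if __name__ == "__main__":
    print(parse_master_url("spark://master-host:7077"))
```

Validating the URL up front turns a typo into an immediate error instead of a connection timeout against the wrong address.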

Two deploy modes are supported:

  • Client mode -- the driver runs on the submission machine, suitable for interactive use and debugging
  • Cluster mode -- the driver is launched on one of the cluster's worker nodes, and the submitting client can exit once the application has been accepted; suitable for production and long-running applications
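The choice of deploy mode comes down to a few spark-submit arguments. The sketch below assembles such a command line as an argument list. The helper `build_submit_command`, the host name, the class name, and the jar path are hypothetical; the flags themselves (`--master`, `--deploy-mode`, `--class`, `--supervise`, `--total-executor-cores`) are standard spark-submit options.

```python
import shlex


def build_submit_command(master_url, app_resource, deploy_mode="client",
                         main_class=None, supervise=False,
                         total_executor_cores=None, app_args=()):
    """Return a spark-submit argument list for a standalone cluster."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unknown deploy mode: {deploy_mode!r}")
    if supervise and deploy_mode != "cluster":
        # --supervise only applies when the driver runs inside the cluster
        raise ValueError("--supervise requires cluster deploy mode")
    cmd = ["spark-submit", "--master", master_url, "--deploy-mode", deploy_mode]
    if main_class:
        cmd += ["--class", main_class]
    if supervise:
        cmd.append("--supervise")
    if total_executor_cores is not None:
        cmd += ["--total-executor-cores", str(total_executor_cores)]
    cmd.append(app_resource)   # application jar or script
    cmd.extend(app_args)       # arguments passed through to the application
    return cmd


if __name__ == "__main__":
    cmd = build_submit_command(
        "spark://master-host:7077",      # hypothetical master address
        "/path/to/app.jar",              # hypothetical application jar
        deploy_mode="cluster",
        main_class="com.example.Main",   # hypothetical entry point
        supervise=True,
    )
    print(shlex.join(cmd))
```

Returning a list rather than a string lets the caller pass it to `subprocess.run` without shell quoting concerns; `shlex.join` is used here only for display.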

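Across applications, the standalone master uses a simple FIFO scheduler. Several applications can run concurrently only if each one caps the resources it takes (for example with spark.cores.max); otherwise the first application claims every available core. FIFO and fair scheduling are both available within a single application, for the jobs that application runs.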
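FIFO ordering across applications, combined with per-application core caps such as spark.cores.max, can be illustrated with a toy allocator. This is a simplified model, not Spark's scheduler: the function `allocate_fifo` and the application names are hypothetical, and the sketch ignores memory, per-worker placement, and cores being released when an application finishes.

```python
def allocate_fifo(total_cores, apps):
    """Grant cores to applications in submission order.

    apps is a list of (name, core_cap) pairs; a cap of None models an
    application with no limit, which takes whatever is still free.
    """
    remaining = total_cores
    grants = {}
    for name, cap in apps:
        granted = remaining if cap is None else min(cap, remaining)
        grants[name] = granted
        remaining -= granted
    return grants


if __name__ == "__main__":
    # "adhoc" has no cap, so "report" is left waiting with zero cores.
    print(allocate_fifo(16, [("etl", 8), ("adhoc", None), ("report", 4)]))
```

The example shows why capping matters: a single uncapped application ahead in the queue leaves nothing for the applications behind it.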

Usage

Use this pattern once a standalone master and at least one worker are running and you want to run Spark applications on that cluster. Common scenarios include:

  • Interactive analysis -- submitting from a user workstation in client mode
  • Production pipelines -- submitting in cluster mode, optionally with --supervise so that the driver is restarted automatically if it exits with a non-zero code
  • Batch processing -- submitting multiple applications that share cluster resources

Theoretical Basis

Resource negotiation proceeds in four phases, each handled by a different actor:

submit(app, master_url)
    -> master.allocate(resources)
    -> workers.launch(executors)
    -> driver.coordinate(tasks)

Phase      | Actor   | Action
Submit     | Client  | Sends application to master via spark:// protocol
Allocate   | Master  | Assigns worker resources to the application
Launch     | Workers | Start executor JVMs for computation
Coordinate | Driver  | Distributes tasks and collects results
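
The four phases can be traced with a small in-memory model. This is a conceptual sketch only: the `Worker`, `Master`, and `submit` names are hypothetical and do not correspond to Spark classes, the allocation fills workers greedily in list order (Spark's own placement policy differs and is configurable), and task coordination is reduced to a round-robin assignment.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    name: str
    free_cores: int
    executors: list = field(default_factory=list)

    def launch_executor(self, app, cores):
        # Launch phase: reserve cores and record a stand-in for an executor JVM
        self.free_cores -= cores
        executor = f"{app}-exec@{self.name}x{cores}"
        self.executors.append(executor)
        return executor


class Master:
    def __init__(self, workers):
        self.workers = workers

    def allocate(self, cores_needed):
        # Allocate phase: pick (worker, cores) pairs until the request is met
        plan = []
        for worker in self.workers:
            if cores_needed == 0:
                break
            take = min(worker.free_cores, cores_needed)
            if take > 0:
                plan.append((worker, take))
                cores_needed -= take
        return plan


def submit(app, master, cores, tasks):
    # Submit phase: the client hands the application to the master
    plan = master.allocate(cores)
    executors = [worker.launch_executor(app, c) for worker, c in plan]
    if not executors:
        raise RuntimeError("no worker has free cores")
    # Coordinate phase: the driver spreads tasks over the executors
    return {task: executors[i % len(executors)] for i, task in enumerate(tasks)}


if __name__ == "__main__":
    master = Master([Worker("w1", 2), Worker("w2", 4)])
    for task, executor in submit("app", master, 3, ["t0", "t1", "t2"]).items():
        print(task, "->", executor)
```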
