Principle:Apache Spark Application Entry Point
Metadata
| Field | Value |
|---|---|
| Source Type | Doc |
| Source Name | Spark Quick Start |
| Source URL | https://spark.apache.org/docs/latest/quick-start.html |
| Domains | Application_Development, API_Design |
Overview
A unified entry point pattern: a single object through which an application reaches the SQL, DataFrame, Dataset, and Structured Streaming APIs.
Description
Modern data processing frameworks consolidate multiple programming interfaces behind a single entry point object. SparkSession serves this role in Apache Spark: since Spark 2.0 it subsumes the older SQLContext and HiveContext and wraps the SparkContext, which remains reachable as spark.sparkContext for RDD-level work. The Builder pattern allows incremental configuration before the session is materialized, and getOrCreate() returns an existing session when one is available instead of building a new one, which prevents accidental duplication of the underlying SparkContext.
By centralizing session management into a single object, applications gain:
- Simplified initialization: one object takes the place of separately constructed SparkContext, SQLContext, and HiveContext instances
- Configuration consistency: all settings flow through a single builder chain
- Resource safety: getOrCreate() reuses the existing session and its SparkContext rather than opening a second connection to the cluster
- API unification: SQL, DataFrame, Dataset, and Structured Streaming operations are accessible from the same entry point
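The points above can be sketched in PySpark. This is a minimal illustration rather than a canonical setup: the function names create_session and entry_points are assumptions chosen for the example, and PySpark is assumed to be installed. The typed Dataset API exists only in Scala and Java, so the Python sketch covers SQL, DataFrame, and Structured Streaming.

```python
from typing import Mapping


def create_session(app_name: str, options: Mapping[str, str]):
    """Build (or reuse) a SparkSession from an application name and a mapping of options."""
    # Imported here so that loading this module does not itself require PySpark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in options.items():
        builder = builder.config(key, value)  # each call returns the builder
    return builder.getOrCreate()


def entry_points(spark):
    """Reach SQL, DataFrame, and Structured Streaming through the same session object."""
    return {
        "sql": spark.sql("SELECT 1 AS id"),  # SQL query, returned as a DataFrame
        "dataframe": spark.range(3),         # DataFrame with a single id column
        "stream_reader": spark.readStream,   # reader used to define streaming sources
    }
```

For example, `spark = create_session("demo", {"spark.sql.shuffle.partitions": "4"})` followed by `entry_points(spark)` would be expected to yield two DataFrames and a stream reader that all originate from the one session.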
Usage
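Use this principle when designing the entry point for any Spark application. Applications that use the SQL, DataFrame, Dataset, or Structured Streaming APIs need a SparkSession before performing data operations (RDD-only code can still work from a SparkContext alone). This applies to: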
- Batch processing applications
- Streaming applications (Structured Streaming)
- Interactive analysis sessions (spark-shell, PySpark shell), where the shell creates the session and exposes it as spark
- Unit and integration tests
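For tests, one common arrangement is a local-mode session that is created for the test and stopped afterwards. The sketch below assumes PySpark is installed; local_session and its parameters are hypothetical names, not part of Spark.

```python
from contextlib import contextmanager


@contextmanager
def local_session(app_name: str = "unit-test", cores: int = 1):
    """Yield a local-mode SparkSession and stop it when the block exits."""
    # Imported here so that loading this module does not itself require PySpark.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master(f"local[{cores}]")  # run Spark inside this process
        .appName(app_name)
        .getOrCreate()
    )
    try:
        yield spark
    finally:
        spark.stop()  # also stops the underlying SparkContext
```

A test could then open `with local_session() as spark:` and check something like `spark.range(5).count() == 5`. Because getOrCreate() reuses an existing session, stopping it here also stops it for any other code that shares it, so larger suites often keep one session for the whole run instead.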
Theoretical Basis
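Implements the Builder pattern (GoF) combined with get-or-create reuse, which gives Singleton-like semantics:
SparkSession.builder().config(k1, v1).config(k2, v2).getOrCreate()
This call returns the session that is already active in the JVM if there is one, and otherwise creates a session with the specified configuration. It does not guarantee exactly one session per JVM: additional sessions can be created explicitly with newSession(), and they share the same SparkContext. The Builder pattern provides: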
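- Incremental construction: configuration parameters are added one at a time
- Stable core settings: once the session exists, SparkContext-level and static SQL settings are fixed; only runtime SQL options can still be changed through the session's conf
- Validation deferral: config() only records a key and value; the settings are applied when getOrCreate() runs, not at each setter call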
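The Singleton-like semantics provided by getOrCreate() mean that:
- Repeated calls return the same session instance for as long as it remains active, whether or not the requested configuration matches
- Options passed to the builder are applied to the existing session where they can be changed at runtime; options that can only be set at startup do not take effect
- Sessions in one JVM share a single SparkContext, so cluster connections and executors are coordinated in one place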
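The combination of a fluent builder with get-or-create reuse can be modelled in a few lines of plain Python. This is a toy model under stated assumptions, not Spark's implementation: MiniSession, MiniSessionBuilder, get_or_create, and reset are hypothetical names, and every option is treated as changeable at runtime, which real Spark settings are not.

```python
import threading
from typing import Dict, Optional


class MiniSession:
    """Hypothetical stand-in for a session object; not Spark's SparkSession."""

    _default: Optional["MiniSession"] = None  # process-wide default instance
    _lock = threading.Lock()                  # guards creation of the default

    def __init__(self, options: Dict[str, str]) -> None:
        self.options = dict(options)

    @classmethod
    def builder(cls) -> "MiniSessionBuilder":
        return MiniSessionBuilder()

    @classmethod
    def reset(cls) -> None:
        """Forget the default instance (useful between tests)."""
        with cls._lock:
            cls._default = None


class MiniSessionBuilder:
    """Collects options one call at a time; nothing is created until get_or_create()."""

    def __init__(self) -> None:
        self._options: Dict[str, str] = {}

    def config(self, key: str, value: str) -> "MiniSessionBuilder":
        self._options[key] = value  # only recorded here, applied later
        return self                 # returning self enables chaining

    def get_or_create(self) -> MiniSession:
        with MiniSession._lock:
            if MiniSession._default is None:
                MiniSession._default = MiniSession(self._options)
            else:
                # Reuse the existing instance and layer the new options on top.
                MiniSession._default.options.update(self._options)
            return MiniSession._default


first = MiniSession.builder().config("app.name", "demo").get_or_create()
second = MiniSession.builder().config("app.mode", "test").get_or_create()
print(first is second)   # True: the second call reused the first instance
print(second.options)    # options from both builder chains
```

The lock is there so that two threads racing through get_or_create() cannot each create a default instance; the second builder chain never constructs anything and only adds its options to the instance that already exists.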