Implementation:Heibaiying BigData Notes SparkSession Builder
| Knowledge Sources | |
|---|---|
| Domains | Data_Analysis, Big_Data |
| Last Updated | 2026-02-10 10:00 GMT |
Overview
A concrete tool, provided by Apache Spark, for creating the SparkSession entry point to Spark SQL.
Description
The SparkSession.builder() API provides a fluent builder pattern to construct a SparkSession instance. It is the standard mechanism for initializing Spark SQL in both application code and interactive shells. The builder accepts an application name, a master URL, and optional configuration key-value pairs, then either creates a new session or returns an existing one via getOrCreate().
In the BigData-Notes repository, SparkSession creation is demonstrated at the beginning of the Structured API usage guide, where a local-mode session is constructed before performing DataFrame operations.
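The get-or-create semantics described above can be sketched as follows; the application names are illustrative, not taken from the repository:

```scala
import org.apache.spark.sql.SparkSession

// First builder call constructs a new session
val first = SparkSession.builder()
  .appName("First")
  .master("local[*]")
  .getOrCreate()

// A second builder call in the same JVM returns the existing session;
// the running session's appName is not changed by the new builder
val second = SparkSession.builder()
  .appName("Second")
  .getOrCreate()

assert(first eq second) // same instance

first.stop()
```

This idempotence is what makes getOrCreate() safe to call from shared library code: it never tears down or duplicates a session that the host application already started.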
Usage
Use SparkSession.builder() at the start of every Spark SQL program. Call appName() to set a human-readable identifier, master() to specify the execution environment, and getOrCreate() to obtain the session. After creation, import spark.implicits._ to enable implicit conversions (e.g., converting Scala collections to DataFrames).
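The usage steps above can be sketched end to end; the session name and the sample data are illustrative, not from the notes:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("Implicits-Demo") // illustrative name
  .master("local[*]")
  .getOrCreate()

// The implicits live on the session *instance*, not the class,
// so this import must come after the session is created
import spark.implicits._

// toDF() on a local Scala collection is only available after the import
val df = Seq(("alice", 30), ("bob", 25)).toDF("name", "age")

// The $-column syntax is also provided by spark.implicits._
df.select($"name").show()

spark.stop()
```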
Code Reference
Source Location
- Repository file: `notes/Spark_Structured_API的基本使用.md` (approximately lines 30-35)
- External class: `org.apache.spark.sql.SparkSession`
- External documentation: SparkSession Scaladoc
Signature
```scala
SparkSession.builder()
  .appName(name: String)
  .master(master: String)
  .config(key: String, value: String) // optional, repeatable
  .enableHiveSupport()                // optional
  .getOrCreate(): SparkSession
```
Import
```scala
import org.apache.spark.sql.SparkSession
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| appName | String | Yes | Human-readable application name displayed in the Spark UI |
| master | String | Yes when set in code | Cluster manager URL (e.g., "local[*]", "yarn", "spark://host:7077"); typically omitted when supplied externally via spark-submit --master |
| config key-value | (String, String) | No | Arbitrary Spark configuration properties (e.g., "spark.sql.shuffle.partitions") |
| enableHiveSupport | Boolean flag | No | Enables Hive metastore integration for persistent tables and Hive UDFs |
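The optional inputs in the table combine with the required ones in a single builder chain. A hedged sketch follows; the property values are examples for illustration, not recommendations from the notes, and enableHiveSupport() assumes Hive classes are on the classpath:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("Configured-Session") // illustrative name
  .master("local[*]")
  // config() is repeatable: one call per Spark property
  .config("spark.sql.shuffle.partitions", "8")
  .config("spark.sql.session.timeZone", "UTC")
  // Enables the Hive metastore, persistent tables, and Hive UDFs
  .enableHiveSupport()
  .getOrCreate()
```

Properties set this way take effect for the session being created; if a session already exists in the JVM, getOrCreate() returns it and the new SQL-level configs are applied to that existing session.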
Outputs
| Name | Type | Description |
|---|---|---|
| spark | SparkSession | The unified entry point for DataFrame/Dataset creation, SQL execution, and catalog access |
Usage Examples
```scala
import org.apache.spark.sql.SparkSession

// Create a local SparkSession for data analysis
val spark = SparkSession.builder()
  .appName("Spark-SQL-Analysis")
  .master("local[*]")
  .getOrCreate()

// Enable implicit conversions for $ column syntax and toDF()
import spark.implicits._

// Verify the session is active
println(s"Spark version: ${spark.version}")
println(s"App name: ${spark.sparkContext.appName}")

// Access the underlying SparkContext if needed
val sc = spark.sparkContext

// Stop the session when finished
spark.stop()
```
Related Pages
Implements Principle
Requires Environment
- Environment:Heibaiying_BigData_Notes_Java_8_Maven_Environment
- Environment:Heibaiying_BigData_Notes_Spark_2_4_Environment