
Implementation:Heibaiying BigData Notes SparkSession Builder

From Leeroopedia


Knowledge Sources
Domains Data_Analysis, Big_Data
Last Updated 2026-02-10 10:00 GMT

Overview

A concrete mechanism, provided by Apache Spark, for creating the SparkSession: the unified entry point to Spark SQL.

Description

The SparkSession.builder() API provides a fluent builder pattern to construct a SparkSession instance. It is the standard mechanism for initializing Spark SQL in both application code and interactive shells. The builder accepts an application name, a master URL, and optional configuration key-value pairs, then either creates a new session or returns an existing one via getOrCreate().

In the BigData-Notes repository, SparkSession creation is demonstrated at the beginning of the Structured API usage guide, where a local-mode session is constructed before performing DataFrame operations.
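The "creates a new session or returns an existing one" behavior of getOrCreate() can be sketched as follows. This is a minimal illustration, not code from the repository; the object and application names are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object GetOrCreateDemo {
  def main(args: Array[String]): Unit = {
    val first = SparkSession.builder()
      .appName("First-App")
      .master("local[*]")
      .getOrCreate()

    // A second builder call in the same JVM does not construct a new session:
    // getOrCreate() returns the one already running, and the new appName is ignored.
    val second = SparkSession.builder()
      .appName("Second-App")
      .getOrCreate()

    assert(first eq second)  // same instance, by reference equality

    first.stop()
  }
}
```

This is why getOrCreate() is safe to call from multiple places in an application: only the first call's settings take effect.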

Usage

Use SparkSession.builder() at the start of every Spark SQL program. Call appName() to set a human-readable identifier, master() to specify the execution environment, and getOrCreate() to obtain the session. After creation, import spark.implicits._ to enable implicit conversions (e.g., converting Scala collections to DataFrames).
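The sequence above, ending with the implicits import, can be sketched as a complete program. Note that spark.implicits._ is imported from the session *instance*, so the import can only appear after getOrCreate(); the object name and sample data here are illustrative.

```scala
import org.apache.spark.sql.SparkSession

object ImplicitsDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Implicits-Demo")
      .master("local[*]")
      .getOrCreate()

    // Instance-level import: enables toDF() on collections and the $ column syntax
    import spark.implicits._

    // toDF() on a Scala Seq is only available because of the import above
    val df = Seq(("alice", 30), ("bob", 25)).toDF("name", "age")
    df.filter($"age" > 26).show()

    spark.stop()
  }
}
```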

Code Reference

Source Location

  • Repository file: notes/Spark_Structured_API的基本使用.md ("Basic usage of the Spark Structured API"; approximately lines 30-35)
  • External class: org.apache.spark.sql.SparkSession
  • External documentation: SparkSession Scaladoc

Signature

SparkSession.builder()
  .appName(name: String)
  .master(master: String)
  .config(key: String, value: String)   // optional, repeatable
  .enableHiveSupport()                  // optional
  .getOrCreate(): SparkSession

Import

import org.apache.spark.sql.SparkSession

I/O Contract

Inputs

  • appName (String, required): Human-readable application name displayed in the Spark UI
  • master (String, required in client code): Cluster manager URL (e.g., "local[*]", "yarn", "spark://host:7077")
  • config (String key-value pairs, optional, repeatable): Arbitrary Spark configuration properties (e.g., "spark.sql.shuffle.partitions")
  • enableHiveSupport (flag, optional): Enables Hive metastore integration for persistent tables and Hive UDFs

Outputs

  • spark (SparkSession): The unified entry point for DataFrame/Dataset creation, SQL execution, and catalog access
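The three capabilities named in the output contract can be exercised from one session. This is a hedged sketch; the object name, view name, and sample rows are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object SessionOutputsDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Session-Outputs")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // 1. DataFrame/Dataset creation
    val people = Seq(("alice", 30), ("bob", 25)).toDF("name", "age")

    // 2. SQL execution against a registered temporary view
    people.createOrReplaceTempView("people")
    spark.sql("SELECT name FROM people WHERE age > 26").show()

    // 3. Catalog access: list tables/views visible to this session
    spark.catalog.listTables().show()

    spark.stop()
  }
}
```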

Usage Examples

import org.apache.spark.sql.SparkSession

// Create a local SparkSession for data analysis
val spark = SparkSession.builder()
  .appName("Spark-SQL-Analysis")
  .master("local[*]")
  .getOrCreate()

// Enable implicit conversions for $ column syntax and toDF()
import spark.implicits._

// Verify the session is active
println(s"Spark version: ${spark.version}")
println(s"App name: ${spark.sparkContext.appName}")

// Access the underlying SparkContext if needed
val sc = spark.sparkContext

// Stop the session when finished
spark.stop()
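The optional builder calls from the signature (config() and enableHiveSupport()) are not shown in the example above; a sketch follows. The property keys used are standard Spark SQL configuration names; enableHiveSupport() additionally requires Hive classes on the classpath and a reachable metastore, so it is left commented out here.

```scala
import org.apache.spark.sql.SparkSession

object ConfiguredSessionDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Configured-Session")
      .master("local[*]")
      .config("spark.sql.shuffle.partitions", "8")  // repeatable: one call per key
      .config("spark.sql.session.timeZone", "UTC")
      // .enableHiveSupport()  // uncomment when Hive dependencies are available
      .getOrCreate()

    // Settings applied through the builder are visible via the runtime conf
    println(spark.conf.get("spark.sql.shuffle.partitions"))

    spark.stop()
  }
}
```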

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
