Implementation:DataExpert io Data engineer handbook SparkSession Builder
Overview
This page documents how the Data Engineer Handbook repository uses the SparkSession builder. The builder is part of the external PySpark API; the PySpark jobs in this project call it to initialize the Spark session they run in.
Type
Wrapper Doc (external PySpark API used by this repo)
Source
players_scd_job.py:L48-51
Signature
SparkSession.builder.master("local").appName("players_scd").getOrCreate() -> SparkSession
Import
from pyspark.sql import SparkSession
Inputs / Outputs
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | master URL | str | The Spark master URL (e.g., "local" for local mode) |
| Input | app name | str | The application name displayed in the Spark UI (e.g., "players_scd") |
| Output | SparkSession | SparkSession | A fully initialized SparkSession instance ready for DataFrame and SQL operations |
Usage Example
from pyspark.sql import SparkSession
spark = (SparkSession.builder
         .master("local")
         .appName("players_scd")
         .getOrCreate())
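The example above hardcodes both builder arguments. As a minimal sketch of one alternative, and not something taken from the repository, the two inputs could be resolved from the environment before the builder is called. The function names `session_settings` and `create_session` and the environment variables `SPARK_MASTER_URL` and `SPARK_APP_NAME` are hypothetical, chosen only for illustration.

```python
import os


def session_settings(default_master="local", default_app_name="players_scd"):
    """Resolve the two builder inputs, letting environment variables override the defaults.

    SPARK_MASTER_URL and SPARK_APP_NAME are hypothetical names used only in this sketch.
    """
    master = os.environ.get("SPARK_MASTER_URL", default_master)
    app_name = os.environ.get("SPARK_APP_NAME", default_app_name)
    return master, app_name


def create_session():
    """Build (or reuse) a SparkSession from the resolved settings."""
    # Imported lazily so session_settings() can be used without PySpark installed.
    from pyspark.sql import SparkSession

    master, app_name = session_settings()
    return (
        SparkSession.builder
        .master(master)
        .appName(app_name)
        .getOrCreate()
    )
```

With neither variable set, `session_settings()` returns the same values as the hardcoded example, so `create_session()` would behave like the original snippet. Because `getOrCreate()` returns an existing active session when there is one, repeated calls to `create_session()` in the same process should reuse that session rather than start a new one.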
Related Pages
- Principle:DataExpert_io_Data_engineer_handbook_Spark_Session_Configuration
- Environment:DataExpert_io_Data_engineer_handbook_Spark_Iceberg_Docker_Environment