Implementation:DataExpert io Data engineer handbook Pytest Spark Fixture
Overview
Type: API Doc
This implementation provides a session-scoped pytest fixture that creates and shares a single SparkSession across all test functions in the test suite. It is defined in conftest.py and leverages pytest's automatic fixture discovery mechanism.
Source
File: conftest.py:L1-9
Code
import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope='session')
def spark():
    return SparkSession.builder \
        .master("local") \
        .appName("chispa") \
        .getOrCreate()
Detailed Breakdown
Import Statements
- import pytest: provides the @pytest.fixture decorator
- from pyspark.sql import SparkSession: the entry point for PySpark DataFrame operations
Fixture Declaration
- The @pytest.fixture(scope='session') decorator registers the function spark() as a pytest fixture with session scope, meaning it is instantiated once for the entire test run and reused by every test that requests it.
SparkSession Configuration
- .master("local"): runs Spark in local mode using a single thread, appropriate for unit testing.
- .appName("chispa"): sets the application name to "chispa", reflecting the use of the chispa testing library.
- .getOrCreate(): returns an existing SparkSession if one is already active, or creates a new one. This aligns with the singleton pattern inherent to SparkSession.
Import Mechanism
The fixture is imported automatically via the pytest conftest.py mechanism. Any file named conftest.py in the test directory (or its parents) is automatically loaded by pytest. All fixtures defined therein become available to test functions without explicit import statements.
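The discovery mechanism and the session scope can both be checked without Spark. The following is a minimal sketch, assuming pytest is installed for the current interpreter: it writes a throwaway conftest.py and test module to a temporary directory and runs pytest on them in a subprocess. The fixture returns a plain list as a stand-in for a SparkSession, and the names CONFTEST, TESTS, run_demo, and test_demo.py are hypothetical, chosen only for illustration.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Stand-in conftest.py: same decorator and fixture name as the real one,
# but it returns a plain list instead of a SparkSession (assumption).
CONFTEST = textwrap.dedent('''
    import pytest

    CREATED = []  # grows by one each time the fixture body runs

    @pytest.fixture(scope="session")
    def spark():
        CREATED.append(object())
        return CREATED
''')

# The test module never imports the fixture; it only names it as a parameter.
TESTS = textwrap.dedent('''
    def test_first(spark):
        assert len(spark) == 1

    def test_second(spark):
        # Still one entry: the fixture body was not executed again.
        assert len(spark) == 1
''')


def run_demo():
    """Write both files to a temp directory, run pytest, return its exit code."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "conftest.py").write_text(CONFTEST)
        (root / "test_demo.py").write_text(TESTS)
        result = subprocess.run(
            [sys.executable, "-m", "pytest", "-q",
             "-p", "no:cacheprovider", str(root)],
            capture_output=True,
            text=True,
        )
        return result.returncode


exit_code = run_demo()
```

An exit code of 0 means both tests passed: neither test imported the fixture, and the second test still seeing a single entry indicates the fixture body ran once for the whole session.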
I/O
- Inputs: pytest framework (fixture injection system)
- Outputs: A SparkSession instance, injected into any test function that declares spark as a parameter
Usage in Tests
Test functions receive the fixture by declaring it as a parameter:
def test_my_transformation(spark):
    # spark is now a live SparkSession
    input_df = spark.createDataFrame([("Alice", 1)], ["name", "id"])
    # ... run transformation and assert results
Related Pages
- Principle:DataExpert_io_Data_engineer_handbook_SparkSession_Test_Fixture
- Environment:DataExpert_io_Data_engineer_handbook_Python_Development_Environment
- Heuristic:DataExpert_io_Data_engineer_handbook_SparkSession_Singleton_Pattern