
Implementation:DataExpert io Data engineer handbook Pytest Spark Fixture

From Leeroopedia


Overview

Type: API Doc

This implementation provides a session-scoped pytest fixture that creates and shares a single SparkSession across all test functions in the test suite. It is defined in conftest.py and leverages pytest's automatic fixture discovery mechanism.

Source

File: conftest.py:L1-9

Code

import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope='session')
def spark():
    return SparkSession.builder \
      .master("local") \
      .appName("chispa") \
      .getOrCreate()

Detailed Breakdown

Import Statements

  • import pytest — provides the @pytest.fixture decorator
  • from pyspark.sql import SparkSession — the entry point for PySpark DataFrame operations

Fixture Declaration

  • The @pytest.fixture(scope='session') decorator registers the function spark() as a pytest fixture with session scope, meaning it is instantiated once for the entire test run and reused by every test that requests it.
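Session scope can be demonstrated without Spark at all. The sketch below (illustrative names, not from the handbook) writes a small test file with a session-scoped fixture to a temporary directory and runs pytest on it: the fixture body executes once, and both tests receive the same object.

```python
# Minimal demo of scope='session': the fixture body runs once per
# test run, so a counter incremented inside it never exceeds 1.
import pathlib
import subprocess
import sys
import tempfile
import textwrap

TEST_SRC = textwrap.dedent("""
    import pytest

    CALLS = []

    @pytest.fixture(scope="session")
    def shared():
        CALLS.append(1)          # record each instantiation
        return object()

    def test_a(shared):
        assert len(CALLS) == 1   # created on first use

    def test_b(shared):
        assert len(CALLS) == 1   # reused, not re-created
""")

def run_session_scope_demo() -> int:
    """Write the demo test file to a temp dir and run pytest on it."""
    with tempfile.TemporaryDirectory() as d:
        path = pathlib.Path(d) / "test_scope_demo.py"
        path.write_text(TEST_SRC)
        return subprocess.run(
            [sys.executable, "-m", "pytest", "-q", str(path)]
        ).returncode
```

With `scope='function'` (the default) instead, the second assertion would fail, because the fixture would be rebuilt for every test.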

SparkSession Configuration

  • .master("local") — runs Spark in local mode using a single thread, appropriate for unit testing
  • .appName("chispa") — sets the application name to "chispa", reflecting the use of the chispa testing library
  • .getOrCreate() — returns an existing SparkSession if one is already active, or creates a new one. This aligns with the singleton pattern inherent to SparkSession.

Import Mechanism

The fixture is imported automatically via the pytest conftest.py mechanism. Any file named conftest.py in the test directory (or its parents) is automatically loaded by pytest. All fixtures defined therein become available to test functions without explicit import statements.
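This discovery mechanism can be verified with pytest alone. The sketch below (file contents are illustrative) writes a `conftest.py` and a test file into a temporary directory and runs pytest on it: the test resolves the fixture with no import statement.

```python
# Demo of conftest.py auto-discovery: a fixture defined in conftest.py
# is available to tests in the same directory without being imported.
import pathlib
import subprocess
import sys
import tempfile

CONFTEST = (
    "import pytest\n"
    "\n"
    "@pytest.fixture\n"
    "def greeting():\n"
    "    return 'hi'\n"
)
TEST = (
    "def test_greeting(greeting):\n"
    "    assert greeting == 'hi'\n"
)

def run_conftest_demo() -> int:
    """Create conftest.py + test file in a temp dir and run pytest."""
    with tempfile.TemporaryDirectory() as d:
        root = pathlib.Path(d)
        (root / "conftest.py").write_text(CONFTEST)
        (root / "test_demo.py").write_text(TEST)
        return subprocess.run(
            [sys.executable, "-m", "pytest", "-q", str(root)]
        ).returncode
```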

I/O

  • Inputs: pytest framework (fixture injection system)
  • Outputs: A SparkSession instance, injected into any test function that declares spark as a parameter

Usage in Tests

Test functions receive the fixture by declaring it as a parameter:

def test_my_transformation(spark):
    # spark is now a live SparkSession
    input_df = spark.createDataFrame([("Alice", 1)], ["name", "id"])
    # ... run transformation and assert results
