Jump to content

Connect Leeroopedia MCP: Equip your AI agents to search best practices, build plans, verify code, diagnose failures, and look up hyperparameter defaults.

Implementation:DataExpert io Data engineer handbook Do player scd transformation

From Leeroopedia


Overview

This page documents the do_player_scd_transformation function, which implements a Slowly Changing Dimension (SCD) Type 2 transformation on player scoring data. The function detects changes in a player's scoring class across seasons and produces date-ranged records for each contiguous period.

Type

API Doc

Source

players_scd_job.py:L1-54 (full file)

Signature

def do_player_scd_transformation(spark, dataframe) -> DataFrame

Import

from src.jobs.players_scd_job import do_player_scd_transformation

Inputs / Outputs

Direction Name Type Description
Input spark SparkSession An active SparkSession instance for executing SQL
Input dataframe DataFrame A DataFrame containing columns: player_name, current_season, scoring_class
Output result DataFrame A DataFrame containing columns: player_name, scoring_class, start_date, end_date

SQL Query Structure

The function registers the input DataFrame as a temporary view and executes a SQL query with three CTEs:

  • streak_started - Uses LAG(scoring_class) OVER (PARTITION BY player_name ORDER BY current_season) to detect when a player's scoring class changes. Produces a change_flag column (1 if changed, 0 otherwise).
  • streak_identified - Uses SUM(change_flag) OVER (PARTITION BY player_name ORDER BY current_season) to assign a unique streak_id to each contiguous group of unchanged scoring class values.
  • aggregated - Uses MIN(current_season) and MAX(current_season) grouped by player_name, scoring_class, and streak_id to produce the final start and end dates for each SCD record.
# Usage example
spark = SparkSession.builder.master("local").appName("players_scd").getOrCreate()
input_df = spark.read.table("players_raw")
output_df = do_player_scd_transformation(spark, input_df)
output_df.write.mode("overwrite").insertInto("players_scd")

Related Pages

Knowledge Sources

Metadata

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment