Implementation:DataExpert io Data engineer handbook Do team vertex transformation

From Leeroopedia


Overview

This page documents the do_team_vertex_transformation function, which converts relational NBA team data into a graph vertex format. The function deduplicates teams and produces vertices with an identifier, type label, and property map.

Type

API Doc

Source

team_vertex_job.py:L1-36 (full file)

Signature

def do_team_vertex_transformation(spark, dataframe) -> DataFrame

Import

from src.jobs.team_vertex_job import do_team_vertex_transformation

Inputs / Outputs

Direction | Name | Type | Description
Input | spark | SparkSession | An active SparkSession instance for executing SQL
Input | dataframe | DataFrame | A DataFrame containing columns: team_id, abbreviation, nickname, city, arena, yearfounded
Output | result | DataFrame | A DataFrame containing columns: identifier, type, properties (map)
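
The mapping from one input row to one output row can be illustrated in plain Python, without Spark. This is only a sketch of the shape described above, not the job's code: the helper name to_team_vertex and the sample values are hypothetical. It assumes, as the query on this page does, that yearfounded is converted to a string so that every value in the property map is a string.

```python
from typing import Any, Dict


def to_team_vertex(row: Dict[str, Any]) -> Dict[str, Any]:
    """Build one vertex record from one team row (hypothetical helper)."""
    year = row["yearfounded"]
    return {
        "identifier": row["team_id"],
        "type": "team",  # constant label shared by every team vertex
        "properties": {
            "abbreviation": row["abbreviation"],
            "nickname": row["nickname"],
            "city": row["city"],
            "arena": row["arena"],
            # Mirror a SQL cast: a missing value stays missing, anything else becomes text.
            "yearfounded": None if year is None else str(year),
        },
    }


# Made-up placeholder row, not real team data.
example_row = {
    "team_id": 1,
    "abbreviation": "AAA",
    "nickname": "Examples",
    "city": "Sample City",
    "arena": "Sample Arena",
    "yearfounded": 1950,
}
vertex = to_team_vertex(example_row)
print(vertex["properties"]["yearfounded"])
```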

SQL Query Structure

The function registers the input DataFrame as a temporary view and executes a SQL query with one CTE:

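  • teams_deduped - Uses ROW_NUMBER() OVER (PARTITION BY team_id ORDER BY team_id) to number the rows within each team_id. The outer query keeps only row_num = 1, leaving one row per team. Because the ordering key is the same as the partition key, which of several duplicate rows is kept is not deterministic.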
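
The effect of the deduplication step can be sketched in plain Python, without Spark. This is an illustration only: the helper name dedupe_by_team_id and the sample rows are hypothetical. Like the ROW_NUMBER() filter, it keeps exactly one row per team_id. Here the first row encountered wins, which is a property of this sketch and not something the Spark query guarantees.

```python
from typing import Any, Dict, Iterable, List


def dedupe_by_team_id(rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep one row per team_id (hypothetical helper, for illustration only)."""
    kept: Dict[Any, Dict[str, Any]] = {}
    for row in rows:
        # setdefault stores the row only if this team_id has not been seen yet,
        # which plays the role of "row_num = 1" within each partition.
        kept.setdefault(row["team_id"], row)
    return list(kept.values())


# Made-up sample rows; the second row duplicates team_id 1.
sample_rows = [
    {"team_id": 1, "nickname": "Examples"},
    {"team_id": 1, "nickname": "Examples"},
    {"team_id": 2, "nickname": "Samples"},
]
deduped = dedupe_by_team_id(sample_rows)
print([row["team_id"] for row in deduped])
```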

The full query follows; its final SELECT produces the vertex format:

WITH teams_deduped AS (
    SELECT *,
        ROW_NUMBER() OVER (PARTITION BY team_id ORDER BY team_id) AS row_num
    FROM teams_raw
)
SELECT
    team_id AS identifier,
    'team' AS type,
    MAP(
        'abbreviation', abbreviation,
        'nickname', nickname,
        'city', city,
        'arena', arena,
        'yearfounded', CAST(yearfounded AS VARCHAR)
    ) AS properties
FROM teams_deduped
WHERE row_num = 1
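
Putting the view registration and the query together, the function body plausibly looks like the sketch below. It is reconstructed from the description on this page, not copied from team_vertex_job.py: the name do_team_vertex_transformation_sketch and the QUERY constant are hypothetical. The cast uses STRING, Spark SQL's native string type, as an assumption. createOrReplaceTempView and spark.sql are standard PySpark calls.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported for type hints only, so Spark is not needed at load time
    from pyspark.sql import DataFrame, SparkSession

# Query text following this page; the STRING cast is an assumption.
QUERY = """
WITH teams_deduped AS (
    SELECT *,
        ROW_NUMBER() OVER (PARTITION BY team_id ORDER BY team_id) AS row_num
    FROM teams_raw
)
SELECT
    team_id AS identifier,
    'team' AS type,
    MAP(
        'abbreviation', abbreviation,
        'nickname', nickname,
        'city', city,
        'arena', arena,
        'yearfounded', CAST(yearfounded AS STRING)
    ) AS properties
FROM teams_deduped
WHERE row_num = 1
"""


def do_team_vertex_transformation_sketch(
    spark: "SparkSession", dataframe: "DataFrame"
) -> "DataFrame":
    # Expose the input under the view name that the query reads from.
    dataframe.createOrReplaceTempView("teams_raw")
    # Run the query in the session that owns the temporary view.
    return spark.sql(QUERY)
```

Keeping the query in a module-level constant and passing the session in as an argument makes the transformation easy to call from a unit test with a small in-memory DataFrame.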

Usage Example

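from pyspark.sql import SparkSession
from src.jobs.team_vertex_job import do_team_vertex_transformation

spark = SparkSession.builder.master("local").appName("team_vertex").getOrCreate()
input_df = spark.read.table("teams_raw")
output_df = do_team_vertex_transformation(spark, input_df)
# insertInto requires the team_vertices table to exist already.
output_df.write.mode("overwrite").insertInto("team_vertices")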
