Jump to content

Connect Leeroopedia MCP: Equip your AI agents to search best practices, build plans, verify code, diagnose failures, and look up hyperparameter defaults.

Implementation:DataExpert io Data engineer handbook Do monthly user site hits transformation

From Leeroopedia


Overview

This page documents the do_monthly_user_site_hits_transformation function, which aggregates array-based hit data into per-day summaries for a given date partition. The function extracts individual elements from hit arrays and sums them across users grouped by month.

Type

API Doc

Source

monthly_user_site_hits_job.py:L7-19

Signature

def do_monthly_user_site_hits_transformation(spark, dataframe, ds) -> DataFrame

Import

from src.jobs.monthly_user_site_hits_job import do_monthly_user_site_hits_transformation

Inputs / Outputs

Direction Name Type Description
Input spark SparkSession An active SparkSession instance for executing SQL
Input dataframe DataFrame A DataFrame containing columns: month_start, hit_array, date_partition
Input ds str The date string used to filter the date_partition column for incremental processing
Output result DataFrame A DataFrame containing columns: month_start, num_hits_first_day, num_hits_second_day, num_hits_third_day

Transformation Logic

The function performs the following steps:

  • Registers the input DataFrame as a temporary view
  • Filters rows where date_partition matches the provided ds parameter
  • Extracts elements from hit_array at positions 0, 1, and 2 using the get() function
  • Wraps each extracted value in COALESCE(..., 0) to handle nulls
  • Applies SUM() aggregation on each extracted element
  • Groups results by month_start
# Usage example
spark = SparkSession.builder.master("local").appName("monthly_hits").getOrCreate()
input_df = spark.read.table("user_site_hits")
output_df = do_monthly_user_site_hits_transformation(spark, input_df, "2024-01-01")
output_df.write.mode("overwrite").insertInto("monthly_user_site_hits")

Related Pages

Knowledge Sources

Metadata

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment