Implementation:DataExpert io Data engineer handbook Tumble Over Window
Overview
Tumble_Over_Window documents the usage of the PyFlink Table API's Tumble window class to define 5-minute tumbling windows over streaming event data. This wrapper chains window definition, grouping, selection, and sink insertion into a single fluent expression.
Type
Wrapper Doc (PyFlink Table API)
Source
aggregation_job.py:L96-108
Signature
Tumble.over(lit(5).minutes).on(col("window_timestamp")).alias("w")
Detailed Description
The tumbling window is defined and used as part of a fluent Table API chain that performs windowed aggregation.
Window Definition
from pyflink.table.window import Tumble
from pyflink.table.expressions import lit, col
Tumble.over(lit(5).minutes).on(col("window_timestamp")).alias("w")
Tumble.over(lit(5).minutes)-- defines a tumbling window of 5-minute duration..on(col("window_timestamp"))-- specifies the time attribute column used for windowing..alias("w")-- assigns the aliaswto the window for use in subsequentgroup_byandselectclauses.
Full Chain
The complete expression chains window definition with grouping, aggregation, and sink insertion:
source_table.window(
Tumble.over(lit(5).minutes).on(col("window_timestamp")).alias("w")
).group_by(
col("w"), col("host")
).select(
col("w").start.alias("event_hour"),
col("host"),
col("host").count.alias("num_hits")
).execute_insert(sink_table)
Breakdown:
.window(...)-- attaches the tumbling window definition to the source table..group_by(col("w"), col("host"))-- groups records by window and host dimension..select(...)-- projects the window start time asevent_hour, thehost, and the count of records asnum_hits..execute_insert(sink_table)-- inserts the aggregated results into the specified PostgreSQL sink table.
Imports
from pyflink.table.window import Tumble
from pyflink.table.expressions import lit, col
Inputs / Outputs
Inputs:
- A Flink
Tableobject with awindow_timestampcolumn (or equivalent time attribute column with a watermark defined)
Outputs:
- A windowed aggregation result containing:
event_hour-- the start timestamp of the 5-minute tumbling windowhost-- the grouped host dimensionnum_hits-- the count of events in that window for that host
The result is continuously written to the specified sink table via execute_insert.
Related Pages
- Principle:DataExpert_io_Data_engineer_handbook_Tumbling_Window
- Environment:DataExpert_io_Data_engineer_handbook_Flink_Kafka_Docker_Environment
- Heuristic:DataExpert_io_Data_engineer_handbook_Watermark_Late_Arrival_Tolerance