Jump to content

Connect Leeroopedia MCP: Equip your AI agents to search best practices, build plans, verify code, diagnose failures, and look up hyperparameter defaults.

Implementation:DataExpert io Data engineer handbook Tumble Over Window

From Leeroopedia


Overview

Tumble_Over_Window documents the usage of the PyFlink Table API's Tumble window class to define 5-minute tumbling windows over streaming event data. This wrapper chains window definition, grouping, selection, and sink insertion into a single fluent expression.

Type

Wrapper Doc (PyFlink Table API)

Source

aggregation_job.py:L96-108

Signature

Tumble.over(lit(5).minutes).on(col("window_timestamp")).alias("w")

Detailed Description

The tumbling window is defined and used as part of a fluent Table API chain that performs windowed aggregation.

Window Definition

from pyflink.table.window import Tumble
from pyflink.table.expressions import lit, col

Tumble.over(lit(5).minutes).on(col("window_timestamp")).alias("w")
  • Tumble.over(lit(5).minutes) -- defines a tumbling window of 5-minute duration.
  • .on(col("window_timestamp")) -- specifies the time attribute column used for windowing.
  • .alias("w") -- assigns the alias w to the window for use in subsequent group_by and select clauses.

Full Chain

The complete expression chains window definition with grouping, aggregation, and sink insertion:

source_table.window(
    Tumble.over(lit(5).minutes).on(col("window_timestamp")).alias("w")
).group_by(
    col("w"), col("host")
).select(
    col("w").start.alias("event_hour"),
    col("host"),
    col("host").count.alias("num_hits")
).execute_insert(sink_table)

Breakdown:

  1. .window(...) -- attaches the tumbling window definition to the source table.
  2. .group_by(col("w"), col("host")) -- groups records by window and host dimension.
  3. .select(...) -- projects the window start time as event_hour, the host, and the count of records as num_hits.
  4. .execute_insert(sink_table) -- inserts the aggregated results into the specified PostgreSQL sink table.

Imports

from pyflink.table.window import Tumble
from pyflink.table.expressions import lit, col

Inputs / Outputs

Inputs:

  • A Flink Table object with a window_timestamp column (or equivalent time attribute column with a watermark defined)

Outputs:

  • A windowed aggregation result containing:
    • event_hour -- the start timestamp of the 5-minute tumbling window
    • host -- the grouped host dimension
    • num_hits -- the count of events in that window for that host

The result is continuously written to the specified sink table via execute_insert.

Related Pages

Metadata

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment