Principle:DataExpert io Data engineer handbook Tumbling Window
Overview
This page describes the theory of tumbling windows in stream processing. Tumbling windows are the simplest window type and among the most commonly used: they divide a continuous stream into fixed-size, non-overlapping time intervals, so each event belongs to exactly one window.
Fixed-Size, Non-Overlapping Time Windows
A tumbling window is defined by a single parameter: its size (also called duration or length). The key properties are:
- Fixed size -- every window spans the same duration (e.g., 5 minutes, 1 hour).
- Non-overlapping -- windows are contiguous but do not overlap; there are no gaps and no duplicates.
- Each event belongs to exactly one window -- unlike sliding windows, an event is never counted in multiple windows.
    Time:   |----W1----|----W2----|----W3----|----W4----|
    Events:   e1   e2      e3       e4 e5 e6
              \_____/      |_|      \______/
                W1          W2         W3
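The assignment rule can be sketched in plain Python, independent of any streaming engine. This is a minimal illustration, not Flink code: the helper names, the 10-second window size, and the event timestamps are all hypothetical, chosen so that the grouping matches the diagram above (two events in W1, one in W2, three in W3).

```python
from collections import Counter


def tumbling_window_start(timestamp_s: int, size_s: int) -> int:
    """Return the start of the single tumbling window that contains the timestamp."""
    # Rounding down to a multiple of the window size yields exactly one window.
    return timestamp_s - (timestamp_s % size_s)


def count_per_window(timestamps_s, size_s):
    """Count events per window, keyed by window start time."""
    return Counter(tumbling_window_start(t, size_s) for t in timestamps_s)


# Hypothetical event times in seconds: e1..e6
events = [3, 8, 12, 21, 24, 29]
counts = count_per_window(events, size_s=10)

for start in sorted(counts):
    print(f"[{start}, {start + 10}): {counts[start]} events")
```

Because every timestamp maps to a single window start, the per-window counts always sum to the total number of events.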
Difference from Sliding and Session Windows
| Property | Tumbling Window | Sliding Window | Session Window |
|---|---|---|---|
| Size | Fixed | Fixed | Variable (gap-based) |
| Overlap | None | Yes, when slide < size (consecutive windows share size minus slide) | None |
| Event assignment | Exactly one window | Multiple windows | Exactly one session |
| Parameters | Size only | Size + slide | Gap duration |
| Use case | Periodic aggregates | Moving averages | Activity sessions |
Tumbling windows are a special case of sliding windows in which the slide interval equals the window size. Consecutive windows therefore share no time range, and every event contributes to exactly one window's aggregate.
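The special-case relationship can be checked with a small sketch that lists every sliding window containing a given timestamp. The function name and the integer-second timestamps are assumptions made for illustration; this is not how any particular engine implements window assignment.

```python
def sliding_window_starts(timestamp_s: int, size_s: int, slide_s: int) -> list:
    """Return the starts of all windows [start, start + size) that contain the timestamp."""
    # The latest window that can contain the timestamp starts at the
    # closest multiple of the slide at or before it.
    start = timestamp_s - (timestamp_s % slide_s)
    starts = []
    # Walk backwards one slide at a time while the window still covers the timestamp.
    while start > timestamp_s - size_s:
        starts.append(start)
        start -= slide_s
    return starts


# Sliding: size 10, slide 5 -> the event falls into two overlapping windows.
overlapping = sliding_window_starts(12, size_s=10, slide_s=5)

# Tumbling: slide equals size -> the event falls into exactly one window.
tumbling = sliding_window_starts(12, size_s=10, slide_s=10)
```

With `slide_s` equal to `size_s`, the loop body runs exactly once, which is the tumbling-window guarantee of one window per event.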
Alignment to Epoch
By default, tumbling windows in Flink are aligned to the epoch (January 1, 1970, 00:00:00 UTC) rather than to the time the job starts. This means:
- A 5-minute tumbling window always starts at minutes 00, 05, 10, 15, 20, etc., regardless of when the job starts.
- This alignment ensures that window boundaries are deterministic and reproducible across job restarts.
- Different Flink jobs using the same window size (and the same offset, if one is configured) will produce windows with identical boundaries.
For example, with a 5-minute tumbling window:
Window 1: [10:00:00, 10:05:00)
Window 2: [10:05:00, 10:10:00)
Window 3: [10:10:00, 10:15:00)
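The boundaries above follow from simple modular arithmetic on the time elapsed since the epoch. The sketch below reproduces that arithmetic with the standard library; `window_bounds` and the sample event time are hypothetical, and the code illustrates the alignment rule rather than Flink's internal implementation.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_bounds(event_time: datetime, size: timedelta):
    """Return the half-open window [start, end) that contains event_time."""
    # The remainder of the elapsed time divided by the window size is how far
    # the event sits past the most recent epoch-aligned boundary.
    elapsed = event_time - EPOCH
    start = event_time - (elapsed % size)
    return start, start + size


# Hypothetical event time; the result does not depend on when a job started.
event_time = datetime(2024, 1, 1, 10, 7, 42, tzinfo=timezone.utc)
start, end = window_bounds(event_time, timedelta(minutes=5))
print(start.time(), end.time())
```

An event that lands exactly on a boundary, such as 10:05:00, belongs to the window that starts at that instant, because each window includes its start and excludes its end.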
Tumbling Windows in Flink
In the PyFlink Table API, a tumbling window is expressed using the Tumble class:
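from pyflink.table.expressions import col, lit
from pyflink.table.window import Tumble

# Assumes `table` is an existing Table whose event_timestamp column is a time attribute.
result = (
    table.window(Tumble.over(lit(5).minutes).on(col("event_timestamp")).alias("w"))
    .group_by(col("w"), col("host"))  # group by window and key, as in the SQL version
    .select(col("w").start.alias("window_start"), col("host"), lit(1).count.alias("num_hits"))
)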
In Flink SQL, the equivalent is:
SELECT
    TUMBLE_START(event_timestamp, INTERVAL '5' MINUTE) AS window_start,
    host,
    COUNT(*) AS num_hits
FROM events
GROUP BY
    TUMBLE(event_timestamp, INTERVAL '5' MINUTE),
    host;
When to Use
Use a tumbling window when:
- Computing periodic aggregates without overlap (e.g., events per 5-minute bucket).
- Each event should be counted exactly once in the aggregation.
- Deterministic, epoch-aligned window boundaries are desired.
- The use case does not require overlapping windows or activity-based session semantics.