Principle:DataExpert io Data engineer handbook Tumbling Window

From Leeroopedia


Overview

This page describes tumbling windows in stream processing. A tumbling window is the simplest and most commonly used window type: it divides a continuous stream into fixed-size, non-overlapping time intervals, so each event belongs to exactly one window.

Fixed-Size, Non-Overlapping Time Windows

A tumbling window is defined by a single parameter: its size (also called duration or length). The key properties are:

  • Fixed size -- every window spans the same duration (e.g., 5 minutes, 1 hour).
  • Non-overlapping -- windows are contiguous: each one starts exactly where the previous one ends, so there are no gaps and no event is counted twice.
  • Each event belongs to exactly one window -- unlike sliding windows, an event is never counted in multiple windows.
Time:     |--W1--|--W2--|--W3--|--W4--|
Events:   e1 e2   e3     e4 e5 e6
          \___/   |_|    \________/
          W1      W2        W3
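
The assignment rule behind the diagram can be sketched in a few lines of Python. This is a minimal illustration, not Flink's implementation: the function name `tumbling_window`, the integer timestamps, and the window size of 10 are assumptions chosen to mirror the e1..e6 layout above.

```python
from collections import defaultdict

def tumbling_window(timestamp, size):
    """Return the half-open window [start, end) that contains timestamp.

    Assumes non-negative integer timestamps and a positive size,
    both in the same unit (for example, seconds).
    """
    start = timestamp - (timestamp % size)
    return start, start + size

# Hypothetical event times chosen to mirror the diagram (size = 10).
events = {"e1": 1, "e2": 4, "e3": 12, "e4": 21, "e5": 24, "e6": 28}

# Group event names by the single window each one falls into.
windows = defaultdict(list)
for name, ts in events.items():
    windows[tumbling_window(ts, 10)].append(name)

for (start, end), names in sorted(windows.items()):
    print(f"[{start}, {end}): {names}")
```

Because the window start is derived from the timestamp alone, an event can never land in two windows, and a timestamp that falls exactly on a boundary belongs to the window that starts there.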

Difference from Sliding and Session Windows

Property         | Tumbling Window     | Sliding Window                      | Session Window
-----------------+---------------------+-------------------------------------+---------------------
Size             | Fixed               | Fixed                               | Variable (gap-based)
Overlap          | None                | size - slide (when slide < size)    | None
Event assignment | Exactly one window  | Multiple windows (when slide < size)| Exactly one session
Parameters       | Size only           | Size + slide                        | Gap duration
Use case         | Periodic aggregates | Moving averages                     | Activity sessions

Tumbling windows are a special case of sliding windows where the slide interval equals the window size. This means there is no overlap and every event is processed exactly once per aggregation.
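
The special-case relationship can be checked with a small sketch of sliding-window assignment. The function name `sliding_windows` and the integer timestamps are illustrative assumptions; the loop simply walks backwards from the latest window start that is not after the event.

```python
def sliding_windows(timestamp, size, slide):
    """Return every [start, end) window of the given size and slide
    that contains timestamp.

    Assumes a non-negative integer timestamp and positive integer
    size and slide, all in the same unit. Window starts may be
    negative for timestamps close to zero.
    """
    windows = []
    start = timestamp - (timestamp % slide)  # latest window start <= timestamp
    while start > timestamp - size:
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)

# slide < size: the event falls into more than one window.
overlapping = sliding_windows(12, size=10, slide=5)
# slide == size: the sliding window degenerates into a tumbling window.
tumbling = sliding_windows(12, size=10, slide=10)

print(overlapping)
print(tumbling)
```

With size 10 and slide 5, the event at time 12 is assigned to both [5, 15) and [10, 20); with slide equal to size it is assigned only to [10, 20).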

Alignment to Epoch

Tumbling windows in Flink are aligned to the epoch (January 1, 1970, 00:00:00 UTC). This means:

  • A 5-minute tumbling window always starts at minute 00, 05, 10, 15, 20, etc. of the hour, regardless of when the job starts.
  • This alignment ensures that windows are deterministic and reproducible across job restarts.
  • Different Flink jobs using the same window size will produce windows with identical boundaries.

For example, with a 5-minute tumbling window:

Window 1: [10:00:00, 10:05:00)
Window 2: [10:05:00, 10:10:00)
Window 3: [10:10:00, 10:15:00)
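
Epoch alignment amounts to counting whole windows since 1970-01-01 00:00:00 UTC. The following is a minimal sketch of that arithmetic using only the Python standard library; `aligned_window` is a hypothetical helper and the event time is made up for illustration. It is not Flink code.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def aligned_window(event_time, size):
    """Return the epoch-aligned [start, end) window containing event_time.

    event_time must be a timezone-aware datetime; size is a positive
    timedelta.
    """
    index = (event_time - EPOCH) // size  # whole windows elapsed since the epoch
    start = EPOCH + index * size
    return start, start + size

size = timedelta(minutes=5)
event = datetime(2024, 1, 1, 10, 7, 30, tzinfo=timezone.utc)  # hypothetical event time
start, end = aligned_window(event, size)
print(start.time(), end.time())
```

An event at 10:07:30 UTC lands in [10:05:00, 10:10:00). The function never consults the time at which a job started, which is why the boundaries are the same across restarts and across jobs that use the same window size.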

Tumbling Windows in Flink

In the PyFlink Table API, a tumbling window is expressed using the Tumble class:

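from pyflink.table.window import Tumble
from pyflink.table.expressions import lit, col

# `table` is assumed to be an existing Table whose `event_timestamp`
# column is a time attribute.
# Define a 5-minute tumbling window on that column and refer to it as "w".
windowed = table.window(
    Tumble.over(lit(5).minutes).on(col("event_timestamp")).alias("w")
)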

In Flink SQL, the same window is used in a grouped aggregation:

SELECT
    TUMBLE_START(event_timestamp, INTERVAL '5' MINUTE) AS window_start,
    host,
    COUNT(*) AS num_hits
FROM events
GROUP BY
    TUMBLE(event_timestamp, INTERVAL '5' MINUTE),
    host;
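
To make the result of the query concrete, the sketch below emulates it in plain Python over a small in-memory batch. The rows and host names are hypothetical, timestamps are assumed to be epoch milliseconds, and streaming concerns such as watermarks, late events, and incremental emission are deliberately ignored.

```python
from collections import Counter

WINDOW_MS = 5 * 60 * 1000  # INTERVAL '5' MINUTE expressed in milliseconds

# Hypothetical rows: (event_timestamp in epoch milliseconds, host).
events = [
    (0, "a.example"),
    (120_000, "a.example"),
    (299_999, "b.example"),
    (300_000, "a.example"),
]

# GROUP BY TUMBLE(event_timestamp, INTERVAL '5' MINUTE), host
num_hits = Counter()
for event_timestamp, host in events:
    window_start = event_timestamp - (event_timestamp % WINDOW_MS)
    num_hits[(window_start, host)] += 1  # COUNT(*)

for (window_start, host), count in sorted(num_hits.items()):
    print(window_start, host, count)
```

The event at 299,999 ms still belongs to the first window, while the one at 300,000 ms opens the second, matching the half-open [start, end) boundaries described earlier.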

When to Use

Use a tumbling window when:

  • Periodic aggregates without overlap are needed (e.g., events per 5-minute bucket).
  • Each event should be counted exactly once in the aggregation.
  • Deterministic, epoch-aligned window boundaries are desired.
  • The use case does not require overlapping windows or activity-based session semantics.
