Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Principle:Apache Flink Rolling Policy

From Leeroopedia
Revision as of 17:51, 16 February 2026 by Admin (talk | contribs) (Auto-imported from principles/Apache_Flink_Rolling_Policy.md)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)


Knowledge Sources
Domains Stream_Processing, File_IO
Last Updated 2026-02-09 00:00 GMT

Overview

A file rotation mechanism that determines when to close the current output file and start a new one based on size, time, or checkpoint boundaries.

Description

Rolling Policy governs the lifecycle of individual output files within a bucket. Without rolling, a single output file would grow unboundedly, causing problems with file system limits, downstream processing latency, and failure recovery cost. The rolling policy evaluates conditions on three triggers:

  • On Checkpoint: Whether to roll when a checkpoint barrier arrives
  • On Event: Whether to roll when a new record is written
  • On Processing Time: Whether to roll based on elapsed wall-clock time

The default policy rolls files based on three thresholds: maximum part file size (128 MB), maximum open duration (60 seconds), and maximum inactivity interval (60 seconds). For bulk formats that cannot be split mid-write, checkpoint-based rolling is the only option.

Usage

Use this principle to control output file granularity. Smaller files enable faster downstream processing and more granular failure recovery, but increase metadata overhead. Larger files improve compression ratios and reduce small-file problems on distributed file systems like HDFS.

Theoretical Basis

// Abstract algorithm
function shouldRoll(partFileInfo, trigger):
    if trigger == CHECKPOINT:
        return policy.shouldRollOnCheckpoint(partFileInfo)
    if trigger == EVENT:
        return partFileInfo.size >= maxPartSize
    if trigger == PROCESSING_TIME:
        return (currentTime - partFileInfo.creationTime >= rolloverInterval)
            OR (currentTime - partFileInfo.lastWriteTime >= inactivityInterval)

Related Pages

Implemented By

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment