Principle:Apache Flink Rolling Policy
| Knowledge Sources | |
|---|---|
| Domains | Stream_Processing, File_IO |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A file rotation mechanism that determines when to close the current output file and start a new one based on size, time, or checkpoint boundaries.
Description
Rolling Policy governs the lifecycle of individual output files within a bucket. Without rolling, a single output file would grow unboundedly, causing problems with file system limits, downstream processing latency, and failure recovery cost. The rolling policy evaluates conditions on three triggers:
- On Checkpoint: Whether to roll when a checkpoint barrier arrives
- On Event: Whether to roll when a new record is written
- On Processing Time: Whether to roll based on elapsed wall-clock time
The default policy rolls files based on three thresholds: maximum part file size (128 MB), maximum open duration (60 seconds), and maximum inactivity interval (60 seconds). For bulk formats that cannot be split mid-write, checkpoint-based rolling is the only option.
Usage
Use this principle to control output file granularity. Smaller files enable faster downstream processing and more granular failure recovery, but increase metadata overhead. Larger files improve compression ratios and reduce small-file problems on distributed file systems like HDFS.
Theoretical Basis
// Abstract algorithm
function shouldRoll(partFileInfo, trigger):
if trigger == CHECKPOINT:
return policy.shouldRollOnCheckpoint(partFileInfo)
if trigger == EVENT:
return partFileInfo.size >= maxPartSize
if trigger == PROCESSING_TIME:
return (currentTime - partFileInfo.creationTime >= rolloverInterval)
OR (currentTime - partFileInfo.lastWriteTime >= inactivityInterval)