Principle: Apache Flink Committable Collection Coordination
| Knowledge Sources | |
|---|---|
| Domains | Stream_Processing, Coordination |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A single-parallelism coordination operator that collects file committables from all writers and groups them into per-bucket compaction requests.
Description
Committable Collection Coordination runs at parallelism 1 to provide a global view of all pending files across all writer subtasks. It receives CommittableMessage<FileSinkCommittable> elements from the writers and sorts each committable into one of two per-bucket groups:
- Files to compact: Pending files that have been finalized by the writer
- Files to pass through: In-progress file cleanups and compacted-file cleanups, which are forwarded downstream without compaction
An internal CompactTrigger tracks the checkpoint count and the accumulated file size per bucket. When the trigger fires, the bucket's accumulated files are emitted as a CompactorRequest.
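The triggering logic described above can be sketched in plain Java. This is an illustrative model only, not Flink's actual CompactTrigger class: the class name, method names, and the fire-on-either-threshold rule are assumptions made for the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative per-bucket trigger model: a bucket fires once it has either
// survived enough checkpoints or accumulated enough pending bytes.
// Names and semantics are assumptions, not Flink's real implementation.
public class CompactTriggerSketch {
    private final int checkpointThreshold;
    private final long sizeThreshold;
    private final Map<String, Long> pendingBytes = new HashMap<>();
    private final Map<String, Integer> checkpointsSeen = new HashMap<>();

    public CompactTriggerSketch(int checkpointThreshold, long sizeThreshold) {
        this.checkpointThreshold = checkpointThreshold;
        this.sizeThreshold = sizeThreshold;
    }

    // Called for every pending file routed to a bucket.
    public void addToCompact(String bucketId, long fileSize) {
        pendingBytes.merge(bucketId, fileSize, Long::sum);
    }

    // Called once per completed checkpoint for a bucket with pending files.
    public void onCheckpoint(String bucketId) {
        checkpointsSeen.merge(bucketId, 1, Integer::sum);
    }

    // Fires when either the checkpoint count or the size threshold is reached.
    public boolean shouldFire(String bucketId) {
        return checkpointsSeen.getOrDefault(bucketId, 0) >= checkpointThreshold
                || pendingBytes.getOrDefault(bucketId, 0L) >= sizeThreshold;
    }

    // Clears the bucket's counters once its files are emitted as a request.
    public void reset(String bucketId) {
        pendingBytes.remove(bucketId);
        checkpointsSeen.remove(bucketId);
    }
}
```

Keeping the counters keyed by bucket means one large bucket can fire on size while quieter buckets wait for the checkpoint count, which matches the per-bucket grouping described above.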
Usage
This principle is internal to the compaction pipeline. It is automatically inserted into the sink topology when compaction is enabled via FileSink.RowFormatBuilder.enableCompact().
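As a configuration sketch, assuming the Flink 1.15+ file-sink API (where enableCompact takes a FileCompactStrategy and a FileCompactor), enabling compaction looks roughly like this; the path and threshold values are placeholders:

```java
// Sketch: enabling compaction on a row-format FileSink (Flink 1.15+ API).
// Threshold values are placeholders; consult the FileSink javadocs for
// the exact semantics of each strategy option.
FileSink<String> sink = FileSink
        .forRowFormat(new Path("/tmp/output"), new SimpleStringEncoder<String>())
        .enableCompact(
                FileCompactStrategy.Builder.newBuilder()
                        .enableCompactionOnCheckpoint(3)    // consider compaction every 3 checkpoints
                        .setSizeThreshold(16 * 1024 * 1024) // 16 MiB size threshold
                        .build(),
                new ConcatFileCompactor())                  // concatenate files byte-wise
        .build();
```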
Theoretical Basis
// Abstract coordination
function processElement(message):
    committable = message.getCommittable()
    bucketId = committable.getBucketId()
    if committable.hasPendingFile():
        trigger.addToCompact(bucketId, committable)
    else:
        trigger.addToPassthrough(bucketId, committable)
    if trigger.shouldFire(bucketId):
        request = trigger.createRequest(bucketId)
        emit(request)
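The pseudocode above can be made concrete. The following self-contained Java sketch models the routing step with plain maps and lists; the Committable and CompactorRequest records and the files-per-request trigger are simplified stand-ins for Flink's classes, chosen here only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of the parallelism-1 coordinator. The real Flink
// committables carry file handles and cleanup metadata; here a boolean
// distinguishes pending files (to compact) from cleanups (to pass through).
public class CoordinatorSketch {
    record Committable(String bucketId, boolean hasPendingFile) {}
    record CompactorRequest(String bucketId,
                            List<Committable> toCompact,
                            List<Committable> toPassthrough) {}

    // Illustrative trigger: fire once a bucket holds 3 pending files.
    static final int FILES_PER_REQUEST = 3;

    private final Map<String, List<Committable>> toCompact = new HashMap<>();
    private final Map<String, List<Committable>> toPassthrough = new HashMap<>();
    final List<CompactorRequest> emitted = new ArrayList<>();

    // Mirrors processElement: route by hasPendingFile, then check the trigger.
    void processElement(Committable c) {
        String bucket = c.bucketId();
        if (c.hasPendingFile()) {
            toCompact.computeIfAbsent(bucket, k -> new ArrayList<>()).add(c);
        } else {
            toPassthrough.computeIfAbsent(bucket, k -> new ArrayList<>()).add(c);
        }
        if (toCompact.getOrDefault(bucket, List.of()).size() >= FILES_PER_REQUEST) {
            List<Committable> compact = toCompact.remove(bucket);
            List<Committable> pass = toPassthrough.remove(bucket);
            emitted.add(new CompactorRequest(
                    bucket, compact, pass == null ? List.of() : pass));
        }
    }
}
```

Because the operator runs at parallelism 1, no locking is needed: every committable from every writer subtask flows through this single instance, which is what makes the per-bucket global view possible.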