Principle:Apache Flink Asynchronous File Compaction
| Knowledge Sources | |
|---|---|
| Domains | Stream_Processing, File_IO |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A parallel asynchronous execution mechanism that performs file compaction off the main operator thread using a configurable thread pool.
Description
Asynchronous File Compaction executes the actual file merge operations outside the Flink operators main processing thread. The CompactService maintains an ExecutorService thread pool (defaulting to the number of CPU cores) and submits compaction tasks as CompletableFuture objects.
Each compaction task:
- Resolves pending file paths from the committables
- Opens a CompactingFileWriter via the BucketWriter
- Executes the FileCompactor strategy (concat or record-wise)
- Produces new FileSinkCommittable entries pointing to the compacted output
The compacted file uses a "compacted-" prefix to distinguish it from regular output files.
Usage
This principle operates internally. The number of compaction threads can be configured via FileCompactStrategy.Builder.setNumCompactThreads().
Theoretical Basis
// Abstract async compaction
function submit(request, resultFuture):
threadPool.submit(() -> {
result = compact(request)
resultFuture.complete(result)
})
function compact(request):
inputPaths = resolveFilePaths(request.filesToCompact)
compactedPath = assembleCompactedPath(inputPaths[0])
writer = bucketWriter.openNewCompactingFile(compactedPath)
fileCompactor.compact(inputPaths, writer)
committable = writer.closeForCommit()
return [committable] + request.passthroughFiles