Heuristic: Apache Hudi Write Memory Tuning
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Stream_Processing |
| Last Updated | 2026-02-08 20:00 GMT |
Overview
Memory budget formula for Flink write tasks: `write.task.max.size` must exceed 100 MB (merge reader) + `write.merge.max_memory` (default 100 MB), leaving the remainder for write buffers.
Description
Each Flink Hudi write task allocates memory for three distinct purposes: a fixed 100 MB merge reader, a configurable merge map (`write.merge.max_memory`, default 100 MB), and a write buffer pool that receives the remainder. If the total `write.task.max.size` does not exceed the sum of the first two, the job fails at startup with a validation error. The default 1024 MB allocation therefore leaves 824 MB for write buffering. Understanding this formula prevents OOM errors and allows predictable memory planning.
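The budget arithmetic can be sketched directly. The helper below is illustrative only (the class and method names are hypothetical, not Hudi's actual API); it mirrors the subtraction the validation code performs:

```java
// Hypothetical sketch of the write-task memory budget; not Hudi's real API.
public class WriteBudget {
    public static final long MERGE_READER_MB = 100; // fixed merge reader allocation

    /** Bytes left for write buffers once reader and merge map are reserved. */
    public static long bufferBytes(long taskMaxSizeMb, long mergeMaxMemoryMb) {
        return (taskMaxSizeMb - MERGE_READER_MB - mergeMaxMemoryMb) * 1024 * 1024;
    }
}
```

With the defaults (1024 MB total, 100 MB merge map) this yields 824 MB of buffer space; at 200 MB total the buffer budget drops to zero, which is exactly the condition the startup validation rejects.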
Usage
Apply this heuristic when configuring Flink Hudi write jobs to avoid OOM errors or startup failures. It is essential when reducing memory footprint for small clusters or when increasing buffer sizes for high-throughput workloads.
The Insight (Rule of Thumb)
- Formula: `write.task.max.size` > 100 MB (fixed reader) + `write.merge.max_memory`
- Default budget: 1024 MB total = 100 MB reader + 100 MB merge + 824 MB buffer
- Minimum safe value: `write.task.max.size` >= 201 MB (with the default merge memory of 100 MB)
- For high throughput: increase `write.task.max.size` to 2048+ MB and use `write.buffer.type=DISRUPTOR` with `write.buffer.disruptor.ring.size=16384` (must be a power of 2)
- Trade-off: larger buffer sizes consume more JVM heap per task but reduce flush frequency
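As one way to express a high-throughput tuning, the sketch below collects the settings in a plain map (rather than Flink's `Configuration` class, to stay self-contained). The option keys are the real names quoted in this section; the class itself is hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative option set for a high-throughput writer; the class is a
// hypothetical container, but the keys are the option names from this section.
public class HighThroughputOptions {
    public static Map<String, String> build() {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("write.task.max.size", "2048");               // 2 GB per task
        options.put("write.merge.max_memory", "100");             // default merge map
        options.put("write.buffer.type", "DISRUPTOR");            // lock-free ring buffer
        options.put("write.buffer.disruptor.ring.size", "16384"); // must be a power of 2
        return options;
    }
}
```

At 2048 MB total, the same budget formula leaves 1848 MB for write buffers, so each task flushes far less often at the cost of more heap.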
Reasoning
The write path in Hudi Flink requires concurrent memory for reading existing data (merge reader), maintaining a merge map for conflict resolution, and buffering incoming records. The merge reader is fixed at 100 MB because it reads Parquet/log files in batch. The merge map must hold the entire key set for deduplication. The remaining budget determines how many records can be buffered before a forced flush, which directly impacts write throughput and file size quality.
The DISRUPTOR buffer type provides lock-free ring buffer semantics for asynchronous writes, offering 10-20% throughput improvement over the default unbuffered approach. The ring size must be a power of 2 per the LMAX Disruptor library constraint.
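The power-of-2 constraint on the ring size can be verified with the standard bit trick; this is a minimal sketch, not the Disruptor library's own validation code:

```java
// Minimal power-of-two check: a positive n is a power of 2 exactly when
// clearing its lowest set bit (n & (n - 1)) leaves zero.
public class RingSizeCheck {
    public static boolean isPowerOfTwo(long n) {
        return n > 0 && (n & (n - 1)) == 0;
    }
}
```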
Code Evidence
Memory budget validation from `MemorySegmentPoolFactory.java:47-52`:

```java
long mergeReaderMem = 100; // constant 100MB
long mergeMapMaxMem = conf.get(FlinkOptions.WRITE_MERGE_MAX_MEMORY);
long maxBufferSize = (long) ((conf.get(FlinkOptions.WRITE_TASK_MAX_SIZE)
    - mergeReaderMem - mergeMapMaxMem) * 1024 * 1024);
final String errMsg = String.format(
    "'%s' should be at least greater than '%s' plus merge reader memory(constant 100MB now)",
    FlinkOptions.WRITE_TASK_MAX_SIZE.key(), FlinkOptions.WRITE_MERGE_MAX_MEMORY.key());
ValidationUtils.checkState(maxBufferSize > 0, errMsg);
```
Default configuration from `FlinkOptions.java:703-708`:

```java
public static final ConfigOption<Double> WRITE_TASK_MAX_SIZE = ConfigOptions
    .key("write.task.max.size")
    .doubleType()
    .defaultValue(1024D) // 1GB
    .withDescription("Maximum memory in MB for a write task, when the threshold hits,\n"
        + "it flushes the max size data bucket to avoid OOM, default 1GB");
```