
Heuristic:Apache Hudi Write Memory Tuning

From Leeroopedia




Knowledge Sources
Domains Optimization, Stream_Processing
Last Updated 2026-02-08 20:00 GMT

Overview

Memory budget formula for Flink write tasks: write.task.max.size must exceed 100 MB (merge reader) + write.merge.max_memory (default 100 MB), leaving the remainder for write buffers.

Description

Each Flink Hudi write task allocates memory for three distinct purposes: a fixed 100 MB merge reader, a configurable merge map (write.merge.max_memory, default 100 MB), and a write buffer pool that receives the remainder. If write.task.max.size does not exceed the sum of the first two, the job fails at startup with a validation error. The default 1 GB allocation leaves 824 MB (1024 − 100 − 100) for write buffering. Understanding this formula prevents OOM errors and allows predictable memory planning.

Usage

Apply this heuristic when configuring Flink Hudi write jobs to avoid OOM errors or startup failures. It is essential when reducing memory footprint for small clusters or when increasing buffer sizes for high-throughput workloads.

The Insight (Rule of Thumb)

  • Formula: write.task.max.size > 100 MB (fixed reader) + write.merge.max_memory
  • Default budget: 1024 MB total = 100 MB reader + 100 MB merge + 824 MB buffer
  • Minimum safe value: write.task.max.size >= 201 MB (with default merge memory of 100 MB)
  • For high throughput: Increase write.task.max.size to 2048+ MB and use write.buffer.type=DISRUPTOR with write.buffer.disruptor.ring.size=16384 (must be power of 2)
  • Trade-off: Larger buffer sizes consume more JVM heap per task but reduce flush frequency
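The budget arithmetic above can be sketched in a few lines of Java. This is an illustrative helper, not Hudi's actual API; the class and method names are hypothetical, but the formula mirrors the validation Hudi performs at startup.

```java
// Hypothetical sketch of the write-task memory budget formula.
// Mirrors: write.task.max.size > 100 MB (reader) + write.merge.max_memory.
public class WriteMemoryBudget {
    static final long MERGE_READER_MB = 100; // fixed merge reader allocation

    /** Returns the write-buffer budget in MB, or throws if the formula is violated. */
    static long bufferBudgetMb(long taskMaxSizeMb, long mergeMaxMemoryMb) {
        long buffer = taskMaxSizeMb - MERGE_READER_MB - mergeMaxMemoryMb;
        if (buffer <= 0) {
            throw new IllegalStateException(
                "write.task.max.size must exceed write.merge.max_memory"
                    + " plus the 100 MB merge reader");
        }
        return buffer;
    }

    public static void main(String[] args) {
        // Default budget: 1024 = 100 reader + 100 merge + 824 buffer
        System.out.println(bufferBudgetMb(1024, 100)); // prints 824
    }
}
```

With the defaults, the function returns 824, matching the budget breakdown above; a setting of 200 MB or less (with the default merge memory) throws, matching the startup validation error.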

Reasoning

The write path in Hudi Flink requires concurrent memory for reading existing data (merge reader), maintaining a merge map for conflict resolution, and buffering incoming records. The merge reader is fixed at 100 MB because it reads Parquet/log files in batch. The merge map must hold the entire key set for deduplication. The remaining budget determines how many records can be buffered before a forced flush, which directly impacts write throughput and file size quality.

The DISRUPTOR buffer type provides lock-free ring buffer semantics for asynchronous writes, offering 10-20% throughput improvement over the default unbuffered approach. The ring size must be a power of 2 per the LMAX Disruptor library constraint.
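The power-of-2 constraint can be checked with a standard bit trick. This is an illustrative snippet, not Hudi or Disruptor code:

```java
// Illustrative check: the Disruptor ring size must be a power of 2,
// so that slot indexing can use a cheap bitmask instead of modulo.
public class RingSizeCheck {
    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    public static void main(String[] args) {
        System.out.println(isPowerOfTwo(16384)); // true: valid ring size
        System.out.println(isPowerOfTwo(10000)); // false: rejected by the Disruptor
    }
}
```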

Code Evidence

Memory budget validation from MemorySegmentPoolFactory.java:47-52:

long mergeReaderMem = 100; // constant 100MB
long mergeMapMaxMem = conf.get(FlinkOptions.WRITE_MERGE_MAX_MEMORY);
long maxBufferSize = (long) ((conf.get(FlinkOptions.WRITE_TASK_MAX_SIZE)
    - mergeReaderMem - mergeMapMaxMem) * 1024 * 1024);
final String errMsg = String.format(
    "'%s' should be at least greater than '%s' plus merge reader memory(constant 100MB now)",
    FlinkOptions.WRITE_TASK_MAX_SIZE.key(), FlinkOptions.WRITE_MERGE_MAX_MEMORY.key());
ValidationUtils.checkState(maxBufferSize > 0, errMsg);

Default configuration from FlinkOptions.java:703-708:

public static final ConfigOption<Double> WRITE_TASK_MAX_SIZE = ConfigOptions
    .key("write.task.max.size")
    .doubleType()
    .defaultValue(1024D) // 1GB
    .withDescription("Maximum memory in MB for a write task, when the threshold hits,\n"
        + "it flushes the max size data bucket to avoid OOM, default 1GB");
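Putting the options together, a high-throughput tuning might look like the following sketch. The option keys are the ones named on this page; the values are illustrative examples, not recommendations, and should be sized against your task heap.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a high-throughput option set for a Hudi Flink write job,
// using the keys discussed above. Values are illustrative.
public class HudiWriteOptions {
    public static Map<String, String> highThroughputOptions() {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("write.task.max.size", "2048");               // MB per write task
        opts.put("write.merge.max_memory", "100");             // MB for the merge map
        opts.put("write.buffer.type", "DISRUPTOR");            // lock-free ring buffer
        opts.put("write.buffer.disruptor.ring.size", "16384"); // must be a power of 2
        return opts;
    }
}
```

These entries would typically be passed as table options (e.g. in the WITH clause of a Flink SQL CREATE TABLE) rather than built as a raw map; the map form here just keeps the example self-contained.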
