Heuristic:Apache Beam GC Thrashing Detection
| Knowledge Sources | |
|---|---|
| Domains | Debugging, Streaming, Memory_Management |
| Last Updated | 2026-02-09 04:00 GMT |
Overview
Memory monitoring strategy that detects GC thrashing via periodic sampling, pre-allocates a 10MB reserve for emergency heap dumps, and auto-kills the JVM after 2 minutes of sustained thrashing.
Description
The Dataflow streaming worker includes a `MemoryMonitor` thread that samples GC activity every 15 seconds and keeps a sliding window of the last 4 periods (1 minute). If 60% or more of the monitored periods show excessive GC time (with a 4-period window, at least 3 of the 4), the worker is declared to be in GC thrashing. To make a heap dump possible during thrashing, when the JVM has almost no free memory, the monitor pre-allocates a 10MB byte array at startup and releases it just before attempting the dump, freeing memory for the dump to use. If thrashing continues for 8 consecutive periods (2 minutes), the JVM is forcibly shut down to prevent cascading failures in the distributed system.
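The sliding-window decision described above can be sketched in a few lines. This is a minimal illustration, not the actual `MemoryMonitor` code: the class name `GcThrashingWindow` and its methods are hypothetical, and the sketch assumes the percentage is always computed against the full window size, even before the window has filled.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sliding-window detector; not the actual MemoryMonitor code. */
final class GcThrashingWindow {
  private final int numMonitoredPeriods;       // 4 in the source
  private final double serverThresholdPercent; // 60.0 in the source
  private final Deque<Boolean> periods = new ArrayDeque<>();

  GcThrashingWindow(int numMonitoredPeriods, double serverThresholdPercent) {
    this.numMonitoredPeriods = numMonitoredPeriods;
    this.serverThresholdPercent = serverThresholdPercent;
  }

  /** Records one period's verdict and returns whether the window now counts as thrashing. */
  boolean record(boolean periodHadExcessiveGc) {
    periods.addLast(periodHadExcessiveGc);
    if (periods.size() > numMonitoredPeriods) {
      periods.removeFirst(); // keep only the most recent periods
    }
    return isThrashing();
  }

  /** Assumption: the share is measured against the full window size, not the periods seen so far. */
  boolean isThrashing() {
    long bad = periods.stream().filter(b -> b).count();
    return 100.0 * bad / numMonitoredPeriods >= serverThresholdPercent;
  }
}
```

With a window of 4 and a threshold of 60.0, two bad periods (50%) do not trip the detector, while three (75%) do.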
Usage
Apply this heuristic when building long-running JVM services that process unbounded data, especially when memory leaks or bursty workloads can trigger sustained GC pressure. It is critical for streaming workers, where a thrashing JVM wastes cluster resources and can cause backpressure to cascade upstream.
The Insight (Rule of Thumb)
- Action: Monitor GC every 15 seconds over a 4-period window (1 minute).
- Value: Declare thrashing when at least 60% of the monitored periods (3 of 4) show high GC time.
- Trade-off: The window makes false positives on brief GC spikes unlikely, but thrashing cannot be declared until at least 3 bad periods (45 seconds) have been observed.
- Action: Pre-allocate 10MB reserve buffer at startup.
- Value: Release it just before the heap dump attempt so the dump has free memory to work with.
- Trade-off: 10MB of heap is unavailable to the application for as long as the reserve is held.
- Action: Force JVM shutdown after 8 consecutive thrashing periods (2 minutes).
- Value: Prevents cascading failures in distributed systems.
- Trade-off: In-flight work items are lost (Windmill will reassign them).
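The reserve-buffer and shutdown rules above can be combined into one small escalation policy. The sketch below is illustrative only: `ThrashingEscalation`, its hooks, and its accessors are hypothetical names, the heap dump and shutdown actions are injected as `Runnable`s rather than implemented, and it assumes the caller supplies one thrashing verdict per monitoring period.

```java
/** Hypothetical escalation policy; names and structure are illustrative only. */
final class ThrashingEscalation {
  private final int shutDownAfterPeriods; // 8 in the source; 0 disables shutdown
  private final Runnable dumpHeap;        // assumed hook that writes a heap dump
  private final Runnable shutDown;        // assumed hook that terminates the JVM
  private byte[] reserve;                 // held only so it can be released later
  private int consecutiveThrashing = 0;

  ThrashingEscalation(
      int reserveBytes, int shutDownAfterPeriods, Runnable dumpHeap, Runnable shutDown) {
    this.reserve = new byte[reserveBytes]; // e.g. 10 << 20 for a 10MB reserve
    this.shutDownAfterPeriods = shutDownAfterPeriods;
    this.dumpHeap = dumpHeap;
    this.shutDown = shutDown;
  }

  /** Call once per monitoring period with that period's thrashing verdict. */
  void onPeriod(boolean thrashing) {
    if (!thrashing) {
      consecutiveThrashing = 0; // any healthy period resets the streak
      return;
    }
    consecutiveThrashing++;
    if (reserve != null) {
      reserve = null;  // drop the only reference so the GC can reclaim the block
      dumpHeap.run();  // attempt the dump once, right after the release
    }
    if (shutDownAfterPeriods > 0 && consecutiveThrashing >= shutDownAfterPeriods) {
      shutDown.run();
    }
  }

  boolean holdsReserve() {
    return reserve != null;
  }

  int consecutiveThrashingPeriods() {
    return consecutiveThrashing;
  }
}
```

Injecting the two actions keeps the policy testable without killing the process. A real worker might pass something like `() -> Runtime.getRuntime().halt(1)` as the shutdown hook, though that choice is an assumption here and not taken from the source.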
Reasoning
GC thrashing is a terminal state for streaming workers: the JVM spends most of its time collecting garbage rather than processing data, but never actually reclaims enough memory to make progress. Without the 10MB reserve trick, heap dumps during thrashing almost always fail because there is not enough memory to serialize the heap. The 2-minute kill threshold balances giving the system time to recover from transient spikes against the cost of having a non-productive worker consume cluster resources. Memory state is logged every 5 minutes during normal operation for trend analysis.
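The "time spent collecting garbage" signal can be measured with the standard JMX beans. The following is a rough sketch, not the monitor's own sampling code: `GcTimeSampler` is a hypothetical class that reports what percentage of wall-clock time since the previous sample went to GC. What percentage counts as excessive for a single period is not stated in this section, so the sketch returns the raw value and leaves that threshold to the caller.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical per-period GC time sampler built on the standard JMX beans. */
final class GcTimeSampler {
  private long lastGcMillis = totalGcMillis();
  private long lastSampleNanos = System.nanoTime();

  /** Sums accumulated collection time across all collectors, skipping undefined (-1) values. */
  static long totalGcMillis() {
    long total = 0;
    for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
      long millis = gc.getCollectionTime();
      if (millis > 0) {
        total += millis;
      }
    }
    return total;
  }

  /** GC time as a percentage of elapsed time, clamped to the range 0 to 100. */
  static double percentOf(long gcMillis, long elapsedMillis) {
    if (elapsedMillis <= 0) {
      return 0.0;
    }
    double percent = 100.0 * gcMillis / elapsedMillis;
    return Math.max(0.0, Math.min(100.0, percent));
  }

  /** Returns the GC percentage since the previous call and starts a new period. */
  double samplePercent() {
    long nowGcMillis = totalGcMillis();
    long nowNanos = System.nanoTime();
    long elapsedMillis = (nowNanos - lastSampleNanos) / 1_000_000;
    double percent = percentOf(nowGcMillis - lastGcMillis, elapsedMillis);
    lastGcMillis = nowGcMillis;
    lastSampleNanos = nowNanos;
    return percent;
  }
}
```

Calling `samplePercent()` once per 15-second period would produce the per-period figure that a windowed detector consumes. The clamp is there because reported collection times are approximate and, with concurrent collectors, can exceed the elapsed wall-clock time.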
The specific constants were tuned for Dataflow production environments:
- 15-second intervals balance monitoring granularity against overhead
- 4-period window (1 minute) smooths out brief GC pauses
- 60% threshold distinguishes sustained thrashing from normal major GC events
- 8 consecutive periods (2 minutes) is the maximum tolerable unproductive time before the Dataflow service benefits from replacing the worker
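The durations quoted in parentheses follow directly from the constants. As a quick check of the arithmetic, the hypothetical helper below (the class `MonitorTimings` and its methods are not part of Beam) derives the window length, the minimum number of bad periods implied by the 60% threshold, and the shutdown delay from the values listed in this article.

```java
import java.time.Duration;

/** Hypothetical helper that derives the quoted durations from the documented constants. */
final class MonitorTimings {
  static final long SLEEP_TIME_MILLIS = 15 * 1000;     // 15 sec between samples
  static final int NUM_MONITORED_PERIODS = 4;          // sliding window size
  static final double THRASHING_PERCENT = 60.0;        // share of bad periods
  static final int SHUT_DOWN_AFTER_PERIODS = 8;        // consecutive thrashing periods

  /** Length of the sliding window: 4 periods of 15 seconds. */
  static Duration window() {
    return Duration.ofMillis(SLEEP_TIME_MILLIS * NUM_MONITORED_PERIODS);
  }

  /** Smallest whole number of bad periods that reaches the percentage threshold. */
  static int minThrashingPeriods() {
    return (int) Math.ceil(NUM_MONITORED_PERIODS * THRASHING_PERCENT / 100.0);
  }

  /** Time spent thrashing before shutdown: 8 periods of 15 seconds. */
  static Duration shutdownDelay() {
    return Duration.ofMillis(SLEEP_TIME_MILLIS * SHUT_DOWN_AFTER_PERIODS);
  }
}
```

Since 60% of 4 periods is 2.4, the threshold is only reached with 3 or more bad periods, which is why a single long major GC inside the window does not count as thrashing.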
Code Evidence
Constants from `MemoryMonitor.java:101-133`:
```java
/** Amount of time (in ms) this thread must sleep between two consecutive iterations. */
public static final long DEFAULT_SLEEP_TIME_MILLIS = 15 * 1000; // 15 sec.

/** Number of periods to take into account when determining GC thrashing. */
private static final int NUM_MONITORED_PERIODS = 4; // ie 1 min's worth.

/** Threshold after which the server is considered to be in GC thrashing. */
private static final double GC_THRASHING_PERCENTAGE_PER_SERVER = 60.0;

/** Pre-allocated memory to release before heap dump attempt. */
private static final int HEAP_DUMP_RESERVED_BYTES = 10 << 20; // 10MB

/** Shutdown JVM after this many consecutive thrashing periods. 0 to disable. */
private static final int DEFAULT_SHUT_DOWN_AFTER_NUM_GCTHRASHING = 8; // ie 2 min's worth.

/** Delay between logging the current memory state. */
private static final int NORMAL_LOGGING_PERIOD_MILLIS = 5 * 60 * 1000; // 5 min.
```