Heuristic:Apache Beam Executor Shutdown Ordering
| Knowledge Sources | |
|---|---|
| Domains | Debugging, Concurrency |
| Last Updated | 2026-02-09 04:00 GMT |
Overview
Graceful shutdown pattern that invalidates caches before shutting down thread pools, collecting all errors independently to prevent masking failures.
Description
The Direct Runner's `ExecutorServiceParallelExecutor` follows a specific shutdown ordering to prevent threads from submitting work to already-shut-down executors. The sequence is: (1) invalidate serial executor caches (stop accepting new work), (2) clean up serial executor caches, (3) shut down the parallel executor, (4) shut down the underlying thread pool, (5) shut down the metrics executor, (6) clean up the transform registry. Each step is wrapped in an independent try-catch block, and all exceptions are collected into a list rather than being thrown immediately. This prevents an early failure from masking later cleanup failures.
Usage
Apply this heuristic when shutting down multi-component execution engines where components have dependencies on each other. The key principle is: stop accepting work before shutting down workers, and never let one cleanup failure prevent others from running.
The Insight (Rule of Thumb)
- Action: Shut down in dependency order: caches → serial executors → parallel executors → thread pools → registries.
- Value: Prevents `RejectedExecutionException` from threads trying to submit to shut-down pools.
- Trade-off: Slightly slower shutdown (each step runs sequentially), but no lost error information.
- Action: Wrap each cleanup step in independent try-catch, collecting all errors.
- Value: All cleanup steps run even if earlier ones fail.
- Trade-off: Error reporting is deferred to end of shutdown sequence.
Reasoning
In a system with caches of executor services, simply calling `executorService.shutdown()` first would cause threads currently executing work to fail when they try to submit results to the now-shut-down serial executor caches. By invalidating the caches first (which marks them as "no new entries"), in-flight work can still complete on existing executors, but no new work is accepted. The cache cleanup then waits for any pending eviction listeners to finish.
The error collection pattern is critical for debugging: if the parallel executor shutdown throws an exception, the thread pool shutdown and registry cleanup must still run. Otherwise, resources leak and the true root cause may be hidden by secondary failures.
Code Evidence
Shutdown ordering from `ExecutorServiceParallelExecutor.java:316-349`:
LOG.debug("Pipeline has terminated. Shutting down.");
final Collection<Exception> errors = new ArrayList<>();
// Stop accepting new work before shutting down the executor.
// This ensures that threads don't try to add work to the
// shutdown executor.
try {
serialExecutorServices.invalidateAll();
} catch (final RuntimeException re) {
errors.add(re);
}
try {
serialExecutorServices.cleanUp();
} catch (final RuntimeException re) {
errors.add(re);
}
try {
parallelExecutorService.shutdown();
} catch (final RuntimeException re) {
errors.add(re);
}
try {
executorService.shutdown();
} catch (final RuntimeException re) {
errors.add(re);
}