Heuristic: Apache ShardingSphere Worker ID Reservation Strategy
| Knowledge Sources | |
|---|---|
| Domains | Cluster_Coordination, Distributed_ID_Generation |
| Last Updated | 2026-02-10 02:00 GMT |
Overview
Distributed worker ID allocation strategy using a bounded range (0-1023), lowest-first PriorityQueue selection, and exclusive ephemeral node reservation with a retry loop.
Description
ShardingSphere cluster mode assigns each compute node a unique worker ID from the range 0-1023 (1024 total). The allocation uses a three-phase strategy: (1) enumerate all unassigned IDs into a PriorityQueue, (2) attempt to reserve the lowest available ID using an exclusive ephemeral node in the cluster repository, (3) if reservation fails (another node claimed it concurrently), retry the entire process. This approach avoids centralized counters and handles concurrent node starts gracefully.
Usage
Apply this heuristic when:
- Troubleshooting: A compute node fails to start with `WorkerIdAssignedException` — all 1024 IDs are in use. Stale ephemeral nodes may need to expire (ZooKeeper session timeout or etcd lease expiry).
- Capacity planning: A single ShardingSphere cluster supports a maximum of 1024 compute nodes.
- Understanding startup latency: Under high concurrency (many nodes starting simultaneously), the retry loop may cause startup delays.
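The concurrent-startup behavior can be illustrated with a minimal, self-contained sketch (plain JDK only, not ShardingSphere's actual API): a `ConcurrentHashMap.putIfAbsent` stands in for the repository's exclusive ephemeral node, and racing threads model simultaneously starting compute nodes. Names like `WorkerIdRaceSketch` are illustrative, not from the source.

```java
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public final class WorkerIdRaceSketch {

    private static final int MAX_WORKER_ID = 1023;

    // Stand-in for the cluster repository's exclusive ephemeral nodes:
    // putIfAbsent succeeds for exactly one claimant per key.
    private static final ConcurrentHashMap<Integer, String> RESERVATIONS = new ConcurrentHashMap<>();

    static int generateNewWorkerId(final String instanceId) {
        while (true) {
            Set<Integer> assigned = RESERVATIONS.keySet();
            // Enumerate unassigned IDs; PriorityQueue yields the lowest first.
            PriorityQueue<Integer> available = IntStream.rangeClosed(0, MAX_WORKER_ID)
                    .boxed().filter(id -> !assigned.contains(id))
                    .collect(Collectors.toCollection(PriorityQueue::new));
            Integer candidate = available.poll();
            if (candidate == null) {
                throw new IllegalStateException("all worker IDs assigned");
            }
            // Reservation fails if another node claimed the same low ID first;
            // we then re-enumerate and retry (optimistic loop).
            if (RESERVATIONS.putIfAbsent(candidate, instanceId) == null) {
                return candidate;
            }
        }
    }

    public static void main(final String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        List<Future<Integer>> futures = IntStream.range(0, 50)
                .mapToObj(i -> pool.submit(() -> generateNewWorkerId("node-" + i)))
                .collect(Collectors.toList());
        Set<Integer> ids = ConcurrentHashMap.newKeySet();
        for (Future<Integer> f : futures) {
            ids.add(f.get());
        }
        pool.shutdown();
        // Every node got a distinct ID despite racing on the lowest slots.
        System.out.println(ids.size() == 50 && ids.stream().allMatch(id -> id >= 0 && id <= 49));
    }
}
```

Running this shows all 50 simulated nodes receiving distinct IDs (necessarily 0-49, since each successful claim takes the lowest free slot), even though many threads initially preselect the same low IDs.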
The Insight (Rule of Thumb)
- Action: Worker IDs are allocated from a bounded pool (0-1023) using optimistic reservation with retry.
- Value: Maximum 1024 concurrent compute nodes per cluster namespace.
- Trade-off: Lowest-available-first selection (PriorityQueue) provides deterministic ordering but increases contention on low IDs during concurrent startup. The retry loop with do-while handles contention transparently.
- Failure mode: If all IDs are exhausted, the generator throws `WorkerIdAssignedException`. Recovery requires deregistering stale nodes or waiting for ephemeral node session expiry.
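The contention trade-off is easy to demonstrate: two nodes that enumerate the free pool at the same moment both preselect the identical lowest ID, so one reservation must fail and retry. A plain-JDK illustration (hypothetical class name, not ShardingSphere code):

```java
import java.util.PriorityQueue;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public final class LowestFirstContentionSketch {

    static PriorityQueue<Integer> freePool(final Set<Integer> assigned) {
        return IntStream.rangeClosed(0, 1023).boxed()
                .filter(id -> !assigned.contains(id))
                .collect(Collectors.toCollection(PriorityQueue::new));
    }

    public static void main(final String[] args) {
        Set<Integer> assigned = Set.of(0, 1, 2); // IDs already taken
        // Two nodes starting concurrently each enumerate the free pool...
        PriorityQueue<Integer> nodeA = freePool(assigned);
        PriorityQueue<Integer> nodeB = freePool(assigned);
        // ...and both preselect the same lowest free ID, so exactly one
        // exclusive reservation succeeds and the other node retries.
        System.out.println(nodeA.poll() + " " + nodeB.poll());
    }
}
```

Both queues poll `3`, the lowest unassigned ID, which is precisely the deterministic-but-contended behavior described above.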
Reasoning
The worker ID range 0-1023 matches the Snowflake algorithm's 10-bit worker ID field, which is the standard distributed ID generation scheme used by ShardingSphere. The PriorityQueue naturally selects the smallest available ID, providing deterministic behavior.
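The arithmetic behind the 0-1023 bound can be sketched with the standard Snowflake field widths (41-bit timestamp, 10-bit worker ID, 12-bit sequence; the exact widths are assumed here from the common Snowflake layout, not taken from ShardingSphere source):

```java
public final class SnowflakeLayoutSketch {

    // Assumed standard Snowflake field widths; a 10-bit worker ID field
    // yields exactly the 0-1023 range used by the allocation strategy.
    static final int SEQUENCE_BITS = 12;
    static final int WORKER_ID_BITS = 10;
    static final long MAX_WORKER_ID = (1L << WORKER_ID_BITS) - 1; // 1023

    // Compose an ID: timestamp | workerId | sequence (high to low bits).
    static long compose(final long timestamp, final long workerId, final long sequence) {
        return (timestamp << (WORKER_ID_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }

    public static void main(final String[] args) {
        System.out.println(MAX_WORKER_ID); // 1024 distinct values: 0-1023
        long id = compose(1L, MAX_WORKER_ID, 0L);
        // Extract the worker ID back out of the composed value.
        System.out.println((id >> SEQUENCE_BITS) & MAX_WORKER_ID);
    }
}
```

This is why the cluster-wide cap of 1024 compute nodes is not an arbitrary configuration choice but a structural consequence of the ID layout.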
The exclusive ephemeral node pattern ensures:
- Mutual exclusion: Only one node can claim a specific ID.
- Automatic cleanup: When a node disconnects, its ephemeral node expires and the ID becomes available.
- No coordinator: No leader election or centralized allocation service is needed.
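These three properties can be modeled in a few lines of plain JDK code, with an in-memory map standing in for the repository's ephemeral nodes (`putIfAbsent` gives the exclusive semantics, `removeIf` models session-expiry cleanup; class and method names are illustrative, not ShardingSphere's API):

```java
import java.util.concurrent.ConcurrentHashMap;

public final class EphemeralReservationSketch {

    // key = worker ID, value = owning instance ID.
    private final ConcurrentHashMap<Integer, String> nodes = new ConcurrentHashMap<>();

    // Exclusive claim: succeeds only if no node currently holds this ID.
    boolean reserve(final int workerId, final String instanceId) {
        return nodes.putIfAbsent(workerId, instanceId) == null;
    }

    // When a compute node disconnects, its ephemeral nodes vanish,
    // returning its worker ID to the pool automatically.
    void sessionExpired(final String instanceId) {
        nodes.entrySet().removeIf(e -> e.getValue().equals(instanceId));
    }

    public static void main(final String[] args) {
        EphemeralReservationSketch repo = new EphemeralReservationSketch();
        System.out.println(repo.reserve(0, "node-a")); // first claim wins
        System.out.println(repo.reserve(0, "node-b")); // mutual exclusion
        repo.sessionExpired("node-a");
        System.out.println(repo.reserve(0, "node-b")); // ID recycled after expiry
    }
}
```

Note that no component decides who gets which ID: the exclusivity of the write itself is the arbitration, which is why no leader election is needed.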
The `ClusterRepositoryPersistException` is silently caught during reservation, returning `Optional.empty()` to trigger the retry loop. This is intentional: the exception indicates another node claimed the ID concurrently, which is a normal race condition, not an error.
Code evidence from `ClusterWorkerIdGenerator.java:63-81`:
```java
private int generateNewWorkerId() {
    Optional<Integer> generatedWorkId;
    do {
        generatedWorkId = generateAvailableWorkerId();
    } while (!generatedWorkId.isPresent());
    int result = generatedWorkId.get();
    computeNodePersistService.persistWorkerId(instanceId, result);
    return result;
}

private Optional<Integer> generateAvailableWorkerId() {
    Collection<Integer> assignedWorkerIds = computeNodePersistService.getAssignedWorkerIds();
    ShardingSpherePreconditions.checkState(
            assignedWorkerIds.size() <= MAX_WORKER_ID + 1, WorkerIdAssignedException::new);
    PriorityQueue<Integer> availableWorkerIds = IntStream.range(0, MAX_WORKER_ID + 1)
            .boxed().filter(each -> !assignedWorkerIds.contains(each))
            .collect(Collectors.toCollection(PriorityQueue::new));
    Integer preselectedWorkerId = availableWorkerIds.poll();
    Preconditions.checkNotNull(preselectedWorkerId);
    return reservationPersistService.reserveWorkerId(preselectedWorkerId, instanceId);
}
```
Reservation with silent exception handling from `ReservationPersistService.java:43-50`:
```java
public Optional<Integer> reserveWorkerId(final Integer preselectedWorkerId, final String instanceId) {
    try {
        return repository.persistExclusiveEphemeral(
                NodePathGenerator.toPath(new WorkerIDReservationNodePath(preselectedWorkerId)),
                instanceId) ? Optional.of(preselectedWorkerId) : Optional.empty();
    } catch (final ClusterRepositoryPersistException ignore) {
        return Optional.empty();
    }
}
```