Heuristic:Apache Shardingsphere Worker ID Reservation Strategy

From Leeroopedia



Knowledge Sources
Domains: Cluster_Coordination, Distributed_ID_Generation
Last Updated: 2026-02-10 02:00 GMT

Overview

Distributed worker ID allocation strategy that combines a bounded ID range (0-1023), lowest-first selection through a PriorityQueue, and exclusive ephemeral node reservation inside a retry loop.

Description

ShardingSphere cluster mode assigns each compute node a unique worker ID from the range 0-1023 (1024 total). The allocation uses a three-phase strategy: (1) enumerate all unassigned IDs into a PriorityQueue, (2) attempt to reserve the lowest available ID using an exclusive ephemeral node in the cluster repository, (3) if reservation fails (another node claimed it concurrently), retry the entire process. This approach avoids centralized counters and handles concurrent node starts gracefully.
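
The three phases can be sketched as a small in-memory simulation. This is a minimal sketch under stated assumptions, not ShardingSphere's implementation: `WorkerIdAllocatorSketch`, `tryReserveLowest`, `allocate`, and `release` are hypothetical names, and a `ConcurrentHashMap` stands in for the cluster repository, with `putIfAbsent` playing the role of the exclusive ephemeral node.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical in-memory stand-in for the cluster repository; not ShardingSphere's API.
class WorkerIdAllocatorSketch {

    static final int MAX_WORKER_ID = 1023;

    // worker ID -> owning instance ID; plays the role of the reservation nodes.
    private final ConcurrentMap<Integer, String> reservations = new ConcurrentHashMap<>();

    // Phases 1 and 2: collect the free IDs, then try to claim the lowest one atomically.
    Optional<Integer> tryReserveLowest(String instanceId) {
        Set<Integer> assigned = new HashSet<>(reservations.keySet());
        PriorityQueue<Integer> available = IntStream.rangeClosed(0, MAX_WORKER_ID).boxed()
                .filter(each -> !assigned.contains(each))
                .collect(Collectors.toCollection(PriorityQueue::new));
        Integer candidate = available.poll();
        if (candidate == null) {
            throw new IllegalStateException("All " + (MAX_WORKER_ID + 1) + " worker IDs are in use");
        }
        // putIfAbsent returns null only for the caller that created the entry,
        // so a concurrent claimant of the same ID gets Optional.empty().
        return reservations.putIfAbsent(candidate, instanceId) == null
                ? Optional.of(candidate) : Optional.empty();
    }

    // Phase 3: repeat the whole selection until a reservation succeeds.
    int allocate(String instanceId) {
        Optional<Integer> result;
        do {
            result = tryReserveLowest(instanceId);
        } while (!result.isPresent());
        return result.get();
    }

    // Models ephemeral node expiry: the ID returns to the pool.
    void release(int workerId) {
        reservations.remove(workerId);
    }
}
```

On a fresh allocator, two sequential `allocate` calls return 0 and 1; after `release(0)`, the next call returns 0 again because the lowest free ID is always tried first.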

Usage

Apply this heuristic when:

  • Troubleshooting: A compute node fails to start with `WorkerIdAssignedException`, which signals that the worker ID pool is exhausted. Stale ephemeral nodes may still hold IDs until they expire (ZooKeeper session timeout or etcd lease expiry).
  • Capacity planning: A single ShardingSphere cluster supports a maximum of 1024 compute nodes.
  • Understanding startup latency: When many nodes start simultaneously, they compete for the same lowest free IDs, and every lost race repeats the full selection, which can delay startup.

The Insight (Rule of Thumb)

  • Action: Worker IDs are allocated from a bounded pool (0-1023) using optimistic reservation with retry.
  • Value: Maximum 1024 concurrent compute nodes per cluster namespace.
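  • Trade-off: Lowest-available-first selection (PriorityQueue) gives deterministic ordering but concentrates contention on the lowest free IDs during concurrent startup. The do-while retry loop absorbs that contention transparently; as written, it has no backoff or attempt limit.
  • Failure mode: If the ID pool is exhausted, the generator fails and the node cannot start. The precondition throws `WorkerIdAssignedException`; note that the excerpt below checks `assignedWorkerIds.size() <= MAX_WORKER_ID + 1`, so with exactly 1024 IDs assigned that check passes and the failure would instead surface from the null check on the polled ID. Recovery requires deregistering stale nodes or waiting for ephemeral node session expiry.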

Reasoning

The worker ID range 0-1023 matches the 10-bit worker ID field of the Snowflake algorithm (2^10 = 1024 values), the distributed ID generation scheme used by ShardingSphere. The PriorityQueue always yields the smallest available ID, so selection is deterministic for a given set of assigned IDs.
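
To make the 10-bit limit concrete, the sketch below packs a worker ID into a 64-bit value. It assumes the commonly described Snowflake layout (timestamp in the high bits, then a 10-bit worker ID, then a 12-bit sequence); the class and method names are hypothetical, and the timestamp is passed in as milliseconds since an arbitrary epoch rather than taken from ShardingSphere's generator.

```java
// Illustrative bit layout only; assumes 10 worker ID bits and 12 sequence bits.
class SnowflakeLayoutSketch {

    static final int WORKER_ID_BITS = 10;
    static final int SEQUENCE_BITS = 12;
    static final long MAX_WORKER_ID = (1L << WORKER_ID_BITS) - 1; // 1023
    static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1;  // 4095

    // Packs the three fields into one long: timestamp | workerId | sequence.
    static long compose(long millisSinceEpoch, long workerId, long sequence) {
        if (workerId < 0 || workerId > MAX_WORKER_ID) {
            throw new IllegalArgumentException("worker ID must be in 0.." + MAX_WORKER_ID);
        }
        return (millisSinceEpoch << (WORKER_ID_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | (sequence & SEQUENCE_MASK);
    }

    // Extracts the worker ID field from a composed value.
    static long workerIdOf(long id) {
        return (id >> SEQUENCE_BITS) & MAX_WORKER_ID;
    }
}
```

A worker ID of 1024 would need an eleventh bit and spill into the timestamp field, which is why the pool stops at 1023.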

The exclusive ephemeral node pattern ensures:

  1. Mutual exclusion: Only one node can claim a specific ID.
  2. Automatic cleanup: When a node disconnects, its ephemeral node expires and the ID becomes available.
  3. No coordinator: No leader election or centralized allocation service is needed.
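
The first two guarantees can be illustrated with a small in-memory model. This is a sketch under assumptions, not the ZooKeeper or etcd client API: `EphemeralRegistrySketch`, `expireSession`, and `exists` are hypothetical names, the node path in the usage note is made up, and session expiry is triggered by an explicit call rather than by a timeout.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory model of exclusive ephemeral nodes.
class EphemeralRegistrySketch {

    // node path -> session that owns the node
    private final Map<String, String> owners = new ConcurrentHashMap<>();

    // Mutual exclusion: succeeds only for the first session to create the path.
    boolean persistExclusiveEphemeral(String path, String sessionId) {
        return owners.putIfAbsent(path, sessionId) == null;
    }

    // Automatic cleanup: expiring a session removes every node it owned.
    void expireSession(String sessionId) {
        owners.values().removeIf(sessionId::equals);
    }

    boolean exists(String path) {
        return owners.containsKey(path);
    }
}
```

If session `node-a` creates a path such as `/worker_id/0`, a second attempt by `node-b` on the same path returns `false`. Once `expireSession("node-a")` runs, the path is gone and `node-b` can claim it, with no coordinator deciding between the two.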

`ReservationPersistService` catches `ClusterRepositoryPersistException` during reservation and returns `Optional.empty()`, which sends the caller back into the retry loop. This is intentional: the exception is treated as a sign that another node claimed the ID concurrently, which is a normal race condition rather than an error.

Code evidence from `ClusterWorkerIdGenerator.java:63-81`:

private int generateNewWorkerId() {
    Optional<Integer> generatedWorkId;
    do {
        generatedWorkId = generateAvailableWorkerId();
    } while (!generatedWorkId.isPresent());
    int result = generatedWorkId.get();
    computeNodePersistService.persistWorkerId(instanceId, result);
    return result;
}

private Optional<Integer> generateAvailableWorkerId() {
    Collection<Integer> assignedWorkerIds = computeNodePersistService.getAssignedWorkerIds();
    ShardingSpherePreconditions.checkState(
        assignedWorkerIds.size() <= MAX_WORKER_ID + 1, WorkerIdAssignedException::new);
    PriorityQueue<Integer> availableWorkerIds = IntStream.range(0, MAX_WORKER_ID + 1)
        .boxed().filter(each -> !assignedWorkerIds.contains(each))
        .collect(Collectors.toCollection(PriorityQueue::new));
    Integer preselectedWorkerId = availableWorkerIds.poll();
    Preconditions.checkNotNull(preselectedWorkerId);
    return reservationPersistService.reserveWorkerId(preselectedWorkerId, instanceId);
}

Reservation with silent exception handling from `ReservationPersistService.java:43-50`:

public Optional<Integer> reserveWorkerId(final Integer preselectedWorkerId, final String instanceId) {
    try {
        return repository.persistExclusiveEphemeral(
            NodePathGenerator.toPath(new WorkerIDReservationNodePath(preselectedWorkerId)),
            instanceId) ? Optional.of(preselectedWorkerId) : Optional.empty();
    } catch (final ClusterRepositoryPersistException ignore) {
        return Optional.empty();
    }
}
