Principle:Bigscience workshop Petals Block Selection
| Knowledge Sources | |
|---|---|
| Domains | Distributed_Computing, Load_Balancing, Optimization |
| Last Updated | 2026-02-09 14:00 GMT |
Overview
A greedy algorithm for selecting which contiguous range of transformer blocks a server should host, optimizing to fill gaps in the network's block coverage and maximize overall swarm throughput.
Description
Block Selection solves the decentralized load balancing problem in Petals: given a swarm of volunteer servers each hosting different block ranges, which blocks should a new server host to maximize the network's end-to-end throughput?
Algorithm:
- Query DHT: Retrieve RemoteModuleInfo for all blocks, listing which servers host each block and their throughput
- Compute per-block throughput: Sum the throughput of all servers covering each block
- Find the bottleneck: Identify the contiguous range of num_blocks blocks with the lowest aggregate throughput
- Select that range: The new server fills the gap where it is most needed
Rebalancing: Existing servers periodically check if the swarm would benefit from them switching blocks via should_choose_other_blocks(), which compares the actual throughput distribution to the optimal (uniform) distribution using a balance_quality threshold.
Usage
This principle is used automatically when a server starts and during periodic rebalancing checks. Server operators can override it by specifying --block_indices explicitly.
Theoretical Basis
Optimal throughput is bottleneck-limited:
End-to-end throughput of the distributed model is limited by the block with the least total serving capacity:
Greedy allocation:
# Abstract block selection algorithm
def choose_best_blocks(num_blocks, module_infos):
throughputs = compute_per_block_throughputs(module_infos)
# Find contiguous range of num_blocks with minimum total throughput
best_start = argmin(
sum(throughputs[i:i+num_blocks]) for i in range(total_blocks - num_blocks + 1)
)
return list(range(best_start, best_start + num_blocks))
Balance quality metric:
If quality < balance_quality threshold (default 0.75), servers should consider rebalancing.