Principle:Bigscience workshop Petals Block Selection

Knowledge Sources	Petals; Petals: Collaborative Inference and Fine-tuning of Large Models
Domains	Distributed_Computing, Load_Balancing, Optimization
Last Updated	2026-02-09 14:00 GMT

Overview

A greedy algorithm that selects which contiguous range of transformer blocks a server should host. It picks the range that fills gaps in the network's block coverage so as to maximize overall swarm throughput.

Description

Block Selection solves the decentralized load-balancing problem in Petals: given a swarm of volunteer servers, each hosting a different range of blocks, which blocks should a new server host to maximize the network's end-to-end throughput?

Algorithm:

1. Query the DHT: retrieve RemoteModuleInfo for all blocks, listing which servers host each block and their throughput.
2. Compute per-block throughput: sum the throughput of all servers covering each block.
3. Find the bottleneck: identify the contiguous range of num_blocks blocks with the lowest aggregate throughput.
4. Select that range: the new server fills the gap where it is most needed.
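The four steps above can be sketched end to end as follows. This is a minimal illustration, not the Petals implementation: the DHT query is replaced by a hypothetical in-memory list of ServerSpan records (first block, exclusive end block, throughput), and the names ServerSpan, per_block_throughput, and select_blocks are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServerSpan:
    """Hypothetical stand-in for what a DHT lookup reports about one server."""
    server_id: str
    start: int          # first block served (inclusive)
    end: int            # last block served (exclusive)
    throughput: float   # serving capacity of this server


def per_block_throughput(spans: List[ServerSpan], total_blocks: int) -> List[float]:
    """Step 2: sum the throughput of every server covering each block."""
    totals = [0.0] * total_blocks
    for span in spans:
        for block in range(span.start, span.end):
            totals[block] += span.throughput
    return totals


def select_blocks(spans: List[ServerSpan], total_blocks: int, num_blocks: int) -> List[int]:
    """Steps 3-4: pick the contiguous window with the lowest aggregate throughput."""
    totals = per_block_throughput(spans, total_blocks)
    best_start = min(
        range(total_blocks - num_blocks + 1),
        key=lambda i: sum(totals[i:i + num_blocks]),
    )
    return list(range(best_start, best_start + num_blocks))


# Step 1 (mocked): what a DHT lookup might return for an 8-block model
swarm = [
    ServerSpan("a", 0, 4, 10.0),
    ServerSpan("b", 2, 6, 5.0),
    ServerSpan("c", 6, 8, 8.0),
]
print(per_block_throughput(swarm, 8))                       # [10.0, 10.0, 15.0, 15.0, 5.0, 5.0, 8.0, 8.0]
print(select_blocks(swarm, total_blocks=8, num_blocks=3))   # [4, 5, 6]
```

Blocks 4 and 5 are served only by server b, so they form the weakest region, and a new 3-block server is placed over blocks 4 through 6.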

Rebalancing: existing servers periodically call should_choose_other_blocks() to check whether the swarm would benefit from their switching to other blocks. This check compares the actual per-block throughput distribution to the optimal (uniform) distribution and signals a move when the balance falls below the balance_quality threshold.
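A minimal sketch of the rebalancing check as described here, assuming the per-block throughputs have already been computed from the DHT and using the min/mean balance quality defined under Theoretical Basis with the documented 0.75 default. The function names balance_quality_of and needs_rebalancing are hypothetical; the actual should_choose_other_blocks() may have a different signature and internals.

```python
from statistics import mean
from typing import Sequence


def balance_quality_of(throughputs: Sequence[float]) -> float:
    """min/mean of per-block throughput; 1.0 means a perfectly uniform swarm."""
    avg = mean(throughputs)
    return min(throughputs) / avg if avg > 0 else 0.0


def needs_rebalancing(throughputs: Sequence[float], balance_quality: float = 0.75) -> bool:
    """Hypothetical stand-in for should_choose_other_blocks(): flag the swarm
    when its per-block throughput is further from uniform than allowed."""
    return balance_quality_of(throughputs) < balance_quality


print(needs_rebalancing([10.0, 10.0, 10.0, 10.0]))                      # False: quality 1.0
print(needs_rebalancing([10.0, 10.0, 15.0, 15.0, 5.0, 5.0, 8.0, 8.0]))  # True: quality 5 / 9.5, about 0.53
```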

Usage

This principle is used automatically when a server starts and during periodic rebalancing checks. Server operators can override it by specifying --block_indices explicitly.

Theoretical Basis

Optimal throughput is bottleneck-limited:

End-to-end throughput of the distributed model is limited by the block with the least total serving capacity:

$T_{\text{end-to-end}} = \min_{i} \sum_{s \in \text{servers}(i)} \text{throughput}(s)$
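To make the bottleneck formula concrete, the sketch below evaluates it for a toy swarm. Servers are represented by hypothetical (first block, exclusive end block, throughput) tuples; this encoding and the function name end_to_end_throughput are assumptions for illustration only.

```python
from typing import List, Tuple

# Hypothetical encoding: (first_block, end_block_exclusive, throughput)
Span = Tuple[int, int, float]


def end_to_end_throughput(spans: List[Span], total_blocks: int) -> float:
    """T_end-to-end = min over blocks i of the summed throughput of servers(i)."""
    per_block = [
        sum(tp for start, end, tp in spans if start <= i < end)
        for i in range(total_blocks)
    ]
    return min(per_block)


swarm = [(0, 4, 10.0), (2, 6, 5.0), (6, 8, 8.0)]
print(end_to_end_throughput(swarm, 8))                   # 5.0: blocks 4-5 are the bottleneck
print(end_to_end_throughput(swarm + [(4, 7, 8.0)], 8))   # 8.0: the bottleneck moves to block 7
```

Note that a block with no servers contributes an empty sum of 0, so any uncovered block drives the end-to-end throughput to zero.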

Greedy allocation:

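# Abstract block selection algorithm (sketch; module_infos is assumed to be a
# list with one dict per block, mapping server ID -> that server's throughput)
def compute_per_block_throughputs(module_infos):
    # Sum the throughput of every server that hosts each block
    return [sum(servers.values()) for servers in module_infos]

def choose_best_blocks(num_blocks, module_infos):
    throughputs = compute_per_block_throughputs(module_infos)
    total_blocks = len(throughputs)
    # Find the contiguous range of num_blocks blocks with minimum total throughput
    best_start = min(
        range(total_blocks - num_blocks + 1),
        key=lambda i: sum(throughputs[i:i + num_blocks]),
    )
    return list(range(best_start, best_start + num_blocks))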
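Minimizing the window sum is a heuristic: a window can have the lowest total while missing the single weakest block, which under the bottleneck formula is what limits end-to-end throughput. The sketch below contrasts the sum criterion with one alternative that compares each window's sorted throughputs lexicographically, so the window containing the weakest block wins first and ties fall through to the next-weakest block. Both functions take a plain list of per-block throughputs; their names are hypothetical, and the alternative is an illustration rather than a description of Petals' code.

```python
from typing import List


def best_start_by_sum(throughputs: List[float], num_blocks: int) -> int:
    """Lowest total throughput over the window (the criterion sketched above)."""
    starts = range(len(throughputs) - num_blocks + 1)
    return min(starts, key=lambda i: sum(throughputs[i:i + num_blocks]))


def best_start_by_bottleneck(throughputs: List[float], num_blocks: int) -> int:
    """Lexicographically smallest sorted window: minimizes the weakest block
    first, then the second weakest, and so on."""
    starts = range(len(throughputs) - num_blocks + 1)
    return min(starts, key=lambda i: sorted(throughputs[i:i + num_blocks]))


per_block = [1.0, 9.0, 3.0, 3.0]
print(best_start_by_sum(per_block, 2))         # 2: window [3.0, 3.0] has the lowest sum (6.0)
print(best_start_by_bottleneck(per_block, 2))  # 0: window [1.0, 9.0] contains the weakest block
```

A server placed by the sum criterion in this example would leave block 0 at throughput 1.0, so the end-to-end throughput would not improve; the sorted-window criterion places it over block 0 instead.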

Balance quality metric: $\text{quality} = \frac{\min_{i} \text{throughput}(i)}{\operatorname{mean}_{i} \text{throughput}(i)}$, where throughput(i) is the summed throughput of all servers hosting block i.

If quality falls below the balance_quality threshold (default 0.75), servers should consider rebalancing.
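As a worked example with made-up numbers, consider a three-block model whose per-block throughputs are 4, 2, and 6:

$$\text{quality} = \frac{\min(4, 2, 6)}{(4 + 2 + 6)/3} = \frac{2}{4} = 0.5 < 0.75$$

The swarm falls below the default threshold, so servers would be prompted to consider moving toward the weakly served block 1.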

Related Pages

Implemented By

Implementation:Bigscience_workshop_Petals_Choose_Best_Blocks
